Ziang Song
Verified email at stanford.edu
Title
Cited by
Year
When can we learn general-sum Markov games with a large number of players sample-efficiently?
Z Song, S Mei, Y Bai
arXiv preprint arXiv:2110.04184, 2021
Cited by: 86; Year: 2021
Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent
Y Bai, C Jin, S Mei, Z Song, T Yu
Advances in Neural Information Processing Systems 35, 22313-22325, 2022
Cited by: 13; Year: 2022
Reward collapse in aligning large language models
Z Song, T Cai, JD Lee, WJ Su
arXiv preprint arXiv:2305.17608, 2023
Cited by: 12; Year: 2023
Sample-efficient learning of correlated equilibria in extensive-form games
Z Song, S Mei, Y Bai
Advances in Neural Information Processing Systems 35, 4099-4110, 2022
Cited by: 11; Year: 2022
Reward Collapse in Aligning Large Language Models: A Prompt-Aware Approach to Preference Rankings
Z Song, T Cai, JD Lee, WJ Su
ICML 2023 Workshop The Many Facets of Preference-Based Learning, 2023
Cited by: 1; Year: 2023