Daniel Russo
Verified email at gsb.columbia.edu - Homepage
Title    Cited by    Year
A tutorial on Thompson sampling
D Russo, B Van Roy, A Kazerouni, I Osband, Z Wen
Foundations and Trends in Machine Learning 11 (1), 1–96, 2018
950    2018
Learning to optimize via posterior sampling
D Russo, B Van Roy
Mathematics of Operations Research 39 (4), 1221-1243, 2014
683    2014
An information-theoretic analysis of Thompson sampling
D Russo, B Van Roy
The Journal of Machine Learning Research 17 (1), 2442-2471, 2016
390    2016
A finite time analysis of temporal difference learning with linear function approximation
J Bhandari, D Russo, R Singal
Operations Research 69 (3), 950-973, 2021
323    2021
How much does your data exploration overfit? Controlling bias via information usage.
D Russo, J Zou
IEEE Transactions on Information Theory, 2019
316*    2019
Learning to optimize via information-directed sampling
D Russo, B Van Roy
Operations Research 66 (1), 230-252, 2018
299*    2018
Deep Exploration via Randomized Value Functions.
I Osband, B Van Roy, DJ Russo, Z Wen
J. Mach. Learn. Res. 20 (124), 1-62, 2019
287    2019
Simple Bayesian Algorithms for Best-Arm Identification
D Russo
Operations Research 68 (6), 1625-1647, 2020
256*    2020
Eluder Dimension and the Sample Complexity of Optimistic Exploration.
D Russo, B Van Roy
Advances in Neural Information Processing Systems 26, 2256-2264, 2013
212    2013
Global optimality guarantees for policy gradient methods
J Bhandari, D Russo
arXiv preprint arXiv:1906.01786, 2019
208    2019
Improving the expected improvement algorithm
C Qin, D Klabjan, D Russo
Advances in Neural Information Processing Systems, 5382-5392, 2017
131    2017
(More) efficient reinforcement learning via posterior sampling
I Osband, D Russo, B Van Roy
Advances in Neural Information Processing Systems 26, 2013
87    2013
Worst-case regret bounds for exploration via randomized value functions
D Russo
Advances in Neural Information Processing Systems 32, 2019
83    2019
On the linear convergence of policy gradient methods for finite mdps
J Bhandari, D Russo
International Conference on Artificial Intelligence and Statistics, 2386-2394, 2021
75*    2021
Satisficing in time-sensitive bandit learning
D Russo, B Van Roy
Mathematics of Operations Research 47 (4), 2815-2839, 2022
57*    2022
Adaptivity and confounding in multi-armed bandit experiments
C Qin, D Russo
arXiv preprint arXiv:2202.09036, 2022
26    2022
A note on the equivalence of upper confidence bounds and gittins indices for patient agents
D Russo
Operations Research 69 (1), 273-278, 2021
14    2021
Policy gradient optimization of Thompson sampling policies
S Min, CC Moallemi, DJ Russo
arXiv preprint arXiv:2006.16507, 2020
10    2020
On the futility of dynamics in robust mechanism design
SR Balseiro, A Kim, D Russo
Operations Research 69 (6), 1767-1783, 2021
9    2021
Approximation benefits of policy gradient methods with aggregated states
D Russo
Management Science 69 (11), 6898-6911, 2023
6    2023
Articles 1–20