| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| A tutorial on Thompson sampling | D Russo, B Van Roy, A Kazerouni, I Osband, Z Wen | Foundations and Trends in Machine Learning 11 (1), 1-96, 2018 | 950 | 2018 |
| Learning to optimize via posterior sampling | D Russo, B Van Roy | Mathematics of Operations Research 39 (4), 1221-1243, 2014 | 683 | 2014 |
| An information-theoretic analysis of Thompson sampling | D Russo, B Van Roy | Journal of Machine Learning Research 17 (1), 2442-2471, 2016 | 390 | 2016 |
| A finite time analysis of temporal difference learning with linear function approximation | J Bhandari, D Russo, R Singal | Operations Research 69 (3), 950-973, 2021 | 323 | 2021 |
| How much does your data exploration overfit? Controlling bias via information usage | D Russo, J Zou | IEEE Transactions on Information Theory, 2019 | 316* | 2019 |
| Learning to optimize via information-directed sampling | D Russo, B Van Roy | Operations Research 66 (1), 230-252, 2018 | 299* | 2018 |
| Deep exploration via randomized value functions | I Osband, B Van Roy, DJ Russo, Z Wen | Journal of Machine Learning Research 20 (124), 1-62, 2019 | 287 | 2019 |
| Simple Bayesian algorithms for best-arm identification | D Russo | Operations Research 68 (6), 1625-1647, 2020 | 256* | 2020 |
| Eluder dimension and the sample complexity of optimistic exploration | D Russo, B Van Roy | Advances in Neural Information Processing Systems 26, 2256-2264, 2013 | 212 | 2013 |
| Global optimality guarantees for policy gradient methods | J Bhandari, D Russo | arXiv preprint arXiv:1906.01786, 2019 | 208 | 2019 |
| Improving the expected improvement algorithm | C Qin, D Klabjan, D Russo | Advances in Neural Information Processing Systems, 5382-5392, 2017 | 131 | 2017 |
| (More) efficient reinforcement learning via posterior sampling | I Osband, D Russo, B Van Roy | Advances in Neural Information Processing Systems 26, 2013 | 87 | 2013 |
| Worst-case regret bounds for exploration via randomized value functions | D Russo | Advances in Neural Information Processing Systems 32, 2019 | 83 | 2019 |
| On the linear convergence of policy gradient methods for finite MDPs | J Bhandari, D Russo | International Conference on Artificial Intelligence and Statistics, 2386-2394, 2021 | 75* | 2021 |
| Satisficing in time-sensitive bandit learning | D Russo, B Van Roy | Mathematics of Operations Research 47 (4), 2815-2839, 2022 | 57* | 2022 |
| Adaptivity and confounding in multi-armed bandit experiments | C Qin, D Russo | arXiv preprint arXiv:2202.09036, 2022 | 26 | 2022 |
| A note on the equivalence of upper confidence bounds and Gittins indices for patient agents | D Russo | Operations Research 69 (1), 273-278, 2021 | 14 | 2021 |
| Policy gradient optimization of Thompson sampling policies | S Min, CC Moallemi, DJ Russo | arXiv preprint arXiv:2006.16507, 2020 | 10 | 2020 |
| On the futility of dynamics in robust mechanism design | SR Balseiro, A Kim, D Russo | Operations Research 69 (6), 1767-1783, 2021 | 9 | 2021 |
| Approximation benefits of policy gradient methods with aggregated states | D Russo | Management Science 69 (11), 6898-6911, 2023 | 6 | 2023 |