Philip Thomas
Title / Cited by / Year
Data-efficient off-policy policy evaluation for reinforcement learning
P Thomas, E Brunskill
International Conference on Machine Learning, 2139-2148, 2016
Cited by: 397 · Year: 2016
Value function approximation in reinforcement learning using the Fourier basis
G Konidaris, S Osentoski, P Thomas
Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011
Cited by: 371 · Year: 2011
High-confidence off-policy evaluation
P Thomas, G Theocharous, M Ghavamzadeh
Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015
Cited by: 233 · Year: 2015
High confidence policy improvement
P Thomas, G Theocharous, M Ghavamzadeh
International Conference on Machine Learning, 2380-2388, 2015
Cited by: 171 · Year: 2015
Personalized ad recommendation systems for life-time value optimization with guarantees
G Theocharous, PS Thomas, M Ghavamzadeh
Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015
Cited by: 142 · Year: 2015
Increasing the action gap: New operators for reinforcement learning
MG Bellemare, G Ostrovski, A Guez, P Thomas, R Munos
Proceedings of the AAAI Conference on Artificial Intelligence 30 (1), 2016
Cited by: 130 · Year: 2016
Bias in natural actor-critic algorithms
P Thomas
International Conference on Machine Learning, 441-448, 2014
Cited by: 128 · Year: 2014
Learning action representations for reinforcement learning
Y Chandak, G Theocharous, J Kostas, S Jordan, P Thomas
International Conference on Machine Learning, 941-950, 2019
Cited by: 112 · Year: 2019
Preventing undesirable behavior of intelligent machines
P Thomas, B Castro da Silva, A Barto, S Giguere, Y Brun, E Brunskill
Science 366 (6468), 999-1004, 2019
Cited by: 100 · Year: 2019
Safe reinforcement learning
PS Thomas
Cited by: 86 · Year: 2015
Proximal reinforcement learning: A new theory of sequential decision making in primal-dual spaces
S Mahadevan, B Liu, P Thomas, W Dabney, S Giguere, N Jacek, I Gemp, ...
arXiv preprint arXiv:1405.6757, 2014
Cited by: 47 · Year: 2014
Training an actor-critic reinforcement learning controller for arm movement using human-generated rewards
KM Jagodnik, PS Thomas, AJ van den Bogert, MS Branicky, RF Kirsch
IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (10 …, 2017
Cited by: 45 · Year: 2017
Use of atrial and bifocal cardiac pacemakers for treating resistant dysrhythmias.
LS Dreifus, BV Berkovits, D Kimibiris, K Moghadam, G Haupt, P Walinsky, ...
European Journal of Cardiology 3 (4), 257-266, 1975
Cited by: 45 · Year: 1975
Using options and covariance testing for long horizon off-policy policy evaluation
Z Guo, PS Thomas, E Brunskill
Advances in Neural Information Processing Systems 30, 2017
Cited by: 40 · Year: 2017
Predictive off-policy policy evaluation for nonstationary decision problems, with applications to digital marketing
PS Thomas, G Theocharous, M Ghavamzadeh, I Durugkar, E Brunskill
Twenty-Ninth IAAI Conference, 2017
Cited by: 38 · Year: 2017
Policy gradient methods for reinforcement learning with function approximation and action-dependent baselines
PS Thomas, E Brunskill
arXiv preprint arXiv:1706.06643, 2017
Cited by: 37 · Year: 2017
Projected natural actor-critic
PS Thomas, WC Dabney, S Giguere, S Mahadevan
Advances in Neural Information Processing Systems 26, 2013
Cited by: 37 · Year: 2013
Some recent applications of reinforcement learning
AG Barto, PS Thomas, RS Sutton
Proceedings of the Eighteenth Yale Workshop on Adaptive and Learning Systems, 2017
Cited by: 36 · Year: 2017
Application of the actor-critic architecture to functional electrical stimulation control of a human arm
PS Thomas, A van den Bogert, K Jagodnik, M Branicky
Twenty-First IAAI Conference, 2009
Cited by: 36 · Year: 2009
Importance Sampling for Fair Policy Selection.
S Doroudi, PS Thomas, E Brunskill
Grantee Submission, 2017
Cited by: 34 · Year: 2017