Samuel L. Smith
Title
Cited by
Year
Don't Decay the Learning Rate, Increase the Batch Size
SL Smith, PJ Kindermans, C Ying, QV Le
International Conference on Learning Representations, 2018
Cited by: 1311; Year: 2018
Ultrafast long-range charge separation in organic semiconductor photovoltaic diodes
S Gélinas, A Rao, A Kumar, SL Smith, AW Chin, J Clark, TS van der Poll, ...
Science 343 (6170), 512-516, 2014
Cited by: 1041; Year: 2014
Gemma: Open Models Based on Gemini Research and Technology
G Team, T Mesnard, C Hardin, R Dadashi, S Bhupatiraju, S Pathak, ...
arXiv preprint arXiv:2403.08295, 2024
Cited by: 711; Year: 2024
Offline bilingual word vectors, orthogonal transformations and the inverted softmax
SL Smith, DHP Turban, S Hamblin, NY Hammerla
International Conference on Learning Representations, 2017
Cited by: 626; Year: 2017
High-performance large-scale image recognition without normalization
A Brock, S De, SL Smith, K Simonyan
International Conference on Machine Learning, 1059-1071, 2021
Cited by: 603; Year: 2021
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
SL Smith, QV Le
International Conference on Learning Representations, 2018
Cited by: 423; Year: 2018
The future of quantum biology
A Marais, B Adams, AK Ringsmuth, M Ferretti, JM Gruber, R Hendrikx, ...
Journal of the Royal Society Interface 15 (148), 20180640, 2018
Cited by: 241; Year: 2018
Resurrecting recurrent neural networks for long sequences
A Orvieto, SL Smith, A Gu, A Fernando, C Gulcehre, R Pascanu, S De
International Conference on Machine Learning, 26670-26698, 2023
Cited by: 226; Year: 2023
On the Origin of Implicit Regularization in Stochastic Gradient Descent
SL Smith, B Dherin, DGT Barrett, S De
arXiv preprint arXiv:2101.12176, 2021
Cited by: 217; Year: 2021
Unlocking high-accuracy differentially private image classification through scale
S De, L Berrada, J Hayes, SL Smith, B Balle
arXiv preprint arXiv:2204.13650, 2022
Cited by: 189; Year: 2022
Batch normalization biases residual blocks towards the identity function in deep networks
S De, S Smith
Advances in Neural Information Processing Systems 33, 19964-19975, 2020
Cited by: 178*; Year: 2020
Characterizing signal propagation to close the performance gap in unnormalized ResNets
A Brock, S De, SL Smith
arXiv preprint arXiv:2101.08692, 2021
Cited by: 139; Year: 2021
On the Generalization Benefit of Noise in Stochastic Gradient Descent
S Smith, E Elsen, S De
International Conference on Machine Learning, 9058-9067, 2020
Cited by: 115; Year: 2020
BYOL works even without batch statistics
PH Richemond, JB Grill, F Altché, C Tallec, F Strub, A Brock, S Smith, ...
arXiv preprint arXiv:2010.10241, 2020
Cited by: 108; Year: 2020
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
S De, SL Smith, A Fernando, A Botev, G Cristian-Muraru, A Gu, R Haroun, ...
arXiv preprint arXiv:2402.19427, 2024
Cited by: 66; Year: 2024
Differentially Private Diffusion Models Generate Useful Synthetic Images
S Ghalebikesabi, L Berrada, S Gowal, I Ktena, R Stanforth, J Hayes, S De, ...
arXiv preprint arXiv:2302.13861, 2023
Cited by: 60; Year: 2023
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
DS Park, J Sohl-Dickstein, QV Le, SL Smith
International Conference on Machine Learning, 2019
Cited by: 56; Year: 2019
Phonon-assisted ultrafast charge separation in the PCBM band structure
SL Smith, AW Chin
Physical Review B 91 (20), 201302, 2015
Cited by: 42; Year: 2015
Ultrafast charge separation and nongeminate electron–hole recombination in organic photovoltaics
SL Smith, AW Chin
Physical Chemistry Chemical Physics 16 (38), 20305-20309, 2014
Cited by: 42; Year: 2014
Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation
B He, J Martens, G Zhang, A Botev, A Brock, SL Smith, YW Teh
arXiv preprint arXiv:2302.10322, 2023
Cited by: 33; Year: 2023