Title | Authors | Venue | Cited by | Year |
The Pile: An 800GB dataset of diverse text for language modeling | L Gao, S Biderman, S Black, L Golding, T Hoppe, C Foster, J Phang, H He, ... | arXiv preprint arXiv:2101.00027, 2020 | 1141* | 2020 |
GPT-Neo: Large scale autoregressive language modeling with Mesh-TensorFlow | S Black, L Gao, P Wang, C Leahy, S Biderman | If you use this software, please cite it using these metadata 58, 2, 2021 | 763* | 2021 |
GPT-NeoX-20B: An open-source autoregressive language model | S Black, S Biderman, E Hallahan, Q Anthony, L Gao, L Golding, H He, ... | arXiv preprint arXiv:2204.06745, 2022 | 550 | 2022 |
A framework for few-shot language model evaluation | L Gao, J Tow, S Biderman, S Black, A DiPofi, C Foster, L Golding, J Hsu, ... | Version v0.0.1. Sept, 8, 2021 | 364* | 2021 |
MAGMA--Multimodal Augmentation of Generative Models through Adapter-based Finetuning | C Eichenberg, S Black, S Weinbach, L Parcalabescu, A Frank | arXiv preprint arXiv:2112.05253, 2021 | 86 | 2021 |
GPT-NeoX: Large scale autoregressive language modeling in PyTorch | A Andonian, Q Anthony, S Biderman, S Black, P Gali, L Gao, E Hallahan, ... | GitHub Repo, 1877-1901, 2021 | 53* | 2021 |
Interpreting neural networks through the polytope lens | S Black, L Sharkey, L Grinsztajn, E Winsor, D Braun, J Merizian, K Parker, ... | arXiv preprint arXiv:2211.12312, 2022 | 11 | 2022 |
The singular value decompositions of transformer weight matrices are highly interpretable | B Millidge, S Black | AI Alignment Forum, 2022 | 11* | 2022 |
Rotary embeddings: A relative revolution | S Biderman, S Black, C Foster, L Gao, E Hallahan, H He, B Wang, ... | Charles Foster, Leo Gao, Eric Hallahan, Horace He, et al., Rotary embeddings …, 2021 | 8* | 2021 |
Conjecture: Internal infohazard policy | C Leahy, S Black, C Scammell, A Miotti | AI Alignment Forum, 2022 | 2 | 2022 |