SciBERT: A Pretrained Language Model for Scientific Text I Beltagy, K Lo, A Cohan EMNLP 2019, 2019 | 4012 | 2019 |
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks S Gururangan, A Marasović, S Swayamdipta, K Lo, I Beltagy, D Downey, ... ACL 2020 (🏆 Honorable Mention for Best Paper 🏆), 2020 | 2411 | 2020 |
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model TL Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... arXiv preprint arXiv:2211.05100, 2022 | 1629 | 2022 |
CORD-19: The Covid-19 Open Research Dataset LL Wang, K Lo, Y Chandrasekhar, R Reas, J Yang, D Eide, K Funk, ... arXiv preprint arXiv:2004.10706, 2020 | 1009 | 2020 |
S2ORC: The Semantic Scholar Open Research Corpus K Lo, LL Wang, M Neumann, R Kinney, DS Weld Proceedings of ACL, 2020 | 616 | 2020 |
Construction of the Literature Graph in Semantic Scholar W Ammar, D Groeneveld, C Bhagavatula, I Beltagy, M Crawford, ... arXiv preprint arXiv:1805.02262, 2018 | 508 | 2018 |
Fact or Fiction: Verifying Scientific Claims D Wadden, K Lo, LL Wang, S Lin, M van Zuylen, A Cohan, H Hajishirzi arXiv preprint arXiv:2004.14974, 2020 | 461 | 2020 |
TREC-COVID: constructing a pandemic information retrieval test collection E Voorhees, T Alam, S Bedrick, D Demner-Fushman, WR Hersh, K Lo, ... ACM SIGIR Forum 54 (1), 1-12, 2021 | 241 | 2021 |
TLDR: Extreme Summarization of Scientific Documents I Cachola, K Lo, A Cohan, DS Weld arXiv preprint arXiv:2004.15011, 2020 | 237 | 2020 |
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers P Dasigi, K Lo, I Beltagy, A Cohan, NA Smith, M Gardner arXiv preprint arXiv:2105.03011, 2021 | 215 | 2021 |
The BigScience ROOTS Corpus: A 1.6 TB Composite Multilingual Dataset H Laurençon, L Saulnier, T Wang, C Akiki, AV del Moral, T Le Scao, ... Thirty-sixth Conference on Neural Information Processing Systems Datasets … | 170* | |
Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction S Feldman, W Ammar, K Lo, E Trepman, M van Zuylen, O Etzioni JAMA network open 2 (7), e196700-e196700, 2019 | 157 | 2019 |
TREC-COVID: Rationale and Structure of an Information Retrieval Shared Task for COVID-19 K Roberts, T Alam, S Bedrick, D Demner-Fushman, K Lo, I Soboroff, ... Journal of the American Medical Informatics Association, 2020 | 143 | 2020 |
Text mining approaches for dealing with the rapidly expanding literature on COVID-19 LL Wang, K Lo Briefings in Bioinformatics 22 (2), 781-799, 2021 | 140 | 2021 |
Harnessing the power of smart and connected health to tackle COVID-19: IoT, AI, robotics, and blockchain for a better world F Firouzi, B Farahani, M Daneshmand, K Grise, J Song, R Saracco, ... IEEE Internet of Things Journal 8 (16), 12826-12846, 2021 | 127 | 2021 |
FLEX: Unifying Evaluation for Few-Shot NLP J Bragg, A Cohan, K Lo, I Beltagy Advances in Neural Information Processing Systems 34, 15787-15800, 2021 | 112 | 2021 |
The Semantic Scholar Open Data Platform R Kinney, C Anastasiades, R Authur, I Beltagy, J Bragg, A Buraczynski, ... arXiv preprint arXiv:2301.10140, 2023 | 96 | 2023 |
Augmenting scientific papers with just-in-time, position-sensitive definitions of terms and symbols A Head, K Lo, D Kang, R Fok, S Skjonsberg, DS Weld, MA Hearst Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems …, 2021 | 96 | 2021 |
OLMo: Accelerating the Science of Language Models D Groeneveld, I Beltagy, P Walsh, A Bhagia, R Kinney, O Tafjord, AH Jha, ... ACL 2024 (🏆 Best Paper Award 🏆), 2024 | 87 | 2024 |
Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research L Soldaini, R Kinney, A Bhagia, D Schwenk, D Atkinson, R Authur, ... ACL 2024 (🏆 Best Resource Award 🏆), 2024 | 84 | 2024 |