Constitutional AI: Harmlessness from AI feedback Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ... arXiv preprint arXiv:2212.08073, 2022 | 593 | 2022 |
Toy models of superposition N Elhage, T Hume, C Olsson, N Schiefer, T Henighan, S Kravec, ... arXiv preprint arXiv:2209.10652, 2022 | 146 | 2022 |
Towards measuring the representation of subjective global opinions in language models E Durmus, K Nguyen, TI Liao, N Schiefer, A Askell, A Bakhtin, C Chen, ... arXiv preprint arXiv:2306.16388, 2023 | 62 | 2023 |
Mitigating harm in language models with conditional-likelihood filtration H Ngo, C Raterink, JGM Araújo, I Zhang, C Chen, A Morisot, N Frosst arXiv preprint arXiv:2108.07790, 2021 | 28 | 2021 |
Question decomposition improves the faithfulness of model-generated reasoning A Radhakrishnan, K Nguyen, A Chen, C Chen, C Denison, D Hernandez, ... arXiv preprint arXiv:2307.11768, 2023 | 26 | 2023 |
Tamera Lanham, Tim Maxwell, Venkatesa Chandrasekaran, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R Bowman, and Ethan Perez, 2023 A Radhakrishnan, K Nguyen, A Chen, C Chen, C Denison, D Hernandez, ... | 11 | 2023 |
Constitutional AI: harmlessness from AI feedback. 2022 Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ... ArXiv preprint: https://arxiv.org/pdf/2212.08073.pdf, 0 | 11 | |
Constitutional AI: Harmlessness from AI feedback (arXiv: 2212.08073). arXiv Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ... | 8 | 2022 |
Constitutional AI: Harmlessness from AI Feedback, December 2022 Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ... URL https://www.anthropic.com/constitutional.pdf, 0 | 7 | |
Predicting Twitter engagement with deep language models M Volkovs, Z Cheng, M Ravaut, H Yang, K Shen, JP Zhou, A Wong, ... Proceedings of the Recommender Systems Challenge 2020, 38-43, 2020 | 6 | 2020 |