LUT-GEMM: Quantized matrix multiplication based on LUTs for efficient inference in large-scale generative language models G Park, B Park, M Kim, S Lee, J Kim, B Kwon, SJ Kwon, B Kim, Y Lee, ... arXiv preprint arXiv:2206.09557, 2022 | 132 | 2022 |
Structured compression by weight encryption for unstructured pruning and quantization SJ Kwon, D Lee, B Kim, P Kapoor, B Park, GY Wei Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern ..., 2020 | 53 | 2020 |
No token left behind: Reliable KV cache compression via importance-aware mixed precision quantization JY Yang, B Kim, J Bae, B Kwon, G Park, E Yang, SJ Kwon, D Lee arXiv preprint arXiv:2402.18096, 2024 | 37 | 2024 |
AlphaTuning: Quantization-aware parameter-efficient adaptation of large-scale pre-trained language models SJ Kwon, J Kim, J Bae, KM Yoo, JH Kim, B Park, B Kim, JW Ha, N Sung, ... arXiv preprint arXiv:2210.03858, 2022 | 37 | 2022 |
BiQGEMM: matrix multiplication with lookup table for binary-coding-based quantized DNNs Y Jeon, B Park, SJ Kwon, B Kim, J Yun, D Lee SC20: International Conference for High Performance Computing, Networking ..., 2020 | 37 | 2020 |
Extremely low bit transformer quantization for on-device neural machine translation I Chung, B Kim, Y Choi, SJ Kwon, Y Jeon, B Park, S Kim, D Lee arXiv preprint arXiv:2009.07453, 2020 | 32 | 2020 |
Learning low-rank approximation for CNNs D Lee, SJ Kwon, B Kim, GY Wei arXiv preprint arXiv:1905.10145, 2019 | 24 | 2019 |
DeepTwist: Learning model compression via occasional weight distortion D Lee, P Kapoor, B Kim arXiv preprint arXiv:1810.12823, 2018 | 24 | 2018 |
Rethinking channel dimensions to isolate outliers for low-bit weight quantization of large language models JH Heo, J Kim, B Kwon, B Kim, SJ Kwon, D Lee arXiv preprint arXiv:2309.15531, 2023 | 16 | 2023 |
FleXOR: Trainable fractional quantization D Lee, SJ Kwon, B Kim, Y Jeon, B Park, J Yun Advances in neural information processing systems 33, 1311-1321, 2020 | 15 | 2020 |
Retraining-based iterative weight quantization for deep neural networks D Lee, B Kim arXiv preprint arXiv:1805.11233, 2018 | 13 | 2018 |
HyperCLOVA X technical report KM Yoo, J Han, S In, H Jeon, J Jeong, J Kang, H Kim, KM Kim, M Kim, ... arXiv preprint arXiv:2404.01954, 2024 | 6 | 2024 |
Network pruning for low-rank binary indexing D Lee, SJ Kwon, B Kim, P Kapoor, GY Wei arXiv preprint arXiv:1905.05686, 2019 | 6 | 2019 |
Winning both the accuracy of floating point activation and the simplicity of integer arithmetic Y Kim, J Jang, J Lee, J Park, J Kim, B Kim, SJ Kwon, D Lee The Eleventh International Conference on Learning Representations, 2023 | 5 | 2023 |
Computation-efficient quantization method for deep neural networks P Kapoor, D Lee, B Kim, S Lee | 5 | 2018 |
DropBP: accelerating fine-tuning of large language models by dropping backward propagation S Woo, B Park, B Kim, M Jo, SJ Kwon, D Jeon, D Lee arXiv preprint arXiv:2402.17812, 2024 | 4 | 2024 |
Encoding weights of irregular sparsity for fixed-to-fixed model compression B Park, SJ Kwon, D Oh, B Kim, D Lee arXiv preprint arXiv:2105.01869, 2021 | 4 | 2021 |
To FP8 and back again: Quantifying the effects of reducing precision on LLM training stability J Lee, J Bae, B Kim, SJ Kwon, D Lee arXiv preprint arXiv:2405.18710, 2024 | 3 | 2024 |
Post-training weighted quantization of neural networks for language models SJ Kwon, D Lee, Y Jeon, B Kim, BS Park, Y Ro | 3 | 2021 |
Q-Rater: Non-convex optimization for post-training uniform quantization B Kim, D Lee, Y Ro, Y Jeon, SJ Kwon, B Park, D Oh arXiv preprint arXiv:2105.01868, 2021 | 2 | 2021 |