LUT-GEMM: Quantized matrix multiplication based on LUTs for efficient inference in large-scale generative language models G Park, B Park, M Kim, S Lee, J Kim, B Kwon, SJ Kwon, B Kim, Y Lee, ... arXiv preprint arXiv:2206.09557, 2022 | 123 | 2022 |
DFX: A low-latency multi-FPGA appliance for accelerating transformer-based text generation S Hong, S Moon, J Kim, S Lee, M Kim, D Lee, JY Kim 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 616-630, 2022 | 66 | 2022 |