Gunho Park
NAVER Cloud
Verified email at navercorp.com
Title · Cited by · Year
LUT-GEMM: Quantized Matrix Multiplication Based on LUTs for Efficient Inference in Large-Scale Generative Language Models
G Park, B Park, M Kim, S Lee, J Kim, B Kwon, SJ Kwon, B Kim, Y Lee, ...
arXiv preprint arXiv:2206.09557, 2023
Cited by 132 · 2023
Design and Analysis of Approximate Compressors for Balanced Error Accumulation in MAC Operator
G Park, J Kung, Y Lee
IEEE Transactions on Circuits and Systems I: Regular Papers 68 (7), 2950-2961, 2021
Cited by 47 · 2021
No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
JY Yang, B Kim, J Bae, B Kwon, G Park, E Yang, SJ Kwon, D Lee
arXiv preprint arXiv:2402.18096, 2024
Cited by 37 · 2024
Simplified Compressor and Encoder Designs for Low-Cost Approximate Radix-4 Booth Multiplier
G Park, J Kung, Y Lee
IEEE Transactions on Circuits and Systems II: Express Briefs 70 (3), 1154-1158, 2022
Cited by 19 · 2022
TF-MVP: Novel Sparsity-Aware Transformer Accelerator with Mixed-Length Vector Pruning
E Yoo, G Park, JG Min, SJ Kwon, B Park, D Lee, Y Lee
2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6, 2023
Cited by 6 · 2023
Energy-Efficient RISC-V-Based Vector Processor for Cache-Aware Structurally-Pruned Transformers
JG Min, D Kam, Y Byun, G Park, Y Lee
2023 IEEE/ACM International Symposium on Low Power Electronics and Design …, 2023
Cited by 5 · 2023
Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models
Y Byun, S Moon, B Park, SJ Kwon, D Lee, G Park, E Yoo, JG Min, Y Lee
Proceedings of Machine Learning and Systems 5, 768-779, 2023
Cited by 3 · 2023
Low-Power Encoder and Compressor Design for Approximate Radix-8 Booth Multiplier
J Kim, G Park, Y Lee
2024 IEEE International Symposium on Circuits and Systems (ISCAS), 1-5, 2024
Cited by 1 · 2024
Faster Inference of LLMs using FP8 on the Intel Gaudi
J Lee, S Markovich-Golan, D Ohayon, Y Hanani, G Park, B Kim, A Karnieli, ...
arXiv preprint arXiv:2503.09975, 2025
2025
FIGLUT: An Energy-Efficient Accelerator Design for FP-INT GEMM Using Look-Up Tables
G Park, H Kwon, J Kim, J Bae, B Park, D Lee, Y Lee
arXiv preprint arXiv:2503.06862, 2025
2025
An Investigation of FP8 Across Accelerators for LLM Inference
J Kim, J Lee, G Park, B Kim, SJ Kwon, D Lee, Y Lee
arXiv preprint arXiv:2502.01070, 2025
2025
nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models
G Park, B Park, SJ Kwon, B Kim, Y Lee, D Lee
arXiv preprint arXiv:2206.09557, 2022
2022
Don't Discard, but Keep It Small: Context-Preserving KV Cache Compression with Importance-Aware Adaptive Precision
JY Yang, B Kim, J Bae, G Park, B Kwon, E Yang, SJ Kwon, D Lee