Shijie Cao
Microsoft Research Asia
Verified email at microsoft.com
Title
Cited by
Year
Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity
S Cao, C Zhang, Z Yao, W Xiao, L Nie, D Zhan, Y Liu, M Wu, L Zhang
Proceedings of the 2019 ACM/SIGDA International Symposium on Field …, 2019
Cited by 189, 2019
Balanced sparsity for efficient DNN inference on GPU
Z Yao, S Cao, W Xiao, C Zhang, L Nie
Proceedings of the AAAI conference on artificial intelligence 33 (01), 5676-5683, 2019
Cited by 118, 2019
SeerNet: Predicting convolutional neural network feature-map sparsity through low-bit quantization
S Cao, L Ma, W Xiao, C Zhang, Y Liu, L Zhang, L Nie, Z Yang
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019
Cited by 80, 2019
Dense-to-sparse gate for mixture-of-experts
X Nie, S Cao, X Miao, L Ma, J Xue, Y Miao, Z Yang, Z Yang, B Cui
Cited by 22, 2021
EvoMoE: An evolutional mixture-of-experts training framework via dense-to-sparse gate
X Nie, X Miao, S Cao, L Ma, Q Liu, J Xue, Y Miao, Y Liu, Z Yang, B Cui
arXiv preprint arXiv:2112.14397, 2021
Cited by 14, 2021
Integer or floating point? New outlooks for low-bit quantization on large language models
Y Zhang, L Zhao, S Cao, W Wang, T Cao, F Yang, M Yang, S Zhang, N Xu
arXiv preprint arXiv:2305.12356, 2023
Cited by 9, 2023
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
R Hwang, J Wei, S Cao, C Hwang, X Tang, T Cao, M Yang, M Rhu
arXiv preprint arXiv:2308.12066, 2023
Cited by 3, 2023
Efficient GPU kernels for N:M-sparse weights in deep learning
B Lin, N Zheng, L Wang, S Cao, L Ma, Q Zhang, Y Zhu, T Cao, J Xue, ...
Proceedings of Machine Learning and Systems 5, 2023
Cited by 2, 2023
AFPQ: Asymmetric Floating Point Quantization for LLMs
Y Zhang, S Zhang, S Cao, D Du, J Wei, T Cao, N Xu
arXiv preprint arXiv:2311.01792, 2023
Cited by 1, 2023
NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors
J Wei, T Cao, S Cao, S Jiang, S Fu, M Yang, Y Zhang, Y Liu
Proceedings of the 21st Annual International Conference on Mobile Systems …, 2023
Cited by 1, 2023
Adam accumulation to reduce memory footprints of both activations and gradients for large-scale DNN training
Y Zhang, Y Han, S Cao, G Dai, Y Miao, T Cao, F Yang, N Xu
arXiv preprint arXiv:2305.19982, 2023
Cited by 1, 2023
Accurate and structured pruning for efficient automatic speech recognition
H Jiang, LL Zhang, Y Li, Y Wu, S Cao, T Cao, Y Yang, J Li, M Yang, L Qiu
arXiv preprint arXiv:2305.19549, 2023
Cited by 1, 2023
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
D Du, Y Zhang, S Cao, J Guo, T Cao, X Chu, N Xu
arXiv preprint arXiv:2402.10631, 2024
2024
FlexSaaS: A Reconfigurable Accelerator for Web Search Selection
S Cao, L Nie, D Zhan, W Wang, N Xu, R Das, M Wu, L Zhang, D Chiou
ACM Transactions on Reconfigurable Technology and Systems (TRETS) 12 (1), 1-20, 2019
2019
The Case for Learning Machine Language
G Liu, CJM Liang, S Cao, S Lu, L van Doorn