I-Vector Based Clustering Training Data in Speech Recognition Q Huo, ZJ Yan, Y Zhang, J Xu US Patent App. 13/640,804, 2015 | 237 | 2015 |
Qwen-audio: Advancing universal audio understanding via unified large-scale audio-language models Y Chu, J Xu, X Zhou, Q Yang, S Zhang, Z Yan, C Zhou, J Zhou arXiv preprint arXiv:2311.07919, 2023 | 163 | 2023 |
Deep-FSMN for large vocabulary continuous speech recognition S Zhang, M Lei, Z Yan, L Dai 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 130 | 2018 |
M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge F Yu, S Zhang, Y Fu, L Xie, S Zheng, Z Du, W Huang, P Guo, Z Yan, B Ma, ... ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 92 | 2022 |
Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition Z Gao, S Zhang, I McLoughlin, Z Yan arXiv preprint arXiv:2206.08317, 2022 | 83 | 2022 |
A unified trajectory tiling approach to high quality speech rendering Y Qian, FK Soong, ZJ Yan IEEE transactions on audio, speech, and language processing 21 (2), 280-290, 2012 | 66 | 2012 |
A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR ZJ Yan, Q Huo, J Xu Proc. Interspeech 2013, 104-108, 2013 | 64 | 2013 |
Improving latency-controlled BLSTM acoustic models for online speech recognition S Xue, Z Yan 2017 IEEE International Conference on Acoustics, Speech and Signal …, 2017 | 63 | 2017 |
A context-sensitive-chunk BPTT approach to training deep LSTM/BLSTM recurrent neural networks for offline handwriting recognition K Chen, ZJ Yan, Q Huo 2015 13th International Conference on Document Analysis and Recognition …, 2015 | 44 | 2015 |
Lauragpt: Listen, attend, understand, and regenerate audio with gpt Z Du, J Wang, Q Chen, Y Chu, Z Gao, Z Li, K Hu, X Zhou, J Xu, Z Ma, ... arXiv preprint arXiv:2310.04673, 2023 | 40 | 2023 |
Prosospeech: Enhancing prosody with quantized vector pre-training in text-to-speech Y Ren, M Lei, Z Huang, S Zhang, Q Chen, Z Yan, Z Zhao ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 40 | 2022 |
Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition. S Zhang, M Lei, Z Yan Interspeech, 2180-2184, 2019 | 39 | 2019 |
Rich-context unit selection (RUS) approach to high quality TTS ZJ Yan, Y Qian, FK Soong 2010 IEEE International Conference on Acoustics, Speech and Signal …, 2010 | 39 | 2010 |
Rich context modeling for high quality HMM-based TTS ZJ Yan, Y Qian, FK Soong Tenth Annual Conference of the International Speech Communication Association, 2009 | 36 | 2009 |
Improved modeling for F0 generation and V/U decision in HMM-based TTS Q Zhang, F Soong, Y Qian, Z Yan, J Pan, Y Yan 2010 IEEE International Conference on Acoustics, Speech and Signal …, 2010 | 34 | 2010 |
Trajectory Tiling Approach for Text-to-Speech Y Qian, ZJ Yan, YJ Wu, FKP Soong US Patent App. 12/962,543, 2012 | 33 | 2012 |
Method and apparatus for initiating an operation using voice data XU Minqiang, Z Yan, J Gao, M Chu US Patent App. 15/292,632, 2017 | 31 | 2017 |
Summary on the ICASSP 2022 multi-channel multi-party meeting transcription grand challenge F Yu, S Zhang, P Guo, Y Fu, Z Du, S Zheng, W Huang, L Xie, ZH Tan, ... ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 30 | 2022 |
Streaming chunk-aware multihead attention for online end-to-end speech recognition S Zhang, Z Gao, H Luo, M Lei, J Gao, Z Yan, L Xie arXiv preprint arXiv:2006.01712, 2020 | 30 | 2020 |
Cosyvoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens Z Du, Q Chen, S Zhang, K Hu, H Lu, Y Yang, H Hu, S Zheng, Y Gu, Z Ma, ... arXiv preprint arXiv:2407.05407, 2024 | 29 | 2024 |