Fine-grained video-text retrieval with hierarchical graph reasoning S Chen, Y Zhao, Q Jin, Q Wu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 213 | 2020 |
Speech emotion recognition with acoustic and lexical features Q Jin, C Li, S Chen, H Wu 2015 IEEE international conference on acoustics, speech and signal …, 2015 | 178 | 2015 |
Say as you wish: Fine-grained control of image caption generation with abstract scene graphs S Chen, Q Jin, P Wang, Q Wu Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 161 | 2020 |
Multimodal multi-task learning for dimensional and continuous emotion recognition S Chen, Q Jin, J Zhao, S Wang Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, 19-26, 2017 | 144 | 2017 |
Multi-modal dimensional emotion recognition using recurrent neural networks S Chen, Q Jin Proceedings of the 5th International Workshop on Audio/Visual Emotion …, 2015 | 124 | 2015 |
Describing videos using multi-modal fusion Q Jin, J Chen, S Chen, Y Xiong, A Hauptmann Proceedings of the 24th ACM international conference on Multimedia, 1087-1091, 2016 | 98 | 2016 |
WenLan: Bridging vision and language by large-scale multi-modal pre-training Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, ... arXiv preprint arXiv:2103.06561, 2021 | 81 | 2021 |
History aware multimodal transformer for vision-and-language navigation S Chen, PL Guhur, C Schmid, I Laptev Advances in neural information processing systems 34, 5834-5847, 2021 | 73 | 2021 |
Video captioning with guidance of multimodal latent topics S Chen, J Chen, Q Jin, A Hauptmann Proceedings of the 25th ACM international conference on Multimedia, 1838-1846, 2017 | 67 | 2017 |
Multi-modal conditional attention fusion for dimensional emotion prediction S Chen, Q Jin Proceedings of the 24th ACM international conference on Multimedia, 571-575, 2016 | 61 | 2016 |
Airbert: In-domain pretraining for vision-and-language navigation PL Guhur, M Tapaswi, S Chen, I Laptev, C Schmid Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 52 | 2021 |
Multi-modal multi-cultural dimensional continues emotion recognition in dyadic interactions J Zhao, R Li, S Chen, Q Jin Proceedings of the 2018 on audio/visual emotion challenge and workshop, 65-72, 2018 | 46 | 2018 |
Elaborative rehearsal for zero-shot action recognition S Chen, D Huang Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 38 | 2021 |
Unpaired cross-lingual image caption generation with self-supervised rewards Y Song, S Chen, Y Zhao, Q Jin Proceedings of the 27th ACM International Conference on Multimedia, 784-792, 2019 | 38 | 2019 |
Sketch, ground, and refine: Top-down dense video captioning C Deng, S Chen, D Chen, Y He, Q Wu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 32 | 2021 |
Think global, act local: Dual-scale graph transformer for vision-and-language navigation S Chen, PL Guhur, M Tapaswi, C Schmid, I Laptev Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 30 | 2022 |
Generating video descriptions with latent topic guidance S Chen, Q Jin, J Chen, A Hauptmann IEEE Transactions on Multimedia 21 (9), 2407-2418, 2019 | 28 | 2019 |
Adversarial domain adaption for multi-cultural dimensional emotion recognition in dyadic interactions J Zhao, R Li, J Liang, S Chen, Q Jin Proceedings of the 9th International on Audio/Visual Emotion Challenge and …, 2019 | 22 | 2019 |
Neural storyboard artist: Visualizing stories with coherent image sequences S Chen, B Liu, J Fu, R Song, Q Jin, P Lin, X Qi, C Wang, J Zhou Proceedings of the 27th ACM International Conference on Multimedia, 2236-2244, 2019 | 22 | 2019 |
Generating video descriptions with topic guidance S Chen, J Chen, Q Jin Proceedings of the 2017 ACM on International Conference on Multimedia …, 2017 | 21 | 2017 |