Shizhe Chen

Cited by

	All	Since 2019
Citations	2653	2436
h-index	24	24
i10-index	43	42

Citations per year
Year	2016	2017	2018	2019	2020	2021	2022	2023	2024
Citations	22	61	124	185	221	343	559	838	286

Public access

41 articles

6 articles

available

not available

Based on funding mandates

Co-authors

Qin Jin - School of Information, Renmin University of China - Verified email at ruc.edu.cn
Ivan Laptev - Visiting professor at MBZUAI, on leave from INRIA - Verified email at inria.fr
Cordelia Schmid - Research director, INRIA - Verified email at inria.fr
Alex Hauptmann - Carnegie Mellon University - Verified email at cs.cmu.edu
Yuqing Song - Renmin University of China - Verified email at ruc.edu.cn
Jinming Zhao - Renmin University of China - Verified email at ruc.edu.cn
Qi Wu - Associate Professor, University of Adelaide, Adelaide, Australia - Verified email at adelaide.edu.au
Sipeng Zheng - Beijing Academy of Artificial Intelligence (BAAI) - Verified email at baai.ac.cn
Ruihua Song - Renmin University of China - Verified email at ruc.edu.cn

Shizhe Chen

INRIA Paris

Verified email at inria.fr

Computer Vision, Vision-and-Language


Title	Cited by	Year
Fine-grained video-text retrieval with hierarchical graph reasoning. S Chen, Y Zhao, Q Jin, Q Wu. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020	306	2020
Say as you wish: Fine-grained control of image caption generation with abstract scene graphs. S Chen, Q Jin, P Wang, Q Wu. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020	221	2020
Speech emotion recognition with acoustic and lexical features. Q Jin, C Li, S Chen, H Wu. 2015 IEEE international conference on acoustics, speech and signal …, 2015	202	2015
Multimodal multi-task learning for dimensional and continuous emotion recognition. S Chen, Q Jin, J Zhao, S Wang. Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, 19-26, 2017	164	2017
History aware multimodal transformer for vision-and-language navigation. S Chen, PL Guhur, C Schmid, I Laptev. Advances in neural information processing systems 34, 5834-5847, 2021	154	2021
Multi-modal dimensional emotion recognition using recurrent neural networks. S Chen, Q Jin. Proceedings of the 5th International Workshop on Audio/Visual Emotion …, 2015	133	2015
WenLan: Bridging vision and language by large-scale multi-modal pre-training. Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, ... arXiv preprint arXiv:2103.06561, 2021	115	2021
Describing videos using multi-modal fusion. Q Jin, J Chen, S Chen, Y Xiong, A Hauptmann. Proceedings of the 24th ACM international conference on Multimedia, 1087-1091, 2016	115	2016
Airbert: In-domain pretraining for vision-and-language navigation. PL Guhur, M Tapaswi, S Chen, I Laptev, C Schmid. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021	104	2021
Think global, act local: Dual-scale graph transformer for vision-and-language navigation. S Chen, PL Guhur, M Tapaswi, C Schmid, I Laptev. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	94	2022
Elaborative rehearsal for zero-shot action recognition. S Chen, D Huang. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021	88	2021
Multi-modal conditional attention fusion for dimensional emotion prediction. S Chen, Q Jin. Proceedings of the 24th ACM international conference on Multimedia, 571-575, 2016	77	2016
Video captioning with guidance of multimodal latent topics. S Chen, J Chen, Q Jin, A Hauptmann. Proceedings of the 25th ACM international conference on Multimedia, 1838-1846, 2017	72	2017
Sketch, ground, and refine: Top-down dense video captioning. C Deng, S Chen, D Chen, Y He, Q Wu. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	60	2021
Instruction-driven history-aware policies for robotic manipulations. PL Guhur, S Chen, RG Pinel, M Tapaswi, I Laptev, C Schmid. Conference on Robot Learning, 175-187, 2023	57	2023
Multi-modal multi-cultural dimensional continues emotion recognition in dyadic interactions. J Zhao, R Li, S Chen, Q Jin. Proceedings of the 2018 on audio/visual emotion challenge and workshop, 65-72, 2018	57	2018
Unpaired cross-lingual image caption generation with self-supervised rewards. Y Song, S Chen, Y Zhao, Q Jin. Proceedings of the 27th ACM international conference on multimedia, 784-792, 2019	41	2019
Generating Video Descriptions With Latent Topic Guidance. S Chen, Q Jin, J Chen, A Hauptmann. IEEE Transactions on Multimedia 21 (9), 2407-2418, 2019	39	2019
Towards diverse paragraph captioning for untrimmed videos. Y Song, S Chen, Q Jin. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	32	2021
Learning from unlabeled 3d environments for vision-and-language navigation. S Chen, PL Guhur, M Tapaswi, C Schmid, I Laptev. European Conference on Computer Vision, 638-655, 2022	29	2022
