Xu Yang; 杨旭 | PALM, CS, SEU

About Me

I am an Associate Professor at the PALM lab, Department of Computer Science, Southeast University (SEU), China. I received my B.S. degree from Nanjing University of Posts and Telecommunications (NUPT), my M.S. degree from Southeast University (SEU) under the supervision of Prof. Xin Geng, and my Ph.D. from Nanyang Technological University (NTU) under the supervision of Prof. Jianfei Cai and Prof. Hanwang Zhang.

Research Interests

I have broad interests in AI, especially machine learning and deep learning. Recently, I have focused on multi-modal in-context learning and the learngene framework. Over the past few years, and in the years ahead, my work centers on the following topics:

In-Context Learning
Learngene
Large Language Model
Large Vision-Language Model
Image Captioning

Honorary Titles

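World's Top 2% Scientists (2024)
Young Scientific and Technological Talent Support Program, Jiangsu Association for Science and Technology
Jiangsu Province Innovation and Entrepreneurship (Shuangchuang) Doctor
Zijin Young Scholar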

Competitions

CVPR 2024 Long-Form Video Understanding Workshop Track 1: Long-Term Video Question Answering Highest Score Award
ECCV 2024 The Second Perception Test Challenge Workshop, Hour-Long Video-QA Track: Best Performance
ECCV 2024 The Second Perception Test Challenge Workshop, Multiple-Choice Video-QA Track: Best Performance

Selected Publications

Lever LM: Configuring in-context sequence to lever large vision language models
Xu Yang, Yingzhe Peng, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang
Conference and Workshop on Neural Information Processing Systems. NeurIPS 2024.
[Web]

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark
Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yi Lu, Bozheng Li, Weiheng Chi, Zihan Qiu, Lirian Su, Haolin Zheng, Jay Wu, Xu Yang
Association for the Advancement of Artificial Intelligence. AAAI 2025.
[Web]

Number it: Temporal Grounding Videos like Flipping Manga
Yongliang Wu, Xinting Hu, Yuyang Sun, Yizhou Zhou, Wenbo Zhu, Fengyun Rao, Bernt Schiele, Xu Yang
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR 2025.
[Web]

Devils in middle layers of large vision-language models: Interpreting, detecting and mitigating object hallucinations via attention lens
Zhangqi Jiang, Junkai Chen, Beier Zhu, Tingjin Luo, Yankun Shen, Xu Yang
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR 2025.
[Web]

Cluster-Learngene: Inheriting Adaptive Clusters for Vision Transformers
Qiufeng Wang, Xu Yang, Fu Feng, Jing Wang, Xin Geng
Conference and Workshop on Neural Information Processing Systems. NeurIPS 2024.
[Web]

Vision Transformers as Probabilistic Expansion from Learngene
Qiufeng Wang, Xu Yang, Haokun Chen, Xin Geng
International Conference on Machine Learning. ICML 2024.
[Web]

Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformation
Shiyu Xia, Yuankun Zu, Xu Yang, Xin Geng
Conference and Workshop on Neural Information Processing Systems. NeurIPS 2024.
[Web]

LIVE: Learnable In-Context Vector for Visual Question Answering
Yingzhe Peng, Chenduo Hao, Xu Yang, Jiawei Peng, Xinting Hu, Xin Geng
Conference and Workshop on Neural Information Processing Systems. NeurIPS 2024.
[Web]

Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models
Shuxia Lin, Miaosen Zhang, Ruiming Chen, Xu Yang, Qiufeng Wang, Xin Geng
Conference and Workshop on Neural Information Processing Systems. NeurIPS 2024.
[Web]

Learning to collocate visual-linguistic neural modules for image captioning
Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai
International Journal of Computer Vision, 1-19. IJCV 2023.
[Web]

How to Configure Good In-Context Sequence for Visual Question Answering
Li Li, Jiawei Peng, Huiyi Chen, Chongyang Gao, Xu Yang
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR 2024.
[Web]

Exploring Diverse In-Context Configurations for Image Captioning
Xu Yang, Yongliang Wu, Mingzhuo Yang, Haokun Chen, Xin Geng
Annual Conference on Neural Information Processing Systems. NeurIPS 2023.
[Web]

Transforming Visual Scene Graphs to Image Captions
Xu Yang, Jiawei Peng, Zihua Wang, Haiyang Xu, Qinghao Ye, Chenliang Li, Ming Yan, Fei Huang, Zhangzikang Li, Yu Zhang
Association for Computational Linguistics. ACL 2023.
[Web]

Learning Trajectory-Word Alignments for Video-Language Tasks
Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang
International Conference on Computer Vision. ICCV 2023.
[Web]

Show, Deconfound and Tell: Image Captioning With Causal Inference
Bing Liu, Dong Wang, Xu Yang, Yong Zhou, Rui Yao, Zhiwen Shao, Jiaqi Zhao
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR 2022.
[Web]

EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi, Xu Yang, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR 2022.
[Web]

Image captioning with transformer and knowledge graph
Yu Zhang, Xinyu Shi, Siya Mi, Xu Yang
Pattern Recognition Letters 143, 43-49. PRL.
[Web]

Deconfounded image captioning: A causal retrospect
Xu Yang, Hanwang Zhang, Jianfei Cai
IEEE Transactions on Pattern Analysis and Machine Intelligence. TPAMI.
[Web]

Auto-encoding and Distilling Scene Graphs for Image Captioning
Xu Yang, Hanwang Zhang, Jianfei Cai
IEEE Transactions on Pattern Analysis and Machine Intelligence. TPAMI.
[Web]

Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai
IEEE International Conference on Computer Vision. ICCV 2021.
[PDF]

Causal attention for vision-language tasks
Xu Yang, Hanwang Zhang, Guojun Qi, Jianfei Cai
Conference on Computer Vision and Pattern Recognition. CVPR 2021.
[PDF]

Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning
Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai
ACM International Conference on Multimedia. ACMMM 2020.
[Web]

Learning to collocate neural modules for image captioning
Xu Yang, Hanwang Zhang, Jianfei Cai
IEEE International Conference on Computer Vision. ICCV 2019.
[PDF]

Auto-encoding scene graphs for image captioning
Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai
Conference on Computer Vision and Pattern Recognition. CVPR 2019.
[PDF] Oral Presentation

Shuffle-then-assemble: learning object-agnostic visual relationship features
Xu Yang, Hanwang Zhang, Jianfei Cai
European Conference on Computer Vision. ECCV 2018.
[PDF]

Sparsity Conditional Energy Label Distribution Learning for Age Estimation
Xu Yang, Xin Geng, Deyu Zhou
International Joint Conference on Artificial Intelligence. IJCAI 2016.
[PDF]

Deep label distribution learning for apparent age estimation
Xu Yang, Bin-Bin Gao, Chao Xing, Zeng-Wei Huo, Xiu-Shen Wei, Ying Zhou, Jianxin Wu, Xin Geng
IEEE International Conference on Computer Vision Workshops. ICCVW 2015.
[PDF]

Misc

In my spare time, I usually read, swim, and run. I read books on a wide range of topics, including computer science, philosophy, history, politics, literature, and detective fiction.

Some recommended books:

Research: "How to Read a Book", "Style: Toward Clarity and Grace", "The Craft of Research"
Philosophy: "The Book of Why", "The Structure of Scientific Revolutions", "The Big Questions: A Short Introduction to Philosophy"
Literature: "One Hundred Years of Solitude", "War and Peace", "Romance of the Three Kingdoms"