Fan Zhang - 张帆

Biography

I am currently a researcher at the Beijing Academy of Artificial Intelligence (BAAI), where I mainly focus on multimodal foundation models and AIGC. I earned my master’s degree at the Institute of Software, Chinese Academy of Sciences, under the guidance of Prof. Feifei Ma and Prof. Jian Zhang.

News

  • [01/2026] 🎉🎉🎉 Emu3 has been accepted by Nature!
  • [01/2026] Our paper URSA has been accepted by ICLR 2026!
  • [10/2025] We are thrilled to release Emu3.5!
  • [09/2025] Our paper ETT has been accepted by NeurIPS 2025!

Selected Publications

Emu3.5
Emu3.5: Native Multimodal Models are World Learners
Yufeng Cui*, Honghao Chen*, Haoge Deng*, Xu Huang*, Xinghang Li*, Jirong Liu*, Yang Liu*, Zhuoyan Luo*, Jinsheng Wang*, Wenxuan Wang*, Yueze Wang*, Chengyuan Wang*, Fan Zhang*, Yingli Zhao*, Ting Pan, Xianduo Li, Zecheng Hao, Wenxuan Ma, Zhuo Chen, Yulong Ao, Tiejun Huang, Zhongyuan Wang, Xinlong Wang
URSA
Uniform Discrete Diffusion with Metric Path for Video Generation
Haoge Deng*, Ting Pan*, Fan Zhang*, Yang Liu*, Zhuoyan Luo, Yufeng Cui, Wenxuan Wang, Chunhua Shen, Shiguang Shan, Zhaoxiang Zhang, Xinlong Wang
ETT
End-to-End Vision Tokenizer Tuning
Wenxuan Wang*, Fan Zhang*, Yufeng Cui*, Haiwen Diao*, Zhuoyan Luo, Huchuan Lu, Jing Liu, Xinlong Wang
Emu3
Emu3: Next-Token Prediction is All You Need
Xinlong Wang*, Xiaosong Zhang*, Zhengxiong Luo*, Quan Sun*, Yufeng Cui*, Jinsheng Wang*, Fan Zhang*, Yueze Wang*, Zhen Li*, Qiying Yu, Yingli Zhao, Yulong Ao, Xuebin Min, Tao Li, Boya Wu, Bo Zhao, Bowen Zhang, Liangdong Wang, Guang Liu, Zheqi He, Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang
DIVA
Diffusion Feedback Helps CLIP See Better
Wenxuan Wang*, Quan Sun*, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang
DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Xiaotong Li*, Fan Zhang*, Haiwen Diao*, Yueze Wang, Xinlong Wang, Ling-Yu Duan
EVACLIP
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Quan Sun*, Jinsheng Wang*, Qiying Yu*, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Xinlong Wang
Emu2
Generative Multimodal Models are In-Context Learners
Quan Sun*, Yufeng Cui*, Xiaosong Zhang*, Fan Zhang*, Qiying Yu*, Yueze Wang*, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang
CapsFusion
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu*, Quan Sun*, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Yue Cao, Xinlong Wang, Jingjing Liu
Emu1
Emu: Generative Pretraining in Multimodality
Quan Sun*, Qiying Yu*, Yufeng Cui*, Fan Zhang*, Xiaosong Zhang*, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang
AcfNet
ACFNet: Attentional Class Feature Network for Semantic Segmentation
Fan Zhang, Yanqin Chen, Zhihang Li, Zhibin Hong, Jingtuo Liu, Feifei Ma, Junyu Han, Errui Ding

Honors & Awards

  • CVPR WAD Video Segmentation Challenge, ranked 4/145, 2018
  • “Star of Tomorrow” Internship Award of Excellence, MSRA, 2017
  • Chinese Collegiate Programming Contest (CCPC) Hefei Site, Bronze Medal, 2016

Service

  • Conference Reviewer: ICLR, NeurIPS, AAAI, SIGGRAPH Asia