Boqiang Zhang 张博强

Senior Research Engineer at Tencent AI Lab

About Me

I hold a Master's degree from the University of Science and Technology of China (USTC), where I worked under the guidance of Prof. Hongtao Xie. I completed my Bachelor's degree at Northwestern Polytechnical University (NWPU).

My current research focuses on vision-language-action (VLA) models, vision-language models (VLMs), and unified multimodal understanding and generation. Earlier, I worked on scene text recognition and editing, exploring both self-supervised and semi-supervised learning approaches.

Publications & Preprints

Penguin-VL
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
Boqiang Zhang, Lei Ke, Ruihan Yang, Qi Gao, Tianyuan Qu, Rossell Chen, Dong Yu, Leoweiliang
arXiv, 2026
N3D-VLM
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
Yuxin Wang, Lei Ke, Boqiang Zhang, Tianyuan Qu, Hanxun Yu, Zhenpeng Huang, Meng Yu, Dan Xu, Dong Yu
arXiv, 2026
VideoLLaMA 3
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
Boqiang Zhang, Kehan Li, Zesen Cheng, Zhiqiang Hu, Yuqian Yuan, Guanzheng Chen, Sicong Leng, Yuming Jiang, Hang Zhang, Xin Li, Peng Jin, Wenqi Zhang, Fan Wang, Lidong Bing, Deli Zhao
arXiv, 2025
MMR1
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
Sicong Leng, Jing Wang, Jiaxi Li, Hao Zhang, Zhiqiang Hu, Boqiang Zhang, Hang Zhang, Yuming Jiang, Xin Li, Deli Zhao, Fan Wang, Yu Rong, Aixin Sun, Shijian Lu
CVPR, 2026
VideoRefer
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li, Zesen Cheng, Boqiang Zhang, Long Li, Xin Li, Deli Zhao, Wenqiao Zhang, Yueting Zhuang, Jianke Zhu, Lidong Bing
CVPR, 2025
TextGen
How Control Information Influences Multilingual Text Image Generation and Editing?
Boqiang Zhang, Zuan Gao, Yadong Qu, Hongtao Xie*
NeurIPS, 2024
CVPR 2024 paper
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
Boqiang Zhang, Hongtao Xie*, Zuan Gao, Yuxin Wang
CVPR, 2024
LPV paper
Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition
Boqiang Zhang, Hongtao Xie*, Yuxin Wang, Jianjun Xu, Yongdong Zhang
IJCAI, 2023
CLIPSTR paper
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition
Zixiao Wang, Hongtao Xie*, Yuxin Wang, Jianjun Xu, Boqiang Zhang, Yongdong Zhang
ACM MM, 2023
SSM paper
Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
Zuan Gao, Yuxin Wang*, Yadong Qu, Boqiang Zhang, Zixiao Wang, Jianjun Xu, Hongtao Xie
IJCAI, 2024
I2CL paper
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
Bangbang Zhou, Yadong Qu, Zixiao Wang, Zicheng Li, Boqiang Zhang, Hongtao Xie*
IJCAI, 2024

Working Experience

  • Tencent AI Lab | Shenzhen | Jul. 2025 - Present
    Senior Research Engineer
    Topic: Multi-modal Large Language Model, Image/Video Understanding and Generation
  • Alibaba DAMO Academy | Hangzhou | Jun. 2024 - Jul. 2025
    Research Intern
    Topic: Multi-modal Large Language Model, Image/Video Understanding, Embodied AI

Services

  • Conference and Journal Reviewer: NeurIPS, ACM MM, ICLR, ICML, TMM

Honors

  • Outstanding Graduate of USTC and Anhui Province, 2025
  • Huawei Scholarship, 2023
  • Outstanding Graduate of NWPU, 2022 (top 5%)
  • National Scholarship, 2024, 2021, 2020, 2019
  • Outstanding Student of NWPU, 2020 (top 1%)