CUHK MMLab · Meta Superintelligence Labs

Video foundation models, grounded multimodal agents, and real-world AI evaluation.

I am a final-year Ph.D. candidate at CUHK MMLab, advised by Prof. Dahua Lin. My research centers on vision-language models, multimodal agents, and long-horizon agent evaluation.

I am currently a Research Scientist Intern at Meta Superintelligence Labs, working on video grounding with Jie Lei. Previously, I worked on SAM3 with Nicolas Carion at Meta and on multimodal LLMs at Shanghai AI Laboratory with Xiaoyi Dong and Jiaqi Wang.

CV Scholar GitHub Email

Expected graduation: Summer 2027. Open to Research Scientist opportunities in multimodal AI, video VLMs, and agent evaluation.

2,211 Google Scholar citations

19 h-index

10 first / co-first author papers

ICLR · CVPR · ICCV plus NeurIPS, ICML, ECCV, ACL

Citation snapshot: June 9, 2026.

News

Latest updates

May 2026

Released WildClawBench, a real-world long-horizon agent benchmark with 60 human-authored multimodal tasks.

May 2026

SetCon is now online; it introduces set-level concept prediction for open-ended referring segmentation.

Jan 2026

Three papers accepted at ICLR 2026: SAM3, SeC, and ScaleCap.

Nov 2025

Released SAM3 for concept-level detection, segmentation, and tracking in images and videos.

Oct 2025

Gave a keynote talk at the ICCV 2025 LSVOS Workshop in Honolulu, Hawaii.

Jun 2025

SAM2Long accepted to ICCV 2025.

Publications

Selected papers

* equal contribution · † project lead

SAM3: Segment Anything with Concepts

Meta SAM3 Team, including Shuangrui Ding. ICLR 2026.

Project arXiv HF Code

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

Shuangrui Ding*†, Xuanlang Dai*, Long Xing*, et al. arXiv 2026.

Leaderboard arXiv Code

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Shuangrui Ding, Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Yuwei Guo, Dahua Lin, Jiaqi Wang. ICCV 2025.

Project arXiv Code

SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition

Shuangrui Ding*, Zihan Liu*, Xiaoyi Dong, Pan Zhang, Rui Qian, Junhao Huang, Conghui He, Dahua Lin, Jiaqi Wang. ACL 2025 Main.

arXiv Code Demo

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Rui Qian*, Shuangrui Ding*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang. CVPR 2025.

arXiv Code

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

Zhixiong Zhang*, Shuangrui Ding*, Xiaoyi Dong, Songxin He, Jianfan Lin, Junsong Tang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang. ICLR 2026.

Project arXiv Code

Selected earlier work

Streaming Long Video Understanding with LLMs · NeurIPS 2024 / Betrayed by Attention · ECCV 2024 / STA · ICCV 2023 / DCLR · ACM MM 2022 / FAME · CVPR 2022 / Practical Attacks on GNNs · NeurIPS 2020.

Background

Experience and education

May 2026 - Oct 2026

Meta Superintelligence Labs, Bellevue

Research Scientist Intern on vision-language models for video grounding, supervised by Jie Lei.

May 2025 - Oct 2025

Meta Superintelligence Labs, London

Research Scientist Intern on SAM3: Segment Anything with Concepts, focusing on multimodal interactivity and video grounding; supervised by Nicolas Carion.

2023 - Present

The Chinese University of Hong Kong

Ph.D. in Information Engineering at MMLab, advised by Prof. Dahua Lin.

2023 - 2025

Shanghai AI Laboratory

Worked on multimodal LLMs and video understanding with Xiaoyi Dong and Jiaqi Wang.

2021 - 2023

Shanghai Jiao Tong University

M.S. in Information and Communication Engineering. Graduate National Scholarship awardee.

2019 - 2021

University of Michigan

B.S.E. in Computer Science. Summa Cum Laude; GPA: 3.9/4.0.

2017 - 2019

Shanghai Jiao Tong University

B.S.E. in Electrical and Computer Engineering. National Scholarship awardee; GPA: 3.8/4.0.

Honors

Selected awards

CUHK Vice-Chancellor's Ph.D. Scholarship, 80,000 HKD, 2023
Graduate National Scholarship, Top 2%, 2022
Shanghai Outstanding Graduate, Top 5%, 2021
Mathematical Contest in Modeling Finalist, Top 0.3%, 2019
Undergraduate National Scholarship, Top 2%, 2018

Talks and service

Community

Keynote Speaker, ICCV 2025 LSVOS Workshop: From Pixels to Meaning.
Invited Speaker, InternLM Community Open Mic: When Songwriting Meets Large Models.
Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, AAAI, ACL, ACM MM.