Shuangrui Ding (丁双睿)

I am a second-year Ph.D. student in the Multi-Media Lab at The Chinese University of Hong Kong, supervised by Prof. Dahua Lin. My current research interests span large language models and video understanding.

I obtained my Master's degree in Electrical Engineering from Shanghai Jiao Tong University in 2023, where I was advised by Prof. Hongkai Xiong. Prior to that, I earned a Bachelor's degree in Computer Science from the University of Michigan, along with a dual degree in Electrical and Computer Engineering from Shanghai Jiao Tong University in 2021.

Email / CV / Google Scholar / GitHub

News

[Sept. 2024] One paper has been accepted at NeurIPS 2024.

[Jul. 2024] Three papers have been accepted at ECCV 2024.

[May 2024] One paper on low-bit integer training has been accepted at ICML 2024.

[Mar. 2024] We released the paper and code of SongComposer, a large language model for song generation.

Preprint
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Shuangrui Ding, Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Yuwei Guo, Dahua Lin, Jiaqi Wang
preprint, 2024
Project Page | arXiv | code | PDF

Outperform SAM 2 by a large margin through a training-free memory tree.

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation
Shuangrui Ding*, Zihan Liu*, Xiaoyi Dong, Pan Zhang, Rui Qian, Conghui He, Dahua Lin, Jiaqi Wang
preprint, 2024
arXiv / code / invited talk / demo page

A large language model that understands and generates melodies and lyrics in symbolic song representations.

Selected Publications (* equal contribution)
Streaming Long Video Understanding with Large Language Models
Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Shuangrui Ding, Dahua Lin, Jiaqi Wang
NeurIPS, 2024
arXiv

Long video understanding with disentangled streaming video encoding and LLM reasoning.

Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian, Shuangrui Ding, Dahua Lin
ECCV, 2024
arXiv

Efficiently adapt image foundation models to the video domain in an object-centric manner.

Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
Shuangrui Ding*, Rui Qian*, Haohang Xu, Dahua Lin, Hongkai Xiong
ECCV, 2024
arXiv / code

Learn robust spatio-temporal correspondence on top of a DINO-pretrained Transformer without any annotation.

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, Hongkai Xiong, Qi Tian
ICCV, 2023
Project page / arXiv / pdf / code / poster / slides

Propose a token pruning strategy for video Transformers that offers a competitive speed-accuracy trade-off without additional training or parameters.

Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos
Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
ICCV, 2023
arXiv / pdf / code / poster

Jointly utilize high-level semantics and low-level temporal correspondence for object-centric learning in videos without any supervision.

Static and Dynamic Concepts for Self-supervised Video Representation Learning
Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
ECCV, 2022
arXiv / code / slide

Learn static and dynamic visual concepts in videos that aggregate local patterns with similar semantics to boost unsupervised video representation learning.

Dual Contrastive Learning for Spatio-temporal Representation
Shuangrui Ding, Rui Qian, Hongkai Xiong
ACM MM, 2022
arXiv / poster / video / code

Present a novel dual contrastive formulation that decouples static and dynamic features and thus mitigates the background bias.

Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
Shuangrui Ding, Maomao Li, Tianyu Yang, Rui Qian, Haohang Xu, Qingyi Chen, Jue Wang, Hongkai Xiong
CVPR, 2022
Project page / arXiv / code / Chinese coverage / poster

Mitigate the background bias in self-supervised video representation learning by copy-pasting the foreground onto other backgrounds.

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
Rui Qian, Yuxi Li, Huabin Liu, John See, Shuangrui Ding, Xian Liu, Dian Li, Weiyao Lin
ICCV, 2021
arXiv / code

Self-supervised video representation learning from the perspective of both high-level semantics and lower-level characteristics.

Towards More Practical Adversarial Attacks on Graph Neural Networks
Jiaqi Ma*, Shuangrui Ding*, Qiaozhu Mei
NeurIPS, 2020
arXiv / slides / video / code

By exploiting the structural inductive biases of GNNs, restricted black-box adversarial attacks can be conducted effectively.

Awards

CUHK Vice-Chancellor's Ph.D. Scholarship (80,000 HKD), Graduate School of CUHK. 2023

Graduate National Scholarship (Top 2%), Ministry of Education of China. 2022

Shanghai Excellent Graduate (Top 5%), Shanghai Municipal Education Commission. 2021

Finalist winner (Top 0.3%), Mathematical Contest in Modeling. 2019

National Scholarship (Top 2%), Ministry of Education of China. 2018

Professional Services

  • Reviewer: ECCV’22/24, CVPR’23-24, ICCV’23, NeurIPS’23-24, ICLR’23-25, AAAI’23-25, ICML’24, ACM MM’23-24.

Misc

    1. My favorite sport is soccer. I was the captain of the UM-SJTU JI soccer team during the 2018 season. I am also a huge fan of Manchester City in the Premier League.

    2. I am proud to have graduated from the competition class at Hangzhou No. 2 High School, where I made friends with many talented students and respected teachers.

    3. It is worth mentioning that Rui is my best friend and, as my role model, has motivated me to keep moving forward for over ten years. Best wishes and good luck!



    Updated in Oct. 2024
    Thanks to Jon Barron for this amazing template.