ICCV 2025 · Training-free long video segmentation

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

SAM2Long improves SAM 2 on complex long videos by keeping multiple memory pathways and selecting robust video-level results through constrained tree search.

The Fast and the Furious (2001) street race: a stress test for long-range occlusion and reappearance.
Training-free: no extra parameters or finetuning.
+3.0 J&F: average gain over SAM 2 across six VOS benchmarks.
+5.3 J&F: maximum gain on long-term segmentation benchmarks.
24 / 24: head-to-head comparisons improved on SA-V and LVOS.

Overview

Why SAM2Long?

Comparison of occlusion handling and long-term compatibility between SAM 2 and SAM2Long.

SAM 2's greedy memory selection can accumulate errors: a missed or incorrect mask may influence subsequent frames and become difficult to recover from. SAM2Long keeps several plausible segmentation pathways, scores them over time, and selects branches that remain reliable across long videos.

Demos

Long-video cases

APT official music video by ROSÉ and Bruno Mars.
Pink Venom dance practice video.
A long take from Touch of Evil (1958).
A Quidditch match from Harry Potter and the Philosopher's Stone (2001).
Project demo video.

Method

Memory tree search

SAM2Long maintains multiple memory pathways, selects high-scoring masks at each step, and preserves diverse branches when predictions are uncertain.

The method treats long video segmentation as a video-level decision problem rather than a frame-by-frame greedy update. By balancing certainty and diversity, the memory tree can recover from temporary ambiguity, occlusion, and object reappearance without retraining SAM 2.
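
To make the idea of keeping several pathways concrete, here is a minimal toy sketch in Python. It is not the SAM2Long implementation and does not call SAM 2. The names `Pathway`, `expand`, `toy_candidates`, and `run` are hypothetical, the cumulative log-confidence score is an assumed scoring rule, and the uncertainty test (best confidence below a threshold) and the diversity rule (keep only distinct mask labels) are simplified stand-ins for whatever criteria the method actually uses.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Pathway:
    """One branch of the tree: the masks chosen so far plus a running score."""
    masks: list = field(default_factory=list)
    score: float = 0.0  # cumulative log-confidence (assumed scoring rule)


def expand(pathways, candidates_for, num_keep=3, uncertain_below=0.5):
    """Advance every pathway by one frame and keep the best `num_keep` branches.

    `candidates_for(pathway)` returns (mask_label, confidence) pairs. It stands
    in for the mask hypotheses a segmenter would propose from that pathway's
    memory; the confidences here are toy numbers, not model outputs.
    """
    children = []
    for parent in pathways:
        for label, conf in candidates_for(parent):
            children.append((parent.score + math.log(conf + 1e-6), label, conf, parent))
    # Highest cumulative score first (the sort is stable for ties).
    children.sort(key=lambda c: c[0], reverse=True)

    # Hypothetical uncertainty test: no candidate is confident this frame.
    uncertain = max(c[2] for c in children) < uncertain_below

    kept, seen = [], set()
    for score, label, conf, parent in children:
        if uncertain and label in seen:
            continue  # under uncertainty, keep the surviving branches distinct
        seen.add(label)
        kept.append(Pathway(parent.masks + [label], score))
        if len(kept) == num_keep:
            break
    return kept


def toy_candidates(pathway):
    """Toy scene: a distractor looks slightly better than the target at first."""
    if not pathway.masks:
        return [("distractor", 0.6), ("target", 0.5)]
    if pathway.masks[-1] == "target":
        return [("target", 0.9), ("distractor", 0.1)]
    # Memory polluted by the distractor: every hypothesis is now weak.
    return [("distractor", 0.3), ("target", 0.2)]


def run(num_frames, num_keep):
    """Search `num_frames` frames and return the best-scoring pathway."""
    pathways = [Pathway()]
    for _ in range(num_frames):
        pathways = expand(pathways, toy_candidates, num_keep=num_keep)
    return max(pathways, key=lambda p: p.score)


if __name__ == "__main__":
    print("greedy (1 pathway): ", run(3, 1).masks)
    print("tree   (3 pathways):", run(3, 3).masks)
```

With a single pathway the toy search behaves greedily: it commits to the distractor on the first frame and never recovers. With three pathways the branch that started on the target survives the first frame and ends with the higher video-level score, which is the kind of recovery the memory tree is designed to allow.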

Results

Consistent gains across settings

SAM2Long consistently improves SAM 2 across model sizes and datasets.
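
For readers unfamiliar with the J&F metric quoted on this page, the standard definition from video object segmentation benchmarks can be written as below. The symbols are introduced here for illustration and do not appear on the original page: M is the predicted mask, G the ground-truth mask, and P_c and R_c are the precision and recall of the predicted contour against the ground-truth contour.

```latex
\mathcal{J} = \frac{|M \cap G|}{|M \cup G|}, \qquad
\mathcal{F} = \frac{2\,P_c\,R_c}{P_c + R_c}, \qquad
\mathcal{J}\&\mathcal{F} = \frac{\mathcal{J} + \mathcal{F}}{2}
```

Benchmarks typically report these values on a 0 to 100 scale, so a gain of +3.0 J&F would correspond to three points on that scale.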

Paper

Citation


SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Shuangrui Ding, Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Yuwei Guo, Dahua Lin, Jiaqi Wang. ICCV 2025.

BibTeX
@inproceedings{ding2025sam2long,
  title={SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree},
  author={Ding, Shuangrui and Qian, Rui and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Guo, Yuwei and Lin, Dahua and Wang, Jiaqi},
  booktitle={ICCV},
  year={2025}
}