SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Shuangrui Ding1 Rui Qian1 Xiaoyi Dong2 Pan Zhang2
Yuhang Zang2 Yuhang Cao2 Yuwei Guo1 Dahua Lin1 Jiaqi Wang2
1CUHK MMLab  2Shanghai AI Lab  
[Code]      [arXiv]      [PDF]

Teaser Figure
Fig 1: Comparison of occlusion handling and long-term compatibility between SAM 2 and SAM2Long.

Abstract

SAM 2's greedy-selection memory design suffers from error accumulation: a single erroneous or missing mask cascades into the segmentation of subsequent frames, limiting SAM 2's performance on complex long-term videos. To address this, we introduce SAM2Long, an improved training-free video object segmentation strategy that accounts for segmentation uncertainty and selects video-level optimal results via a constrained tree search. SAM2Long maintains multiple segmentation pathways and, at each frame, keeps the branches with higher cumulative scores. This heuristic design makes segmentation robust to occlusions and object reappearances. Without additional parameters or further training, SAM2Long significantly outperforms SAM 2 on six VOS benchmarks, improving J&F by 3.0 points on average and by up to 5.3 points across all 24 head-to-head comparisons on the long-term segmentation benchmarks SA-V and LVOS.


Demo Video



Method Pipeline

Pipeline
Fig 2: (a) The pipeline maintains multiple memory pathways, selecting the highest-scoring masks at each step. (b) Masks are chosen based on certainty; if uncertain, diverse masks are selected to avoid errors.
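The branch-expansion-and-pruning step in (a) can be sketched roughly as follows. This is a minimal illustrative assumption of how a constrained tree search over memory pathways might look; the function name `select_pathways`, the dict fields, and the scoring scheme are hypothetical and not the paper's actual implementation:

```python
def select_pathways(pathways, candidates_per_pathway, max_pathways):
    """One step of a constrained tree search (illustrative sketch).

    Each pathway carries its chosen mask history and a cumulative score.
    For the current frame, every pathway is expanded with its candidate
    masks, then only the top-scoring branches are kept so the tree
    width stays bounded at max_pathways.
    """
    expanded = []
    for path, candidates in zip(pathways, candidates_per_pathway):
        for mask_id, score in candidates:
            expanded.append({
                "masks": path["masks"] + [mask_id],   # mask chosen per frame
                "score": path["score"] + score,        # cumulative confidence
            })
    # Prune: keep only the highest cumulative-score branches.
    expanded.sort(key=lambda p: p["score"], reverse=True)
    return expanded[:max_pathways]
```

Because low-scoring branches are discarded at every frame rather than at the end, an early mask error can only survive if no better-scoring alternative pathway exists, which is what keeps errors from cascading.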


Main Results

performance
SAM2Long consistently improves over SAM 2 across all model sizes and datasets.


Publications

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Shuangrui Ding, Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Yuwei Guo, Dahua Lin, Jiaqi Wang
preprint, 2024


Webpage template modified from Richard Zhang.