Description
Hello,
First of all, thank you for releasing SAM2 and the excellent implementation.
While profiling preprocessing performance for large-resolution inputs, we observed a performance difference related to the ordering of ToTensor and Resize in the preprocessing pipeline.
Current implementation (SAM2-style) in sam2_image_predictor.py:
ToTensor → Resize → Normalize, where Resize and Normalize are wrapped in a TorchScripted nn.Sequential, and forward_batch applies them inside a Python loop.
Alternative approach:
PIL Resize (uint8) → ToTensor → Normalize
Test Environment
- Input resolutions tested:
- 1920×1080
- 3840×2160
- 4096×2048
- Batch size: 8
- Target resolution: 1024×1024
- Device: CUDA
- Preprocessing performed on CPU
- PyTorch: 2.8.0+cu126, torchvision: 0.23.0+cu126
Benchmark Results
3840×2160 → 1024×1024 (batch=8)
| Pipeline | Avg Time | Python Peak Memory |
|---|---|---|
| PIL Resize → ToTensor → Normalize | 441 ms | ~6 MB |
| ToTensor → Torch Resize → Normalize (current style) | 492 ms | ~48 MB |
4096×2048 → 1024×1024 (batch=8)
| Pipeline | Avg Time | Python Peak Memory |
|---|---|---|
| PIL Resize → ToTensor → Normalize | 418 ms | ~6 MB |
| ToTensor → Torch Resize → Normalize (current style) | 495 ms | ~48 MB |
For smaller inputs (1920×1080), the current SAM2-style pipeline was slightly faster, but for large inputs (4K-level), pre-resizing in uint8 provided:
- Lower CPU memory peak
- Faster overall preprocessing time
GPU peak memory remained similar in our setup because resizing was done on CPU before transferring the batch to CUDA.
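For context on the "Python Peak Memory" column: a tracemalloc-based helper like the one below (a hypothetical sketch of the methodology, not our exact harness) shows how the float32 conversion of a full-resolution frame dominates the peak; tracemalloc only sees allocations routed through the Python allocator, not native CUDA buffers.

```python
# Hypothetical helper for reproducing a "Python peak memory" figure.
# tracemalloc tracks allocations made through Python's allocator
# (NumPy registers its buffers with it); it does not see CUDA memory.
import tracemalloc
import numpy as np

def peak_mb(fn, *args):
    """Run fn(*args) and return the peak traced allocation in MiB."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 2**20

frame = np.zeros((2160, 3840, 3), dtype=np.uint8)  # one 4K RGB frame
# Converting the full frame to float32 allocates a ~95 MiB copy,
# whereas resizing first would shrink the data before conversion.
p = peak_mb(lambda a: a.astype(np.float32), frame)
print(f"peak: {p:.1f} MiB")
```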
Question
We understand that placing ToTensor before Resize may simplify TorchScript compatibility for the transform module.
- Could you clarify whether the current ordering is primarily motivated by TorchScript constraints or deployment consistency?
- Would it make sense to optionally allow resizing in uint8 before tensor conversion for large-resolution CPU preprocessing scenarios?
We would appreciate any insights on the design rationale.
Thank you again for your work.