
Conversation

@Will-I4M

  • Multi-GPU streaming refactor: added --recycle_workers_every and changed dispatch to process per-GPU sub-segments in cycles, respawning workers each cycle to release allocator memory.
  • Spill-based stitching: introduced disk spill per chunk and incremental stitcher that blends overlaps while streaming to a persistent writer (or PNGs) without holding full segments in RAM.
  • Output controls: ffmpeg writer now supports configurable codec/pix_fmt/bitrate/CRF/preset and higher bit depths; spill files stored in uint8/uint16 based on --output_bitdepth.
  • Resilience: added OOM retry with cleanup/backoff for chunk processing; disk saves use atomic writes with retries.
  • Diagnostics: RAM telemetry per chunk/phase, per-PID monitors for workers, and detailed logging around chunk processing, saving, and stitching.
  • Input handling: explicit freeing of input tensors (new_frames) between streamed chunks to reduce transient RAM.
  • CLI additions: --spill_dir, --output_bitdepth, --video_codec, --video_pix_fmt, --video_crf, --video_bitrate, --video_preset, --recycle_workers_every.
  • Minor safety: pre-parse CUDA device; auto-enable streaming for long videos when --chunk_size is unset to avoid full loads.

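The OOM retry with cleanup and backoff described above could be sketched as follows. This is an illustrative sketch, not the PR's actual code: the function name is hypothetical, and in the real pipeline the cleanup step would presumably also call something like `torch.cuda.empty_cache()` (assumed, not confirmed from the diff).

```python
import gc
import time

def run_with_oom_retry(fn, *, max_retries=3, base_backoff=2.0, cleanup=None):
    """Call fn(); on an out-of-memory error, clean up and retry with
    exponential backoff. cleanup defaults to gc.collect; a CUDA pipeline
    would likely also empty the allocator cache here (assumption)."""
    cleanup = cleanup or gc.collect
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except MemoryError:
            if attempt == max_retries:
                raise                       # out of retries: propagate
            cleanup()                       # aggressive memory cleanup
            time.sleep(base_backoff * (2 ** attempt))  # back off, then retry
```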
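The atomic-write-with-retries pattern for disk saves typically looks like this minimal sketch (the helper name is illustrative, not the PR's actual API): write to a temporary file in the same directory, then `os.replace()` it into place so a crash never leaves a partially written spill file visible.

```python
import os
import tempfile
import time

def atomic_save(path, data: bytes, retries=3, delay=0.1):
    """Write data to a temp file next to the target, fsync, then rename
    into place. Readers never observe a partial file; transient OS
    errors are retried with a short delay."""
    directory = os.path.dirname(path) or "."
    for attempt in range(retries + 1):
        try:
            fd, tmp = tempfile.mkstemp(dir=directory, suffix=".part")
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())        # ensure bytes hit the disk
            os.replace(tmp, path)           # atomic rename on the same filesystem
            return
        except OSError:
            if attempt == retries:
                raise
            time.sleep(delay)
```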
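The configurable output options map naturally onto ffmpeg's real `-c:v`, `-pix_fmt`, `-crf`, `-b:v`, and `-preset` flags. A hypothetical builder (not the PR's actual writer code) might assemble them like this, preferring CRF over bitrate when both are set since they are competing rate-control modes:

```python
def ffmpeg_output_args(codec="libx264", pix_fmt="yuv420p",
                       crf=None, bitrate=None, preset=None):
    """Build the output half of an ffmpeg command line from values that
    would come from --video_codec, --video_pix_fmt, --video_crf,
    --video_bitrate, and --video_preset."""
    args = ["-c:v", codec, "-pix_fmt", pix_fmt]
    if crf is not None:
        args += ["-crf", str(crf)]          # constant-quality mode wins
    elif bitrate is not None:
        args += ["-b:v", str(bitrate)]      # otherwise target a bitrate
    if preset is not None:
        args += ["-preset", preset]
    return args
```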
@adrientoupet
Collaborator

Thank you for the PR @Will-I4M - there is quite a lot of changes. Did you test it with single and multi gpu? Including on long videos? Anything in the code update I need to consider when doing the review?

I'll try to get to it next week, I'm unavailable this week. Thanks again.

@Will-I4M
Author

Will-I4M commented Jan 1, 2026

> Thank you for the PR @Will-I4M - there is quite a lot of changes. Did you test it with single and multi gpu? Including on long videos? Anything in the code update I need to consider when doing the review?
>
> I'll try to get to it next week, I'm unavailable this week. Thanks again.

Indeed, my apologies for pushing several feature changes all at once. It might be wiser to split inference_cli into two parts, keeping the common sections and specializing one version for low RAM usage. In this PR, even without --debug, the code prints several messages about RAM usage, which may not be desirable for everyone, although I find it very useful given SeedVR2's high RAM and VRAM requirements.

I tested it in single-GPU and multi-GPU configurations. My main focus in this PR was RAM usage. There is generally better resilience to VRAM-related OOMs, but that is not the biggest change. In the initial version, all chunks are kept in RAM after processing until the final step; that is the biggest change, because in this version only what is necessary (the stitching overlap, etc.) is kept in RAM. Also, in the initial version's multi-GPU mode, if a single process crashed during computation (e.g. OOM), the others would still continue running uselessly until the end. The code is now more resilient and retries after fairly aggressive memory-cleanup attempts.

I tested several strategies to free up RAM, all of which proved unsuccessful until this latest version: the system no longer swaps unnecessarily. I was able to process several hours of video with the 7B model (fp16 / 4K / 21 frames of context) on 4x3090 in a reasonable time, something I could not do before (even for 25-minute videos), so I think this work can be useful to the SeedVR2 user community.
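To illustrate the "keep only what's necessary in RAM" idea, here is a minimal sketch of streaming stitching with overlap blending. It is illustrative only, assuming per-frame scalar values with a linear crossfade at each seam (the PR's real code would blend per pixel on arrays, and its weighting scheme may differ); only one overlap window is ever held in memory, everything else is written straight to the sink.

```python
def blend_overlap(tail, head):
    """Linearly crossfade the trailing frames of one chunk into the
    leading frames of the next. len(tail) == len(head)."""
    n = len(tail)
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)               # weight ramps toward the new chunk
        out.append(tail[i] * (1 - w) + head[i] * w)
    return out

def stitch_stream(chunks, overlap, write):
    """Emit frames chunk by chunk, blending `overlap` frames at each
    seam. Only the held-back overlap window stays resident between
    chunks; all other frames go straight to the writer."""
    prev_tail = None
    for chunk in chunks:
        if prev_tail is not None:
            for frame in blend_overlap(prev_tail, chunk[:overlap]):
                write(frame)                # blended seam
            chunk = chunk[overlap:]
        cut = len(chunk) - overlap
        body, prev_tail = chunk[:cut], chunk[cut:]
        for frame in body:
            write(frame)
    for frame in prev_tail or []:
        write(frame)                        # flush the final tail unblended
```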

@shodan5000

Pardon my ignorance, is this something that would, or could, benefit the ComfyUI version as well?

@Will-I4M
Author

Will-I4M commented Jan 5, 2026

> Pardon my ignorance, is this something that would, or could, benefit the ComfyUI version as well?

Not with this PR, but it could be useful to refactor the ComfyUI version as well.

@cd1188

cd1188 commented Jan 10, 2026

Hi.
If I'm using chunk streaming with "--video_backend ffmpeg", it always hangs on the last chunk without any error...
I guess there is a problem with the last write to the output file.
If I remove the "--video_backend" switch it works, but OpenCV then uses h263?!
The standard inference_cli.py from seedvr2.5.24 works fine with ffmpeg and --chunk_size 44.
Don't ask why 44... it's my workflow :)
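(Editor's note: a common cause of a pipe-fed ffmpeg writer hanging at the very end is that stdin is never closed, so ffmpeg waits forever for more frames and never writes the container trailer. This is a guess at the symptom reported above, not a confirmed diagnosis; the helper name below is hypothetical.)

```python
import subprocess

def close_ffmpeg_writer(proc: subprocess.Popen, timeout=60):
    """Shut down a pipe-fed encoder cleanly: ffmpeg only finalizes the
    output file after its stdin reaches EOF, so stdin must be closed
    before waiting on the process."""
    if proc.stdin and not proc.stdin.closed:
        proc.stdin.flush()
        proc.stdin.close()                  # signal EOF so ffmpeg can finish
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                         # last resort: don't hang forever
        proc.wait()
```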

BTW, your file still has the bug where "--prepend_frames" doesn't work.
Link is issue-472

@thehhmdb
Contributor

I have the same problem as above with --video_backend ffmpeg hanging.
