Update inference_cli.py Long videos and low RAM refactor #462
Conversation
Will-I4M
commented
Dec 31, 2025
- Multi-GPU streaming refactor: added --recycle_workers_every and changed dispatch to process per-GPU sub-segments in cycles, respawning workers each cycle to release allocator memory.
- Spill-based stitching: introduced disk spill per chunk and incremental stitcher that blends overlaps while streaming to a persistent writer (or PNGs) without holding full segments in RAM.
- Output controls: ffmpeg writer now supports configurable codec/pix_fmt/bitrate/CRF/preset and higher bit depths; spill files stored in uint8/uint16 based on --output_bitdepth.
- Resilience: added OOM retry with cleanup/backoff for chunk processing; disk saves use atomic writes with retries.
- Diagnostics: RAM telemetry per chunk/phase, per-PID monitors for workers, and detailed logging around chunk processing, saving, and stitching.
- Input handling: explicit freeing of input tensors (new_frames) between streamed chunks to reduce transient RAM.
- CLI additions: --spill_dir, --output_bitdepth, --video_codec, --video_pix_fmt, --video_crf, --video_bitrate, --video_preset, --recycle_workers_every.
- Minor safety: pre-parse CUDA device; auto-enable streaming for long videos when --chunk_size is unset to avoid full loads.
Thank you for the PR @Will-I4M - there are quite a lot of changes. Did you test it with single and multi GPU? Including on long videos? Is there anything in the code update I need to consider when doing the review? I'll try to get to it next week, I'm unavailable this week. Thanks again.
Indeed, my apologies for pushing several feature changes all at once. Perhaps it would be wiser to split inference_cli into two parts, keeping the common sections shared and specializing one version for low RAM usage. In this PR, even without --debug, the code prints several messages about RAM usage, which isn't necessarily desirable for everyone, even though I find it very useful given SeedVR2's high RAM and VRAM requirements.

I tested it in single-GPU and multi-GPU configurations. My main focus in this PR was RAM usage. There is generally better resilience to VRAM-related OOMs, but that's not the biggest change. In the initial version, all chunks are kept in RAM after processing until the final step; that's the biggest change, because in this version only what's necessary (stitching, etc.) is kept in RAM. Also, in the initial version, in multi-GPU mode, if a single process crashed during computation (out of order...), the others would still continue unnecessarily until the end. The code is now more resilient and retries after fairly aggressive memory-cleanup attempts.

I tested several strategies to free up RAM, all of which proved unsuccessful until this latest version: the system no longer swaps unnecessarily. I was able to process several hours of video using the 7B model (fp16 / 4K / 21 frames of context) on 4x3090 in a reasonable time, something I couldn't do before (even for 25-minute durations), so I think this work can be useful to the SeedVR2 user community.
Pardon my ignorance, is this something that would, or could, benefit the ComfyUI version as well? |
Hi. BTW, your file still has the bug where "--prepend_frames" doesn't work.
I have the same problem as above with --video_backend ffmpeg hanging. |