Hi, thanks for releasing this great work.
I’m trying to reproduce the “blind demo” style inference, where the model processes a continuous video stream and outputs responses in an online manner (instead of reading the whole video first).
However, I cannot find a clear example or documentation indicating whether the released code supports real-time (streaming) output, or whether the current pipeline only supports offline batch inference.
Questions:

1. Is there a script or example for online inference like the blind demo?
2. If not, is there a recommended way to modify the existing evaluation script to produce streaming outputs?
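For context, the kind of online loop I have in mind looks roughly like this. Everything here is hypothetical (the `StreamingModelStub` class and its `step` method are placeholders I made up to illustrate the pattern, not APIs from this repo): frames arrive one at a time, the model carries temporal state across steps, and a response is emitted as soon as it is ready rather than after the whole video is read.

```python
from collections import deque

class StreamingModelStub:
    """Stand-in for the real model (hypothetical interface, not from this repo).
    Keeps a rolling window of recent frames as temporal state and emits a
    response every `stride` frames."""

    def __init__(self, window=4, stride=2):
        self.window = deque(maxlen=window)  # bounded temporal context
        self.stride = stride
        self.t = 0

    def step(self, frame):
        # Consume one frame at a time instead of the full video.
        self.window.append(frame)
        self.t += 1
        if self.t % self.stride == 0:
            # A real model would decode text here; we just report the window.
            return f"t={self.t}: context={list(self.window)}"
        return None  # no response ready yet

# Simulated frame stream (stand-in for a decoded video feed).
model = StreamingModelStub()
responses = []
for frame in range(1, 9):   # frames arrive one by one
    out = model.step(frame)
    if out is not None:     # emit as soon as a response is ready
        responses.append(out)

print(responses[0])    # → "t=2: context=[1, 2]"
print(len(responses))  # → 4 (one response per stride over 8 frames)
```

If the released pipeline can be adapted to this consume-one-frame, emit-when-ready pattern, even a rough pointer to which module holds the temporal state would help.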
Any guidance or a pointer to the relevant script would be very helpful.
Thanks!