Add FPS to get_frames_played_in_range()
#1148
base: main
Conversation
|         std::to_string(maxSeconds.value()) + ").");
|   }
|
|   // Resample frames to match the target frame rate
We could add an early break if the requested fps is the same as the current fps. Not sure if it is a necessary use case.
Let's handle that with a follow-up. Because we'll need to do equality on a double, there may be some subtle things to handle.
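For reference, a minimal pure-Python sketch of the tolerance-based comparison such a follow-up would likely need (illustrative only; the real check would live in the C++ decoder, and the tolerance value is arbitrary):

```python
import math

def requested_fps_matches_source(requested_fps: float, source_fps: float) -> bool:
    # Exact == on doubles is brittle: a user-facing 29.97 is not bit-identical
    # to the true NTSC rate 30000/1001, so a relative tolerance is safer.
    return math.isclose(requested_fps, source_fps, rel_tol=1e-5)

print(30000 / 1001 == 29.97)                              # False
print(requested_fps_matches_source(30000 / 1001, 29.97))  # True
```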
|     double startSeconds,
|     double stopSeconds) {
|     double stopSeconds,
|     std::optional<double> fps) {
I have a slight preference for SingleStreamDecoder to not take optionals, and for the bridge code in custom_ops.cpp to take care of the optional logic. But there are other APIs in SingleStreamDecoder which do take optionals, so I'll invite @Dan-Flores and @NicolasHug to make a call here.
It seems natural to me to have an optional here. Letting custom_ops.cpp be the bridge would, I think, mean that we'd have to create more private methods to split the implementation (e.g. the common checks at the beginning, and the common logic at the end).
|       fps.value() > 0,
|       "fps must be positive, got " + std::to_string(fps.value()));
|
|   double fpsVal = static_cast<double>(fps.value());
Nit: no need to do a cast, as fps.value() returns a double.
|   // determine the frame count, which we replicate here for consistency.
|   // Examples:
|   // - 35.23 fps, 1.0s: round(35.23) = 35 frames
|   // - 35.87 fps, 1.0s: round(35.87) = 36 frames
Can you add a link to the resource where you learned the exact technique the fps filter uses? I assume it's the source code, in which case a stable link to that code would be great. We want to make it easy for others to make changes to this code, and they may need to look at it as well.
|       videoStreamOptions.device);
|   frameBatchOutput.data = maybePermuteHWC2CHW(frameBatchOutput.data);
|   return frameBatchOutput;
| }
Is it ever valid for numOutputFrames < 0? That is, we ensure that startSeconds <= stopSeconds at the very top, and we also ensure that fpsVal > 0. So I think that if numOutputFrames < 0, that is an error.
And in the case that numOutputFrames == 0, I don't think we actually need to special case it. I think the remaining logic should be robust to that case.
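To make the reasoning concrete, a tiny standalone sketch (plain Python, not the decoder code) of why the count can't go negative and why zero needs no special case:

```python
def num_output_frames(start: float, stop: float, fps: float) -> int:
    # Mirrors the checks performed earlier in the C++ function: with
    # stop >= start and fps > 0, the rounded product is never negative.
    assert stop >= start and fps > 0
    return round((stop - start) * fps)

print(num_output_frames(5.0, 5.0, 30.0))    # 0 -> the timestamp loop simply runs zero times
print(num_output_frames(5.0, 5.004, 30.0))  # 0 -> likewise, empty output, no special-casing
print(num_output_frames(5.0, 6.0, 30.0))    # 30
```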
|     if (outputIdx >= 0) {
|       sourceFrameIndices[outputIdx] = j;
|     }
|   }
This mapping logic will need to be seek mode aware. In approximate mode, streamInfo.allFrames won't be available and we'll need to do math to calculate the value. Check out secondsToIndexLowerBound() for inspiration.
We'll also need to test exact and approximate seek modes. :)
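For the approximate path, something in the spirit of the following may be needed (hypothetical Python sketch of the math only, loosely modeled on secondsToIndexLowerBound(), not its actual implementation):

```python
import math

def seconds_to_index_approximate(seconds: float, average_fps: float) -> int:
    # Without streamInfo.allFrames there is no per-frame pts table, so the
    # source index has to be estimated from the stream's average frame rate.
    return int(math.floor(seconds * average_fps))

print(seconds_to_index_approximate(0.0, 29.97))  # 0
print(seconds_to_index_approximate(1.0, 29.97))  # 29
```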
| with pytest.raises(ValueError, match="Invalid stop seconds"):
|     frame = decoder.get_frames_played_in_range(0, 23)  # noqa
|
| @pytest.mark.parametrize("device", all_supported_devices())
Also parameterize on seek mode - I think we can skip custom and just do "exact" and "approximate". Based on the code above, I think "approximate" will currently fail.
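A rough sketch of the shape this parametrization could take (hypothetical test; VIDEO_PATH stands in for whatever asset the existing tests use, and the expected count follows the round((stop - start) * fps) rule from the PR description):

```python
import pytest
from torchcodec.decoders import VideoDecoder

# all_supported_devices() comes from the existing test utilities in this file.
@pytest.mark.parametrize("device", all_supported_devices())
@pytest.mark.parametrize("seek_mode", ("exact", "approximate"))
def test_fps_across_seek_modes(device, seek_mode):
    decoder = VideoDecoder(VIDEO_PATH, device=device, seek_mode=seek_mode)
    frames = decoder.get_frames_played_in_range(0.0, 1.0, fps=10)
    # 1 second at 10 fps -> round(1.0 * 10) = 10 output frames.
    assert frames.data.shape[0] == 10
```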
| with pytest.raises(RuntimeError, match="fps must be positive"):
|     decoder.get_frames_played_in_range(start_seconds, stop_seconds, fps=-10)
|
| @needs_ffmpeg_cli
Let's also parametrize this test on seek mode.
| )
|
| source_fps = decoder.metadata.average_fps
| duration_seconds = 1.0
We'll need some tests that extract a longer duration. In particular, we'll want to make sure we have tests that decode the entire video. I suspect we'll hit some edge-cases in our dropping and duping logic when we decode all frames.
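For example, a full-duration variant might look like this (hypothetical sketch; VIDEO_PATH is again a placeholder, and the assertion assumes the rounding rule from the PR description holds over the whole stream):

```python
import pytest
from torchcodec.decoders import VideoDecoder

@pytest.mark.parametrize("target_fps", (5, 15, 60))
def test_fps_over_entire_video(target_fps):
    decoder = VideoDecoder(VIDEO_PATH)
    duration = decoder.metadata.duration_seconds
    frames = decoder.get_frames_played_in_range(0.0, duration, fps=target_fps)
    # Decoding the whole video should exercise the dropping/duplication paths
    # near the stream boundaries.
    assert frames.data.shape[0] == round(duration * target_fps)
```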
|
| def get_frames_played_in_range(
|     self, start_seconds: float, stop_seconds: float
|     self, start_seconds: float, stop_seconds: float, fps: float | None = None
If we expose the fps functionality in this method, then for convenience I think we should make sure to also expose a get_all_frames(fps) method, so that users don't have to manually specify both start and stop. (Exposing this get_all_frames() method is something we should do regardless).
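A rough sketch of what that wrapper could look like on VideoDecoder (hypothetical; the metadata field names are assumptions and the real method would need docs and validation):

```python
def get_all_frames(self, *, fps: float | None = None) -> FrameBatch:
    # Delegate to the range-based method over the full playable range.
    # begin_stream_seconds / end_stream_seconds are assumed to bound that
    # range; adjust if the actual metadata fields differ.
    return self.get_frames_played_in_range(
        self.metadata.begin_stream_seconds,
        self.metadata.end_stream_seconds,
        fps=fps,
    )
```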
|   for (int64_t i = 0; i < numOutputFrames; ++i) {
|     double targetPts = startSeconds + i * frameDuration;
|     targetTimestamps.push_back(targetPts);
|   }
This logic looks a lot like what we already have in the time-based samplers with the seconds_between_frames parameter:
torchcodec/src/torchcodec/samplers/_time_based.py
Lines 94 to 101 in 50686bb
| # The frames of a clip will be sampled at the following pts:
| # clip_timestamps = [
| #     clip_start + 0 * seconds_between_frames,
| #     clip_start + 1 * seconds_between_frames,
| #     clip_start + 2 * seconds_between_frames,
| #     ...
| #     clip_start + (num_frames_per_clip - 1) * seconds_between_frames,
| # ]
I could be wrong, but I suspect there is some overlap between the new fps parameter and the existing seconds_between_frames parameter in terms of functionality. It's been a while since I implemented the latter, so I don't fully remember the details of how exactly it behaves (does it also drop and duplicate frames in the same way? I'm not sure).
Generally, a good design principle is that "there should be one -- and preferably only one -- obvious way to do it."
I think it might be worth resolving this ambiguity - @mollyxu, can I ask you to do a bit of investigation and figure out the main behavior differences between fps and seconds_between_frames? Once we know how they differ in terms of behavior, it'll be easier to figure out a correct, unambiguous and holistic API for fps output.
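To illustrate why the two look related, here is a small pure-Python comparison of the timestamps each knob produces (the correspondence seconds_between_frames ~ 1 / fps is my assumption; whether the drop/duplicate behavior then matches is exactly the open question):

```python
start, stop, fps = 0.0, 1.0, 4.0
seconds_between_frames = 1.0 / fps  # assumed correspondence between the two knobs

fps_timestamps = [start + i / fps for i in range(round((stop - start) * fps))]
sampler_timestamps = [start + i * seconds_between_frames for i in range(4)]

print(fps_timestamps)      # [0.0, 0.25, 0.5, 0.75]
print(sampler_timestamps)  # [0.0, 0.25, 0.5, 0.75]
```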
|   // Calculate the exact number of output frames for the half-open range
|   // [startSeconds, stopSeconds). FFmpeg's fps filter drops the frames until
|   // the next frame would overshoot.
|   // https://github.com/FFmpeg/FFmpeg/blob/n7.1/libavfilter/vf_fps.c#L290-L291
We should make this https://github.com/FFmpeg/FFmpeg/blob/b08d7969c550a804a59511c7b83f2dd8cc0499b8/libavfilter/vf_fps.c#L290-L291 instead. That is, we want a permalink which includes the commit hash in the URL. That will never change, even as the code is changed on main. As the code on main changes, lines 290 and 291 will eventually be different code.
(On GitHub, to get a permalink, click the three dots next to a line number, and it brings up a dialog box you can use to copy the permalink.)
Add FPS to get_frames_played_in_range()

Context: #1133

This PR adds support for specifying a target frame rate when calling get_frames_played_in_range(). When target_fps is provided, the method resamples the video to output frames at evenly-spaced timestamps matching the target FPS, mimicking FFmpeg's fps filter behavior. The way to read this API: if fps is

- None (default): returns decoded frames between start_seconds and stop_seconds at the input frame rate. This is the existing behavior.
- a float: returns decoded frames between start_seconds and stop_seconds where the output frame rate matches the value of fps.

Implementation

The resampling algorithm replicates FFmpeg's fps filter logic:

- round((stop - start) * target_fps) determines the output frame count, matching FFmpeg's AV_ROUND_NEAR_INF rounding behavior.
- Each source frame maps to an output slot via output_index = round((pts - start) * target_fps). Multiple source frames mapping to the same output index results in the last one being used.
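The mapping described above can be illustrated with a small standalone sketch (pure Python mirroring the description, not the C++ implementation; the gap-filling step is my reading of the frame-duplication behavior discussed in review, not code taken from the diff):

```python
def resample_to_fps(source_pts, start, stop, target_fps):
    # Output frame count for the half-open range [start, stop).
    num_out = round((stop - start) * target_fps)
    slots = [None] * num_out
    for src_idx, pts in enumerate(source_pts):
        out_idx = round((pts - start) * target_fps)
        if 0 <= out_idx < num_out:
            slots[out_idx] = src_idx  # last source frame mapping here wins (drop)
    for i in range(1, num_out):
        if slots[i] is None:
            slots[i] = slots[i - 1]   # reuse the previous frame to fill gaps (dup)
    return slots

# 5 source frames at 5 fps resampled to 3 fps over [0, 1): source indices 0, 2, 4.
print(resample_to_fps([0.0, 0.2, 0.4, 0.6, 0.8], 0.0, 1.0, 3.0))  # [0, 2, 4]
```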