Skip to content

Conversation

Dan-Flores
Copy link
Contributor

@Dan-Flores Dan-Flores commented Sep 30, 2025

This PR updates the VideoEncoder to have round-trip encoding/decoding capabilities for 6 common video container formats:
mov, mp4, avi, mkv, webm, flv.
gif is also tested against the CLI, but not round trip due to only having 8-bits per pixel.

Changes:

Some changes are made to align with the design in #907:

Testing

  • test_video_encoder_round_trip: Ensures that a video's decoded frames are the same after encoding then decoding.
  • test_video_encoder_against_ffmpeg_cli: Ensures that the VideoEncoder frames are the same as the FFmpeg CLI.

Testing caveats

  • qscale is substituted in for crf when older codecs are used.
    • Reusing the crf parameter is not ideal, but crf and scale are (only) needed for testing lossless encoding, not as an option from the Python API. I'm open to other ideas or parameters to test correctness.
  • When lossy encoding occurs due to codec + pixel format selection, assert_close is substituted by assert_tensor_close_on_at_least with a percentage (97-99), and a higher atol is used (2-15).

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 30, 2025
@Dan-Flores Dan-Flores force-pushed the encoder_accuracy branch 2 times, most recently from f8d03ad to f414d0b Compare October 4, 2025 18:53
@Dan-Flores Dan-Flores marked this pull request as ready for review October 7, 2025 14:51
avCodecContext_->global_quality = FF_QP2LAMBDA * qp;
}
int status = avcodec_open2(avCodecContext_.get(), avCodec, &options);
av_dict_free(&options);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above, the crf parameter is reused to set qscale to encode high quality videos in round-trip tests. But, the C++ function only allows crf to be set, not qscale. Since qscale is not needed anywhere else, I did not think it was worth including, but I am open to feedback here.

Comment on lines +1367 to +1372
assert (
source_frames.shape == round_trip_frames.shape
), f"Shape mismatch: source {source_frames.shape} vs round_trip {round_trip_frames.shape}"
assert (
source_frames.dtype == round_trip_frames.dtype
), f"Dtype mismatch: source {source_frames.dtype} vs round_trip {round_trip_frames.dtype}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use plain assert like

assert source_frames.shape == round_trip_frames.shape

pytest will provide a proper error message

Comment on lines +1381 to 1385
atol = 2
# Check that PSNR for decode(encode(samples)) is above 30
for s_frame, rt_frame in zip(source_frames, round_trip_frames):
res = psnr(s_frame, rt_frame)
assert res > 30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize it's not from this PR but let's clean that up a little:

  • the comment isn't needed as the code is really self explanatory (and it doesn't just do psnr validation anymore!)
  • no need to store res
Suggested change
atol = 2
# Check that PSNR for decode(encode(samples)) is above 30
for s_frame, rt_frame in zip(source_frames, round_trip_frames):
res = psnr(s_frame, rt_frame)
assert res > 30
atol = 2
for s_frame, rt_frame in zip(source_frames, round_trip_frames):
assert psnr(s_frame, rt_frame) > 30

assert_close(s_frame, rt_frame, atol=atol, rtol=0)

@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available")
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available")
Copy link
Contributor

@NicolasHug NicolasHug Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's duplicated

Suggested change
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available")

"Codec for webm is not available in the FFmpeg6/7 installation on Windows."
)
asset = TEST_SRC_2_720P
# Test that decode(encode(decode(asset))) == decode(asset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be at the top

f.write(source_frames.permute(0, 2, 3, 1).cpu().numpy().tobytes())

ffmpeg_encoded_path = str(tmp_path / f"ffmpeg_output.{format}")
# Test that lossless encoding is identical
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this comment is needed here, it looks out of place


const AVPixelFormat* getSupportedPixelFormats(const AVCodec& avCodec) {
const AVPixelFormat* supportedPixelFormats = nullptr;
#if LIBAVCODEC_VERSION_INT >= AV_VERSION_INT(61, 13, 100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comment to specify which major FFmpeg version this correspond to

Comment on lines +797 to +799
// av_packet_rescale_ts ensures encoded frames have correct timestamps.
// This prevents "no more frames" errors when decoding encoded frames,
// https://github.com/pytorch/audio/blob/b6a3368a45aaafe05f1a6a9f10c68adc5e944d9e/src/libtorio/ffmpeg/stream_writer/encoder.cpp#L46
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/pytorch/audio/blob/b6a3368a45aaafe05f1a6a9f10c68adc5e944d9e/src/libtorio/ffmpeg/stream_writer/encoder.cpp#L46 links to

    if (packet->duration == 0 && codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO) {
      // 1 means that 1 frame (in codec time base, which is the frame rate)
      // This has to be set before av_packet_rescale_ts bellow.
      packet->duration = 1;
    }

which seems to be about the lines just above.

Is this comment at the right place? Maybe it should be a few lines above - and it should also explain why we need to set duration to 1 ?

UniqueEncodingAVFormatContext avFormatContext_;
UniqueAVCodecContext avCodecContext_;
int streamIndex_ = -1;
AVStream* avStream_;
Copy link
Contributor

@NicolasHug NicolasHug Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC we now store avStream_ mostly because we need to access time_base? If that's the case, then let's get rid of the streamIndex_ field because it can now be accessed through avStream_

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants