-
Notifications
You must be signed in to change notification settings - Fork 64
Update Video Encoder and tests for 6 container formats #913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f8d03ad
to
f414d0b
Compare
f414d0b
to
c29dee3
Compare
avCodecContext_->global_quality = FF_QP2LAMBDA * qp; | ||
} | ||
int status = avcodec_open2(avCodecContext_.get(), avCodec, &options); | ||
av_dict_free(&options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above, the crf
parameter is reused to set qscale
to encode high quality videos in round-trip tests. But, the C++ function only allows crf
to be set, not qscale
. Since qscale
is not needed anywhere else, I did not think it was worth including, but I am open to feedback here.
assert ( | ||
source_frames.shape == round_trip_frames.shape | ||
), f"Shape mismatch: source {source_frames.shape} vs round_trip {round_trip_frames.shape}" | ||
assert ( | ||
source_frames.dtype == round_trip_frames.dtype | ||
), f"Dtype mismatch: source {source_frames.dtype} vs round_trip {round_trip_frames.dtype}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use plain assert like
assert source_frames.shape == round_trip_frames.shape
pytest will provide a proper error message
atol = 2 | ||
# Check that PSNR for decode(encode(samples)) is above 30 | ||
for s_frame, rt_frame in zip(source_frames, round_trip_frames): | ||
res = psnr(s_frame, rt_frame) | ||
assert res > 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize it's not from this PR but let's clean that up a little:
- the comment isn't needed as the code is really self explanatory (and it doesn't just do psnr validation anymore!)
- no need to store
res
atol = 2 | |
# Check that PSNR for decode(encode(samples)) is above 30 | |
for s_frame, rt_frame in zip(source_frames, round_trip_frames): | |
res = psnr(s_frame, rt_frame) | |
assert res > 30 | |
atol = 2 | |
for s_frame, rt_frame in zip(source_frames, round_trip_frames): | |
assert psnr(s_frame, rt_frame) > 30 |
assert_close(s_frame, rt_frame, atol=atol, rtol=0) | ||
|
||
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available") | ||
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's duplicated
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available") |
"Codec for webm is not available in the FFmpeg6/7 installation on Windows." | ||
) | ||
asset = TEST_SRC_2_720P | ||
# Test that decode(encode(decode(asset))) == decode(asset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be at the top
f.write(source_frames.permute(0, 2, 3, 1).cpu().numpy().tobytes()) | ||
|
||
ffmpeg_encoded_path = str(tmp_path / f"ffmpeg_output.{format}") | ||
# Test that lossless encoding is identical |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this comment is needed here, it looks out of place
|
||
const AVPixelFormat* getSupportedPixelFormats(const AVCodec& avCodec) { | ||
const AVPixelFormat* supportedPixelFormats = nullptr; | ||
#if LIBAVCODEC_VERSION_INT >= AV_VERSION_INT(61, 13, 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment to specify which major FFmpeg version this correspond to
// av_packet_rescale_ts ensures encoded frames have correct timestamps. | ||
// This prevents "no more frames" errors when decoding encoded frames, | ||
// https://github.com/pytorch/audio/blob/b6a3368a45aaafe05f1a6a9f10c68adc5e944d9e/src/libtorio/ffmpeg/stream_writer/encoder.cpp#L46 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (packet->duration == 0 && codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO) {
// 1 means that 1 frame (in codec time base, which is the frame rate)
// This has to be set before av_packet_rescale_ts bellow.
packet->duration = 1;
}
which seems to be about the lines just above.
Is this comment at the right place? Maybe it should be a few lines above - and it should also explain why we need to set duration to 1 ?
UniqueEncodingAVFormatContext avFormatContext_; | ||
UniqueAVCodecContext avCodecContext_; | ||
int streamIndex_ = -1; | ||
AVStream* avStream_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC we now store avStream_ mostly because we need to access time_base
? If that's the case, then let's get rid of the streamIndex_
field because it can now be accessed through avStream_
This PR updates the VideoEncoder to have round-trip encoding/decoding capabilities for 6 common video container formats:
mov
,mp4
,avi
,mkv
,webm
,flv
.gif
is also tested against the CLI, but not round trip due to only having 8-bits per pixel.Changes:
Some changes are made to align with the design in #907:
Testing
test_video_encoder_round_trip
: Ensures that a video's decoded frames are the same after encoding then decoding.test_video_encoder_against_ffmpeg_cli
: Ensures that the VideoEncoder frames are the same as the FFmpeg CLI.Testing caveats
qscale
is substituted in forcrf
when older codecs are used.crf
parameter is not ideal, butcrf
andscale
are (only) needed for testing lossless encoding, not as an option from the Python API. I'm open to other ideas or parameters to test correctness.assert_tensor_close_on_at_least
with a percentage (97-99), and a higheratol
is used (2-15).