@@ -900,12 +900,13 @@ VideoDecoder::FrameBatchOutput VideoDecoder::getFramesPlayedInRange(
 // we want those to be close to FFmpeg concepts, but the higher-level public
 // APIs expose samples. As a result:
 // - We don't expose index-based APIs for audio, because exposing index-based
-// APIs expliciltly exposes the concept of audio frame. For know, we think
+// APIs explicitly exposes the concept of audio frame. For know, we think
 // exposing time-based APIs is more natural.
 // - We never perform a scan for audio streams. We don't need to, since we won't
-// be converting timestamps to indices. That's why we enfore the "seek_mode"
-// to be "approximate" (which is slightly mis-leading, because technically the
-// output frames / samples will be perfectly exact).
+// be converting timestamps to indices. That's why we enforce the "seek_mode"
+// to be "approximate" (which is slightly misleading, because technically the
+// output frames / samples will be at their exact positions. But this
+// incongruence is only exposed at the C++/core private levels).
 //
 // Audio frames are of variable dimensions: in the same stream, a frame can
 // contain 1024 samples and the next one may contain 512 [1]. This makes it
@@ -918,7 +919,7 @@ VideoDecoder::FrameBatchOutput VideoDecoder::getFramesPlayedInRange(
 // [IMPORTANT!] There is one key invariant that we must respect when decoding
 // audio frames:
 //
-// BEFORE DECODING FRAME I , WE MUST DECODE ALL FRAMES j < i.
+// BEFORE DECODING FRAME i , WE MUST DECODE ALL FRAMES j < i.
 //
 // Always. Why? We don't know. What we know is that if we don't, we get clipped,
 // incorrect audio as output [1]. All other (correct) libraries like TorchAudio