Commit f8cd5ec

Minor edits
1 parent 4aae78a commit f8cd5ec

1 file changed

+13
-14
lines changed

src/torchcodec/decoders/_core/VideoDecoder.cpp

Lines changed: 13 additions & 14 deletions
@@ -892,37 +892,36 @@ VideoDecoder::FrameBatchOutput VideoDecoder::getFramesPlayedInRange(
 // Like for video, FFmpeg exposes the concept of a frame for audio streams. An
 // audio frame is a contiguous sequence of samples, where a sample consists of
 // `numChannels` values. An audio frame, or a sequence thereof, is always
-// converted into a tensor of shape `(numChannels, numSamplesPerChannel)`
-// tensors.
+// converted into a tensor of shape `(numChannels, numSamplesPerChannel)`.
 //
 // The notion of 'frame' in audio isn't what users want to interact with. Users
 // want to interact with samples. The C++ and core APIs return frames, because
 // we want those to be close to FFmpeg concepts, but the higher-level public
 // APIs expose samples. As a result:
-// - We don't expose index-based APIs for audio, because exposing index-based
-// APIs explicitly exposes the concept of audio frame. For know, we think
-// exposing time-based APIs is more natural.
+// - We don't expose index-based APIs for audio, because that would mean
+// exposing the concept of audio frame. For now, we think exposing time-based
+// APIs is more natural.
 // - We never perform a scan for audio streams. We don't need to, since we won't
-// be converting timestamps to indices. That's why we enforce the "seek_mode"
+// be converting timestamps to indices. That's why we enforce the seek_mode
 // to be "approximate" (which is slightly misleading, because technically the
-// output frames / samples will be at their exact positions. But this
-// incongruence is only exposed at the C++/core private levels).
+// output samples will be at their exact positions. But this incongruence is
+// only exposed at the C++/core private levels).
 //
 // Audio frames are of variable dimensions: in the same stream, a frame can
 // contain 1024 samples and the next one may contain 512 [1]. This makes it
 // impossible to stack audio frames in the same way we can stack video frames.
-// That's why audio frames are *concatenated* along the samples dimension, not
-// stacked. This is also why we cannot re-use the same pre-allocation logic we
-// have for videos in getFramesPlayedInRange(): this would require constant (and
-// known) frame dimensions.
+// This is one of the main reasons we cannot reuse the same pre-allocation logic
+// we have for videos in getFramesPlayedInRange(): pre-allocating a batch
+// requires constant (and known) frame dimensions. That's also why
+// audio frames are *concatenated* along the samples dimension, not stacked.
 //
 // [IMPORTANT!] There is one key invariant that we must respect when decoding
 // audio frames:
 //
 // BEFORE DECODING FRAME i, WE MUST DECODE ALL FRAMES j < i.
 //
 // Always. Why? We don't know. What we know is that if we don't, we get clipped,
-// incorrect audio as output [1]. All other (correct) libraries like TorchAudio
+// incorrect audio as output [2]. All other (correct) libraries like TorchAudio
 // or Decord do something similar, whether it was intended or not. This has a
 // few implications:
 // - The **only** place we're allowed to seek to in an audio stream is the
@@ -935,7 +934,7 @@ VideoDecoder::FrameBatchOutput VideoDecoder::getFramesPlayedInRange(
 // need is in the future, we don't seek back to the beginning, we just decode
 // all the frames in-between.
 //
-// [1] If you're brave and curious, you can read the long "Seek offset for
+// [2] If you're brave and curious, you can read the long "Seek offset for
 // audio" note in https://github.com/pytorch/torchcodec/pull/507/files, which
 // sums up past (and failed) attempts at working around this issue.
 VideoDecoder::AudioFramesOutput VideoDecoder::getFramesPlayedInRangeAudio(
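
The two rules described in the comment above (concatenate variable-size frames along the samples dimension, and never decode frame i without first decoding every frame j < i) can be illustrated with a small standalone sketch. This is not TorchCodec code: `AudioFrame`, `ToyAudioDecoder`, `decodeRange`, and the two counters are hypothetical names made up for illustration, and "decoding" is modeled as reading a frame from an in-memory vector rather than calling FFmpeg. It also assumes every frame has the same number of channels.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded audio frame: one vector of samples per
// channel, i.e. shape (numChannels, numSamplesPerChannel). Frames of the same
// stream may hold different numbers of samples.
using AudioFrame = std::vector<std::vector<float>>;

// Toy model, not TorchCodec's API: "decoding" is just reading a stored frame.
class ToyAudioDecoder {
 public:
  explicit ToyAudioDecoder(std::vector<AudioFrame> frames)
      : frames_(std::move(frames)) {}

  // Returns frames [first, last) concatenated along the samples dimension.
  AudioFrame decodeRange(std::size_t first, std::size_t last) {
    if (first > last || last > frames_.size()) {
      throw std::out_of_range("invalid frame range");
    }
    // The only seek we allow is back to the beginning of the stream, and only
    // when the first frame we need is behind the cursor.
    if (first < cursor_) {
      cursor_ = 0;
      ++numSeeksToStart_;
    }
    AudioFrame out;
    for (; cursor_ < last; ++cursor_) {
      const AudioFrame& frame = frames_[cursor_];  // "decode" this frame
      ++numFramesDecoded_;
      if (cursor_ < first) {
        continue;  // decoded only to respect the invariant, then discarded
      }
      if (out.empty()) {
        out.resize(frame.size());  // one output row per channel
      }
      // Frame sizes vary, so we append (concatenate) instead of stacking into
      // a pre-allocated batch of fixed-size frames.
      for (std::size_t c = 0; c < frame.size(); ++c) {
        out[c].insert(out[c].end(), frame[c].begin(), frame[c].end());
      }
    }
    return out;
  }

  std::size_t numFramesDecoded() const { return numFramesDecoded_; }
  std::size_t numSeeksToStart() const { return numSeeksToStart_; }

 private:
  std::vector<AudioFrame> frames_;
  std::size_t cursor_ = 0;  // index of the next frame to decode
  std::size_t numFramesDecoded_ = 0;
  std::size_t numSeeksToStart_ = 0;
};
```

In this sketch, asking for frames [1, 3) on a fresh decoder decodes frames 0, 1 and 2 without seeking and returns only the samples of frames 1 and 2. Asking for frame 0 afterwards triggers a single seek back to the start, because that frame is now behind the cursor.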
