@@ -58,49 +58,12 @@ struct AudioFramesOutput {
 // FRAME TENSOR ALLOCATION APIs
 // --------------------------------------------------------------------------
 
-// Note [Frame Tensor allocation and height and width ]
+// Note [Frame Tensor allocation]
 //
 // We always allocate [N]HWC tensors. The low-level decoding functions all
 // assume HWC tensors, since this is what FFmpeg natively handles. It's up to
 // the high-level decoding entry-points to permute that back to CHW, by calling
 // maybePermuteHWC2CHW().
-//
-// TODO: Rationalize the comment below with refactoring.
-//
-// Also, importantly, the way we figure out the the height and width of the
-// output frame tensor varies, and depends on the decoding entry-point. In
-// *decreasing order of accuracy*, we use the following sources for determining
-// height and width:
-// - getHeightAndWidthFromResizedAVFrame(). This is the height and width of the
-//   AVframe, *post*-resizing. This is only used for single-frame decoding APIs,
-//   on CPU, with filtergraph.
-// - getHeightAndWidthFromOptionsOrAVFrame(). This is the height and width from
-//   the user-specified options if they exist, or the height and width of the
-//   AVFrame *before* it is resized. In theory, i.e. if there are no bugs within
-//   our code or within FFmpeg code, this should be exactly the same as
-//   getHeightAndWidthFromResizedAVFrame(). This is used by single-frame
-//   decoding APIs, on CPU with swscale, and on GPU.
-// - getHeightAndWidthFromOptionsOrMetadata(). This is the height and width from
-//   the user-specified options if they exist, or the height and width form the
-//   stream metadata, which itself got its value from the CodecContext, when the
-//   stream was added. This is used by batch decoding APIs, for both GPU and
-//   CPU.
-//
-// The source of truth for height and width really is the (resized) AVFrame: it
-// comes from the decoded ouptut of FFmpeg. The info from the metadata (i.e.
-// from the CodecContext) may not be as accurate. However, the AVFrame is only
-// available late in the call stack, when the frame is decoded, while the
-// CodecContext is available early when a stream is added. This is why we use
-// the CodecContext for pre-allocating batched output tensors (we could
-// pre-allocate those only once we decode the first frame to get the info frame
-// the AVFrame, but that's a more complex logic).
-//
-// Because the sources for height and width may disagree, we may end up with
-// conflicts: e.g. if we pre-allocate a batch output tensor based on the
-// metadata info, but the decoded AVFrame has a different height and width.
-// it is very important to check the height and width assumptions where the
-// tensors memory is used/filled in order to avoid segfaults.
-
 torch::Tensor allocateEmptyHWCTensor(
     const FrameDims& frameDims,
     const torch::Device& device,
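The HWC-to-CHW permutation that the retained comment refers to (via `maybePermuteHWC2CHW()`) can be sketched without libtorch. This is a hypothetical, standalone illustration of the index arithmetic only, not the library's implementation: an interleaved HWC buffer (FFmpeg's native layout) is copied into a planar CHW buffer.

```cpp
// Hypothetical sketch: convert an interleaved HWC pixel buffer to planar
// CHW. maybePermuteHWC2CHW() achieves the same reordering via a tensor
// permute; this shows the underlying index mapping explicitly.
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<unsigned char> hwcToChw(
    const std::vector<unsigned char>& hwc,
    std::size_t h,
    std::size_t w,
    std::size_t c) {
  assert(hwc.size() == h * w * c);
  std::vector<unsigned char> chw(hwc.size());
  for (std::size_t y = 0; y < h; ++y) {
    for (std::size_t x = 0; x < w; ++x) {
      for (std::size_t k = 0; k < c; ++k) {
        // HWC index: (y * w + x) * c + k
        // CHW index: (k * h + y) * w + x
        chw[(k * h + y) * w + x] = hwc[(y * w + x) * c + k];
      }
    }
  }
  return chw;
}
```

For a 1x2 RGB image, the two interleaved pixels `[R0 G0 B0][R1 G1 B1]` become three planes `[R0 R1][G0 G1][B0 B1]`.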