-
Notifications
You must be signed in to change notification settings - Fork 74
Validation of allocated output tensor shapes #339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… convertFrameToTensorUsingFilterGraph
| expectedOutputWidth, | ||
| "x3, got ", | ||
| shape); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check above validates the shape of the pre-allocated output tensor. It was previously only done for swscale, but it should also be done for filtergraph, hence why I moved it up.
| int VideoDecoder::convertFrameToBufferUsingSwsScale( | ||
| int streamIndex, | ||
| const AVFrame* frame, | ||
| torch::Tensor& outputTensor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the signature of convertFrameToBufferUsingSwsScale for it to be very similar to convertFrameToTensorUsingFilterGraph. The signatures are now:
int VideoDecoder::convertFrameToBufferUsingSwsScale(
int streamIndex,
const AVFrame* frame,
torch::Tensor& outputTensor);
torch::Tensor VideoDecoder::convertFrameToTensorUsingFilterGraph(
int streamIndex,
const AVFrame* frame);
Their signatures being so different previously was a constant source of confusion for me.
I realize that this change seemingly goes against our TODO of folding the pre-allocated tensor within RawOutput. But I think this TODO will in fact be easier to address with that change, as both APIs will be more closely aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this function now returning the height?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that we can check against it and have the swscale check next to the filtergraph check. It makes the logic of both libraries more symmetric.
| TORCH_CHECK_EQ(filteredFrame->format, AV_PIX_FMT_RGB24); | ||
| auto frameDims = getHeightAndWidthFromOptionsOrAVFrame( | ||
| streams_[streamIndex].options, *filteredFrame.get()); | ||
| auto frameDims = getHeightAndWidthFromResizedAVFrame(*filteredFrame.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reverted this logic to what it previously was before #332. I had some doubt in #332 (comment) already, and now I think it's clear that this logic for finding height and width should be treated as a separate logic from the 2 existing ones.
I have updated the big note below, with my updated understanding.
(I know it's a bit complicated. But the complexity isn't caused by this PR!)
|
|
||
| // TODO height and width info of output tensor comes from the metadata, which | ||
| // may not be accurate. How do we make sure we won't corrupt memory if the | ||
| // allocated tensor is too short/large? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahmadsharif1 any idea on how to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can look at rawOutput.frame's dimensions and use those, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting to get height and width from the AVFrame instead of from the Metadata then? I don't mind but that's a change of logic. Do you know why it wasn't done like that before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline, I'll submit a follow-up to do that.
| TORCH_CHECK( | ||
| (shape.size() == 3) && (shape[0] == expectedOutputHeight) && | ||
| (shape[1] == expectedOutputWidth) && (shape[2] == 3), | ||
| "Expected pre-allocated tensor of shape ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also be able to do (shape == {expectedOutputHeight, expectedOutputWidth, 3}) if you think that's clearer. I'm not sure.
| } | ||
| } | ||
|
|
||
| FrameDims getHeightAndWidthFromResizedAVFrame(const AVFrame& resizedAVFrame) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the function itself has no opinion on what kind of frame it is (resized or not), I think the name should just be getHeightAndWidthFromAVFrame().
Read comment below. I get the motivation for grepping through the code. I don't think this is amazing, but it should help us get to a cleaner place, and it does make explicit how convoluted things are.
|
The changes look good to me, but I think @ahmadsharif1 should also review. |
| int VideoDecoder::convertFrameToBufferUsingSwsScale( | ||
| int streamIndex, | ||
| const AVFrame* frame, | ||
| torch::Tensor& outputTensor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this function now returning the height?
| // for pre-allocating batched output tensors (we could pre-allocate those only | ||
| // once we decode the first frame to get the info frame the AVFrame, but that's | ||
| // a more complex logic). | ||
| // The source of truth for height and width really is the (resized) AVFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decoded output from ffmpeg isn't resized. The resize happens later
|
|
||
| // There's nothing preventing you from calling this on a non-resized frame, but | ||
| // please don't. | ||
| FrameDims getHeightAndWidthFromResizedAVFrame(const AVFrame& resizedAVFrame); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getHeightAndWidth can just be called getFrameDims
|
|
||
| // TODO height and width info of output tensor comes from the metadata, which | ||
| // may not be accurate. How do we make sure we won't corrupt memory if the | ||
| // allocated tensor is too short/large? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can look at rawOutput.frame's dimensions and use those, right?
This PR adds checks for the shape of the output tensors we allocate. This should help us avoid some potential memory corruption issues.
This PR does not change any existing logic (well, not really, see details in comments below). It only adds safer checks against existing assumptions.