Implement RandomCrop transform #1070
| int x = checkedToPositiveInt(cropTransformSpec[3]); | ||
| int y = checkedToPositiveInt(cropTransformSpec[4]); | ||
| int x = checkedToNonNegativeInt(cropTransformSpec[3]); | ||
| int y = checkedToNonNegativeInt(cropTransformSpec[4]); |
The location (0, 0) is a valid image location. 🤦
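To illustrate the fix above, here is a minimal Python sketch of the two checks. The function names are hypothetical Python analogues of the C++ helpers named in the diff; the real helpers' signatures and error handling are not shown on this page, so this is an assumption about their intent only.

```python
def checked_to_positive_int(text: str) -> int:
    """Hypothetical analogue: accept only values strictly greater than zero."""
    value = int(text)
    if value <= 0:
        raise ValueError(f"expected a positive integer, got {value}")
    return value


def checked_to_non_negative_int(text: str) -> int:
    """Hypothetical analogue: accept zero as well as positive values."""
    value = int(text)
    if value < 0:
        raise ValueError(f"expected a non-negative integer, got {value}")
    return value
```

With the positive-only check, a crop whose top-left corner is the image origin would be rejected, which is why the x and y coordinates need the non-negative variant.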
| if self._top is None or self._left is None: | ||
| # TODO: It would be very strange if only ONE of those is None. But should we | ||
| # make it an error? We can continue, but it would probably mean | ||
| # something bad happened. Dear reviewer, please register an opinion here: |
I agree it would appear something bad happened in this case.
But when calling this function, do we expect _top or _left to have any value? My understanding is that these fields are only set when _make_transform_spec is called, which is only called once per DecoderTransform.
It depends on whether RandomCrop was created via a TorchVision RandomCrop or instantiated directly. If created by a TorchVision RandomCrop in _from_torchvision(), then _top and _left should both have values. If created directly, then neither should have a value, in which case we have to do our own random logic.
It has occurred to me that maybe we don't need to call RandomCrop.make_params() in _from_torchvision(). Maybe we should just always set these values in _make_transform_spec().
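A minimal sketch of the logic being discussed, assuming a standalone helper rather than the PR's actual method. The name `choose_crop_origin`, its parameters, and the decision to raise when only one coordinate is set are all illustrative assumptions, not the code under review.

```python
import random
from typing import Optional, Tuple


def choose_crop_origin(
    input_dims: Tuple[int, int],
    crop_size: Tuple[int, int],
    top: Optional[int] = None,
    left: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[int, int]:
    """Return (top, left) for a crop, sampling it only if none was preset."""
    height, width = input_dims
    crop_h, crop_w = crop_size
    if crop_h > height or crop_w > width:
        raise ValueError("crop size exceeds the input frame dimensions")

    # Assumption: exactly one of top/left being set is treated as an error.
    if (top is None) != (left is None):
        raise ValueError("top and left must both be set or both be None")

    if top is None:
        # No preset location: sample one. randint is inclusive on both ends,
        # so (0, 0) and the bottom-right-most origin are both reachable.
        rng = rng or random.Random()
        top = rng.randint(0, height - crop_h)
        left = rng.randint(0, width - crop_w)

    return top, left
```

If the location were always chosen in one place like this, the TorchVision path would only need to pass along the crop size, which is the simplification the comment above is considering.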
| "v2 transform." | ||
| ) | ||
| else: | ||
| input_dims = transform._get_output_dims(input_dims) |
Is _get_output_dims only used in this function for validation? I believe there are TODOs to move validation to the constructor, which is great, but I do not understand the returned value input_dims here.
It's not actually used for validation. Think of the transforms as a pipeline: A -> B -> C -> D. Each stage may change the dimensions of the frame. We need to track the frame dimensions as we move through the pipeline because some transforms need to know the dimensions of the input frame. RandomCrop is one such transform: in order to randomly determine a location to crop, it needs to know the input frame dimensions to know the bounds to pass to the random number generator.
The dimensions that A receives come from the originally decoded frame, which we can get from the metadata. But the dimensions for B are actually the output of A! That extends to each transform in the pipeline.
This probably deserves a comment. :)
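The pipeline idea above can be sketched as follows. The classes `ResizeSketch` and `CropSketch` and the function `track_dims` are hypothetical stand-ins written for this illustration; they are not the torchcodec classes, and the public method name `get_output_dims` only mirrors the private `_get_output_dims` seen in the diff.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple

Dims = Tuple[int, int]  # (height, width)


class HasOutputDims(Protocol):
    def get_output_dims(self, input_dims: Dims) -> Dims: ...


@dataclass
class ResizeSketch:
    size: Dims

    def get_output_dims(self, input_dims: Dims) -> Dims:
        # A resize produces its target size regardless of the input.
        return self.size


@dataclass
class CropSketch:
    size: Dims

    def get_output_dims(self, input_dims: Dims) -> Dims:
        # A crop needs the input dims to know whether it fits at all.
        height, width = input_dims
        if self.size[0] > height or self.size[1] > width:
            raise ValueError("crop size exceeds the input frame dimensions")
        return self.size


def track_dims(decoded_dims: Dims, transforms: Sequence[HasOutputDims]) -> List[Dims]:
    """Return the input dimensions that each stage of the pipeline sees."""
    seen: List[Dims] = []
    dims = decoded_dims  # the first stage sees the decoded frame's dims
    for transform in transforms:
        seen.append(dims)
        dims = transform.get_output_dims(dims)  # becomes the next stage's input
    return seen
```

For example, with a 1080x1920 decoded frame and the stages resize to (480, 640), crop to (224, 224), resize to (112, 112), the three stages see (1080, 1920), (480, 640), and (224, 224) respectively.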
Implements torchcodec.transforms.RandomCrop and also accepts torchvision.transforms.v2.RandomCrop. The key difference between this capability and Resize is that we need to:
Short version of how we accomplish this:
make_params() on it to get the computed location.
Working on this transform also made me realize that DecoderTransform and its subclasses are not dataclasses. I initially thought they would just be bags of values. But they're growing to have significant methods and internal state not exposed to users. In a follow-up PR, I'll refactor these into normal classes, much like the TorchVision versions. I felt that was too disruptive to do in this PR.