Transforms bridge between Python and C++ #948
Open
Next step after #902. The design in #885 punted on how the Python layer would communicate the transforms and their parameters to the C++ layer. This PR answers that question: a string. The string format is:

In the above, `nameX` is the name of a transform, and `paramX` are the parameters that transform accepts. For example, the only transform that we have now is resize, and its spec is currently:

Where `resize` is literally what we expect, and `<height>` and `<width>` are integers that will become the height and width. In the future we will add a third parameter for algorithm. Future transforms will potentially take different numbers of parameters with different types; we'll define exactly what the spec for each transform is when we add it.

I don't love that we're using strings with our own little specification language, but I'm convinced this is the least bad option:
`0 -> resize`, and then if we wanted to specify a resize operation of height 1024 and width 768, we could say `torch.tensor([0, 1024, 768])`. But both the Python and C++ side would need to know this mapping of integer to transform. Yes, that's technically true with strings, but it's rather obvious what `"resize"` means. The machinery required for this approach is even more than what's required to accept our little string spec language.

The `VideoDecoder` class will be responsible for translating from `torchvision.transforms.v2` to these specification strings. Since it's our own code that will generate these specs, we don't need to worry about making something with sharp edges that will cut users.
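
To make the translation step concrete, here is a rough sketch of what producing and consuming such a spec string could look like. Everything in it is an assumption for illustration: the delimiters (`, ` between parameters, `;` between transforms) are placeholders rather than the format this PR defines, `Resize` is a stdlib stand-in for `torchvision.transforms.v2.Resize`, and `make_transform_spec` and `parse_transform_spec` are hypothetical names. The parser is written in Python only to mirror what the receiving side would need to do.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed delimiters, chosen only for this sketch; the real format may differ.
PARAM_SEP = ", "
TRANSFORM_SEP = ";"


@dataclass
class Resize:
    """Stand-in for torchvision.transforms.v2.Resize (hypothetical, stdlib only)."""

    size: tuple[int, int]  # (height, width)


def make_transform_spec(transforms) -> str:
    """Translate transform objects into a single specification string."""
    specs = []
    for t in transforms:
        if isinstance(t, Resize):
            height, width = t.size
            specs.append(PARAM_SEP.join(["resize", str(height), str(width)]))
        else:
            # Only resize exists so far; anything else is rejected up front.
            raise ValueError(f"Unsupported transform: {t!r}")
    return TRANSFORM_SEP.join(specs)


def parse_transform_spec(spec: str) -> list[tuple[str, list[int]]]:
    """Split a spec string back into (name, parameters) pairs."""
    parsed = []
    for chunk in spec.split(TRANSFORM_SEP):
        name, *params = [part.strip() for part in chunk.split(",")]
        if name != "resize" or len(params) != 2:
            raise ValueError(f"Bad transform spec: {chunk!r}")
        parsed.append((name, [int(p) for p in params]))
    return parsed


spec = make_transform_spec([Resize(size=(1024, 768))])
print(spec)
print(parse_transform_spec(spec))
```

Because only our own code would call something like `make_transform_spec`, the parser can afford to be strict and fail loudly on anything it does not recognize.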