
Conversation

@scotts scotts commented Oct 27, 2025

Public API for decoder-native resize. The implementation in this PR accepts both torchvision.transforms.v2.Resize and a newly defined torchcodec.transforms.Resize.

In #526, I had initially proposed not using TorchVision transforms, and instead coming up with TorchCodec specific versions. @NicolasHug proposed that we accept TorchVision transforms, and that's what I followed up with in my design in #885.

After discussing the previous iteration of this PR, we agreed we wanted to see what it would look like to accept both. Having implemented this, I agree it's the right thing to do:

  1. We now don't need to require TorchVision, even when using the decoder-native feature.
  2. We have a natural place to document the behavior of each decoder-native transform that we accept, and what its limitations are compared to the TorchVision version of that transform.
  3. We have a more principled mechanism of enforcing how TorchVision transforms map to decoder-native semantics. We still have to dig into the TorchVision object to get the info we need, but the torchcodec.transforms class is a clear representation in code of what is supported. In the old PR, that mapping was buried in the logic that turned the TorchVision transform directly into the specification string the core API needs.

Four points worth discussing:

  1. I made the base class for all TorchCodec defined decoder-native transforms to be DecoderTransform. I think it would be confusing if it was just Transform, and DecoderNativeTransform seems both too long and too obscure.
  2. I made the module path torchcodec.transforms instead of torchcodec.decoder_transforms. That's almost counter to point 1, but I think that there's less chance of confusion with the module path.
  3. Should it be DecoderResize instead of just Resize?
  4. The type annotation that users will see only mentions accepting torchcodec.transforms.DecoderTransform. It does not mention the TorchVision transforms or nn.Module. The text of the docstring will say it, and I think that's enough?
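
For readers of this thread, here is a rough usage sketch of the API described above. The transforms parameter on VideoDecoder and the torchcodec.transforms.Resize class come from this PR; the file name, the size values, and the size keyword are illustrative assumptions.

from torchcodec.decoders import VideoDecoder
import torchcodec.transforms as tc_transforms

# Decoder-native resize using the TorchCodec-defined transform.
# TorchVision is not required for this path.
decoder = VideoDecoder("video.mp4", transforms=[tc_transforms.Resize(size=(270, 480))])
frame = decoder[0]  # frames come back already resized by the decoder

# The same decoder, but passing the TorchVision transform instead
# (requires TorchVision to be installed).
from torchvision.transforms import v2
decoder_tv = VideoDecoder("video.mp4", transforms=[v2.Resize(size=(270, 480))])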

@meta-cla bot added the CLA Signed label Oct 27, 2025

@scotts scotts marked this pull request as ready for review November 10, 2025 02:11
:ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`
transforms (sequence of transform objects, optional): Sequence of transforms to be
applied to the decoded frames by the decoder itself, in order. Accepts both
``torchcodec.transforms.DecoderTransform`` and ``torchvision.transforms.v2.Transform``
Contributor

For it to render as links in the docs:

Suggested change
``torchcodec.transforms.DecoderTransform`` and ``torchvision.transforms.v2.Transform``
:class:`~torchcodec.transforms.DecoderTransform` and :class:`~torchvision.transforms.v2.Transform`

We should also create a doc page for the transforms!

Contributor

@NicolasHug NicolasHug left a comment

Looks great, made a first pass!

transforms (sequence of transform objects, optional): Sequence of transforms to be
applied to the decoded frames by the decoder itself, in order. Accepts both
``torchcodec.transforms.DecoderTransform`` and ``torchvision.transforms.v2.Transform``
objects. All transforms are applied in the output pixel format and colorspace.
Contributor

Do we want to document this behavior? It seems binding, and we discussed that we may want to reserve the right to change the underlying implementation provided the outputs are still valid?

Contributor Author

@scotts scotts Nov 10, 2025

We want to reserve the right to change the underlying implementation, but we may not be able to easily change when we apply the transform with respect to the colorspace conversion. That fact is, I think, implied by what we consider to be our reference: a fully decoded frame passed to a TorchVision transform. In that scenario, the transform is always applied after the colorspace conversion.

Then I think the questions are:

  1. Do we want to document that we consider passing untransformed frames to TorchVision transforms as our reference? I think we do, because I think that's implied by accepting the TorchVision transforms, and it's an easy way to explain the feature to users.
  2. Is when the transform is applied useful to users? I thought it was, but if it's of little value, we could potentially just not talk about it.

Given how far away the tolerances were when TorchCodec applied the transform in YUV, but TorchVision applied them in RGB, I think that if we ever changed this behavior, it would have to be an option.
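
To make the reference concrete, a minimal sketch of the comparison being described, assuming the transforms parameter from this PR; the tolerance is illustrative, not the one used in the actual tests.

import torch
from torchvision.transforms import v2
from torchcodec.decoders import VideoDecoder
from torchcodec.transforms import Resize

size = (270, 480)

# Decoder-native path: the decoder applies the resize itself.
native_frame = VideoDecoder("video.mp4", transforms=[Resize(size=size)])[0]

# Reference path: fully decode the frame, then apply the TorchVision transform.
reference_frame = v2.Resize(size=size)(VideoDecoder("video.mp4")[0])

# Both resizes happen after the colorspace conversion (i.e. in RGB), so the
# two results should agree within a small tolerance.
torch.testing.assert_close(native_frame, reference_frame, atol=2, rtol=0)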

decoded frames and applying the same kind of transform.
Most DecoderTransforms have a complementary transform in TorchVision,
specifically in torchvision.transforms.v2. For such transforms, we ensure
Contributor

Nit: add URL

class Resize(DecoderTransform):
"""Resize the decoded frame to a given size.
Complementary TorchVision transform: torchvision.transforms.v2.Resize.
Contributor

Suggested change
Complementary TorchVision transform: torchvision.transforms.v2.Resize.
Complementary TorchVision transform: :class:`~torchvision.transforms.v2.Resize`.

" DecoderTransform. TorchCodec also accept TorchVision "
"v2 transforms, but TorchVision is not installed."
)
if isinstance(transform, v2.Resize):
Contributor

@NicolasHug NicolasHug Nov 10, 2025

I think this fails if tv_available is False? Because v2 wouldn't exist

EDIT ah no that's probably fine because of the if not tv_available: check above.
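
For context, a minimal sketch of the optional-import pattern being discussed; the tv_available name comes from the check mentioned above, while the function and the rest of the logic are illustrative assumptions rather than the PR's actual code.

from torchcodec.transforms import DecoderTransform, Resize

try:
    from torchvision.transforms import v2
    tv_available = True
except ImportError:
    tv_available = False

def _convert_transform(transform):
    # TorchCodec's own transforms pass through untouched.
    if isinstance(transform, DecoderTransform):
        return transform
    if not tv_available:
        raise ValueError(
            "The transform is not a DecoderTransform. TorchCodec also accepts "
            "TorchVision v2 transforms, but TorchVision is not installed."
        )
    # v2 is only referenced past the guard above, so this cannot NameError.
    if isinstance(transform, v2.Resize):
        return Resize._from_torchvision(transform)
    raise ValueError(f"Unsupported transform: {transform}")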

Contributor

Makes me think we should have a dummy job where we don't install TV that ensures TC still works fine...

Contributor Author

On a job which doesn't have TorchVision installed: I agree we need to do something here, but I'd like to punt on this for now. The current testing file imports TorchVision unconditionally. I think we'll want to separate out the tests that require TorchVision from those that don't so that we can test both behaviors, but that will require different .py files. I'd like to deal with that in its own PR.

I actually started to add a step in the current linux wheel test that did not install TorchVision when I realized this.

Contributor

Yes, we can punt on this. I'm hoping we can do something very simple regarding testing: keep all but one test job using torchvision, and just have one small CI job that doesn't install TV and just runs a few tests, basically just ensuring TV is an optional dependency. I'd like to avoid separating tests into different files just for that - we may have more than one optional dependency and that quickly becomes intractable.

applied to the decoded frames by the decoder itself, in order. Accepts both
:class:`~torchcodec.transforms.DecoderTransform` and
`torchvision.transforms.v2.Transform <https://docs.pytorch.org/vision/stable/transforms.html#v2-api-reference-recommended>`_
objects. All transforms are applied
Contributor Author

This and all other references to TorchVision transforms use hard links. I don't think we can get proper Sphinx references when it's in a different project.

Contributor

We can, we'll just need to add a torchvision entry here:

intersphinx_mapping = {
    "python": ("https://docs.python.org/3/", None),
    "torch": ("https://pytorch.org/docs/stable/", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
    "PIL": ("https://pillow.readthedocs.io/en/stable/", None),
    "matplotlib": ("https://matplotlib.org/stable/", None),
}

Feel free to leave that as follow-up / open an issue.
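
For example, presumably an entry along these lines (the URL matches the hard links already used elsewhere in this PR):

intersphinx_mapping = {
    "python": ("https://docs.python.org/3/", None),
    "torch": ("https://pytorch.org/docs/stable/", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
    "PIL": ("https://pillow.readthedocs.io/en/stable/", None),
    "matplotlib": ("https://matplotlib.org/stable/", None),
    "torchvision": ("https://docs.pytorch.org/vision/stable/", None),
}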

def _make_params(self) -> str:
    assert len(self.size) == 2
    return f"resize, {self.size[0]}, {self.size[1]}"

Contributor Author

Note this class method below is new. Because I'm trying to exhaustively catch all of the v2.Resize options we don't support, the code for turning a v2.Resize into a torchcodec.transforms.Resize got more involved. Extrapolated across more transforms, this kind of logic would end up dominating the code in _video_decoder.py. By making this a private class method, we can put all the logic about which v2.Resize options we support, and how to turn a v2.Resize into a torchcodec.transforms.Resize, in one place.

Also, to state it explicitly, _from_torchvision() and _make_params() are private methods so they're not publicly documented. Users shouldn't need to know about them.
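
To illustrate the kind of logic being centralized, here is a hypothetical standalone sketch of the conversion. The attribute names (size, max_size) come from torchvision.transforms.v2.Resize; exactly which options the PR rejects is not shown in this thread, so the specific checks below are assumptions.

from torch import nn
from torchcodec.transforms import Resize

def _resize_from_torchvision(resize_tv: nn.Module) -> Resize:
    from torchvision.transforms import v2

    assert isinstance(resize_tv, v2.Resize)
    # Reject v2.Resize options that decoder-native resize cannot honor
    # (which options are rejected is an assumption in this sketch).
    if resize_tv.max_size is not None:
        raise ValueError("max_size is not supported by decoder-native resize.")
    if resize_tv.size is None or len(resize_tv.size) != 2:
        raise ValueError("Decoder-native resize needs an explicit (height, width) size.")
    return Resize(size=tuple(resize_tv.size))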

@@ -0,0 +1,17 @@
.. _samplers:
Contributor

Suggested change
.. _samplers:
.. _transforms:

Comment on lines 3 to 5
===================
torchcodec.transforms
===================
Contributor

We're getting warnings when the lengths don't match

Suggested change
===================
torchcodec.transforms
===================
=====================
torchcodec.transforms
=====================

should be both faster and more memory efficient than receiving normally
decoded frames and applying the same kind of transform.
Most `DecoderTransform` objects have a complementary transform in TorchVision,
Contributor

Annoyingly, single backticks in rst mean italics. I think you wanted those to be code, like in markdown? For that we need double backticks (there are other instances of single backticks below and maybe in other files?)

Suggested change
Most `DecoderTransform` objects have a complementary transform in TorchVision,
Most ``DecoderTransform`` objects have a complementary transform in TorchVision,

Contributor Author

I saw them render as italics, and I just thought, "Oh, Sphinx makes code just italics? Okay..." :)

" DecoderTransform. TorchCodec also accept TorchVision "
"v2 transforms, but TorchVision is not installed."
)
if isinstance(transform, v2.Resize):
Contributor

Nit, I think I would have been less surprised by v2 being actually optional if this were elif.

Suggested change
if isinstance(transform, v2.Resize):
elif isinstance(transform, v2.Resize):



@classmethod
def _from_torchvision(cls, resize_tv: nn.Module):
    from torchvision.transforms import v2
Contributor

I'd suggest the following:

try:
    from torchvision.transforms import v2
except ImportError as e:
    raise RuntimeError("Couldn't find TorchVision - this should never happen, please report a bug") from e

This should probably be in a helper function, reused across classes. My goal here is mainly to help the reader (us, the devs) understand that this code-path is only expected to be run when TV is already available. Otherwise, the plain import makes it look like v2 could be a hard dep.

Contributor Author

At the moment, there is only one place where it's a bug if we can't find TorchVision, so I'll keep this as not-a-function for now. In the other place where we import TorchVision dynamically, we're fine if it's not there.

Contributor Author

Ohhh, right, but we're going to have one of these for each transform. I'll pull into a function now, and it should just live in this file.
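
A sketch of the helper being agreed on here; the function name is illustrative, and the body follows the corrected try/except from the suggestion above.

def _import_torchvision_v2():
    # Only reached once we've decided to handle a TorchVision transform, so a
    # missing TorchVision at this point is a bug rather than user error.
    try:
        from torchvision.transforms import v2
    except ImportError as e:
        raise RuntimeError(
            "Couldn't find TorchVision - this should never happen, please report a bug."
        ) from e
    return v2

# Each transform's _from_torchvision() can then start with:
#     v2 = _import_torchvision_v2()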

Comment on lines 60 to 62
def _make_params(self) -> str:
    assert len(self.size) == 2
    return f"resize, {self.size[0]}, {self.size[1]}"
Contributor

Can we call it something else than _make_params?

make_params exists for the v2 transforms, but it does something quite different.

:class:`~torchcodec.transforms.DecoderTransform` and
`torchvision.transforms.v2.Transform <https://docs.pytorch.org/vision/stable/transforms.html#v2-api-reference-recommended>`_
objects. All transforms are applied
in the output pixel format and colorspace. Read more about this parameter in:
Contributor

Following up on #1003 (comment)

Do we want to document that we consider passing untransformed frames to TorchVision transforms as our reference? I think we do, because I think that's implied by accepting the TorchVision transforms, and it's an easy way to explain the feature to users.

Agreed, we should document and claim that TV is our ref. I think we have slightly different understandings of what we mean by "TV is our ref"; your definition is slightly stricter than mine (see below).

Is when the transform is applied useful to users? I thought it was, but if it's of little value, we could potentially just not talk about it.

I don't think it adds a lot of value to document, as I don't know if that's a question users are even asking themselves. But I could be wrong and I don't feel strongly about it. What I'm slightly more concerned about is that the comment seems like a contract, and I suspect we may want to relax that behavior in the future. E.g. for crop, we might want to apply it in YUV space instead of RGB if it's faster and if models can't notice the difference.

To me, when we say "TV is our ref", it means "this transform has the same behavior as the TV transform as far as models are concerned". It's not strictly about bitwise equality (we'll never have that). It's only about whether the models can tell the difference. We know they can tell the difference for resize's interpolation mode. But if they can't tell the difference for (e.g.) crop being applied before or after color-conversion, I think we could allow ourselves to make that change of behavior. That allows us more freedom to potentially enable higher perf gains in the future.

None of my comments above are blocking. We can go ahead as-is. I'm happy that for once, I am not the one insisting on strictness :D

Contributor Author

@NicolasHug, that's all fair, and I also think it's fair to err on the side of explaining less about the implementation. If folks start asking about it, we can revisit.

But if they can't tell the difference for (e.g.) crop being applied before or after color-conversion, I think we could allow ourselves to make that change of behavior. That allows us more freedom to potentially enable higher perf gains in the future.

Based on what I did with crop and resize, I actually think that is likely to be the case everywhere: applying the transform in YUV versus RGB will be noticeable by the model. But we can easily punt on that determination by just not saying anything about it. If it becomes something folks ask about, we may need to make it an explicit option, in which case we'll document behavior.

@scotts scotts merged commit 0535b00 into meta-pytorch:main Nov 14, 2025
63 of 70 checks passed
@scotts scotts deleted the transform_api branch November 14, 2025 02:44