Merged
28 commits
dd24dfa
Decoder-native resize public implementation
scotts Oct 27, 2025
3a2df84
Lint
scotts Oct 27, 2025
5344ab4
Merge branch 'main' of github.com:pytorch/torchcodec into transform_api
scotts Nov 6, 2025
98cf81b
Implement decoder native transforms API
scotts Nov 7, 2025
65c4ad7
Correct merge
scotts Nov 7, 2025
f300c70
Actually add new file
scotts Nov 7, 2025
2c3b7f0
Lint
scotts Nov 7, 2025
80e84b5
Better assert
scotts Nov 7, 2025
5ac60d8
Better comment
scotts Nov 7, 2025
531b40f
Top level transforms import
scotts Nov 7, 2025
cc333ac
Add the init file. Sigh.
scotts Nov 7, 2025
238a8ff
Linter now needs torchvision in the environment
scotts Nov 7, 2025
55d362c
Avoid missing import errors
scotts Nov 7, 2025
0d2492e
Better names, better docs
scotts Nov 8, 2025
a2da767
More testing, docstring editing
scotts Nov 10, 2025
2cd3f65
Changes
scotts Nov 11, 2025
4ff0186
Reference docs
scotts Nov 12, 2025
0f9eb62
Better docs
scotts Nov 12, 2025
8081298
Make make params private
scotts Nov 12, 2025
39ed9ac
Links to TorchVision.
scotts Nov 12, 2025
6e6815c
Rename conversion function
scotts Nov 12, 2025
363e688
Add no-torchvision job
scotts Nov 12, 2025
463674d
On second thought, let's not
scotts Nov 12, 2025
c20914c
Lists are not covariant?
scotts Nov 12, 2025
254641a
Just use an explicit type
scotts Nov 12, 2025
9b4186a
Pull tv2 inspection logic into decoder transform
scotts Nov 13, 2025
105c77f
Update conversion arg comment
scotts Nov 13, 2025
70b5976
Better importing, better docs
scotts Nov 13, 2025
2 changes: 1 addition & 1 deletion .github/workflows/lint.yaml
@@ -62,7 +62,7 @@ jobs:
run: python -m pip install --upgrade pip
- name: Install dependencies and FFmpeg
run: |
-python -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
+python -m pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
conda install "ffmpeg=7.0.1" pkg-config pybind11 -c conda-forge
ffmpeg -version
- name: Build and install torchcodec
1 change: 1 addition & 0 deletions mypy.ini
@@ -4,3 +4,4 @@ files = src/torchcodec
show_error_codes = True
pretty = True
allow_redefinition = True
follow_untyped_imports = True
2 changes: 1 addition & 1 deletion src/torchcodec/__init__.py
@@ -7,7 +7,7 @@
# Note: usort wants to put Frame and FrameBatch after decoders and samplers,
# but that results in circular import.
from ._frame import AudioSamples, Frame, FrameBatch # usort:skip # noqa
-from . import decoders, encoders, samplers  # noqa
+from . import decoders, encoders, samplers, transforms  # noqa

try:
# Note that version.py is generated during install.
88 changes: 87 additions & 1 deletion src/torchcodec/decoders/_video_decoder.py
@@ -8,7 +8,7 @@
import json
import numbers
from pathlib import Path
-from typing import Literal, Optional, Tuple, Union
+from typing import List, Literal, Optional, Sequence, Tuple, Union

import torch
from torch import device as torch_device, Tensor
@@ -19,6 +19,7 @@
create_decoder,
ERROR_REPORTING_INSTRUCTIONS,
)
from torchcodec.transforms import DecoderTransform, Resize


class VideoDecoder:
@@ -66,6 +67,10 @@ class VideoDecoder:
probably is. Default: "exact".
Read more about this parameter in:
:ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`
transforms (sequence of transform objects, optional): Sequence of transforms to be
applied to the decoded frames by the decoder itself, in order. Accepts both
``torchcodec.transforms.DecoderTransform`` and ``torchvision.transforms.v2.Transform``
Contributor:
For it to render as links in the docs:

Suggested change
``torchcodec.transforms.DecoderTransform`` and ``torchvision.transforms.v2.Transform``
:class:`~torchcodec.transforms.DecoderTransform` and :class:`~torchvision.transforms.v2.Transform`

We should also create a doc page for the transforms!

objects. All transforms are applied in the output pixel format and colorspace.
Contributor:
Do we want to document this behavior? It seems binding, and we discussed that we may want to reserve the right to change the underlying implementation provided the output are still valid?

Contributor Author (@scotts, Nov 10, 2025):

We want to reserve the right to change the underlying implementation, but we may not be able to easily change when we apply the transform with respect to the colorspace conversion. That fact is, I think, implied by what we consider to be our reference: a fully decoded frame passed to a TorchVision transform. In that scenario, the transform is always applied after the colorspace conversion.

Then I think the questions are:

  1. Do we want to document that we consider passing untransformed frames to TorchVision transforms as our reference? I think we do, because I think that's implied by accepting the TorchVision transforms, and it's an easy way to explain the feature to users.
  2. Is when the transform is applied useful to users? I thought it was, but if it's of little value, we could potentially just not talk about it.

Given how far apart the results were when TorchCodec applied the transform in YUV but TorchVision applied it in RGB, I think that if we ever changed this behavior, it would have to be an option.

custom_frame_mappings (str, bytes, or file-like object, optional):
Mapping of frames to their metadata, typically generated via ffprobe.
This enables accurate frame seeking without requiring a full video scan.
@@ -104,6 +109,7 @@ def __init__(
num_ffmpeg_threads: int = 1,
device: Optional[Union[str, torch_device]] = "cpu",
seek_mode: Literal["exact", "approximate"] = "exact",
transforms: Optional[Sequence[DecoderTransform]] = None,
custom_frame_mappings: Optional[
Union[str, bytes, io.RawIOBase, io.BufferedReader]
] = None,
@@ -148,13 +154,16 @@ def __init__(

device_variant = _get_cuda_backend()

transform_specs = _make_transform_specs(transforms)

core.add_video_stream(
self._decoder,
stream_index=stream_index,
dimension_order=dimension_order,
num_threads=num_ffmpeg_threads,
device=device,
device_variant=device_variant,
transform_specs=transform_specs,
custom_frame_mappings=custom_frame_mappings_data,
)

@@ -432,6 +441,83 @@ def _get_and_validate_stream_metadata(
)


def _convert_to_decoder_native_transforms(
transforms: Sequence[DecoderTransform],
) -> List[DecoderTransform]:
"""Convert a sequence of transforms that may contain TorchVision transform
objects into a list of only TorchCodec transform objects.

Args:
transforms: Sequence of transform objects. The objects can be one of two
types:
1. torchcodec.transforms.DecoderTransform
2. torchvision.transforms.v2.Transform
Our type annotation only mentions the first type so that we don't
have a hard dependency on TorchVision.

Returns:
List of DecoderTransform objects.
"""
try:
from torchvision.transforms import v2

tv_available = True
except ImportError:
tv_available = False

converted_transforms = []
for transform in transforms:
if not isinstance(transform, DecoderTransform):
if not tv_available:
raise ValueError(
f"The supplied transform, {transform}, is not a TorchCodec "
"DecoderTransform. TorchCodec also accepts TorchVision "
"v2 transforms, but TorchVision is not installed."
)
if isinstance(transform, v2.Resize):
Contributor (@NicolasHug, Nov 10, 2025):

I think this fails if tv_available is False? Because v2 wouldn't exist

EDIT ah no that's probably fine because of the if not tv_available: check above.

Contributor:

Makes me think we should have a dummy job that doesn't install TV, to ensure TC still works fine...

Contributor Author:

On a job which doesn't have TorchVision installed: I agree we need to do something here, but I'd like to punt on this for now. The current testing file imports TorchVision unconditionally. I think we'll want to separate out the tests that require TorchVision from those that don't so that we can test both behaviors, but that will require different .py files. I'd like to deal with that in its own PR.

I actually started to add a step in the current linux wheel test that did not install TorchVision when I realized this.

Contributor:

Yes, we can punt on this. I'm hoping we can do something very simple regarding testing: keep all but one test job using torchvision, and just have one small CI job that doesn't install TV and just runs a few tests, basically just ensuring TV is an optional dependency. I'd like to avoid separating tests into different files just for that - we may have more than one optional dependency and that quickly becomes intractable.

Contributor:

Nit, I think I would have been less surprised by v2 being actually optional if this were elif.

Suggested change
if isinstance(transform, v2.Resize):
elif isinstance(transform, v2.Resize):

if len(transform.size) != 2:
raise ValueError(
"TorchVision Resize transform must have a (height, width) "
f"pair for the size, got {transform.size}."
)
converted_transforms.append(Resize(size=transform.size))
else:
raise ValueError(
f"Unsupported transform: {transform}. Transforms must be "
"either a TorchCodec DecoderTransform or a TorchVision "
"v2 transform."
)
else:
converted_transforms.append(transform)

return converted_transforms
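
The dispatch pattern above can be sketched standalone. `DecoderResize` and `FakeTVResize` below are hypothetical stand-ins for `torchcodec.transforms.Resize` and `torchvision.transforms.v2.Resize`, so this runs without either library installed:

```python
# Standalone sketch of _convert_to_decoder_native_transforms' dispatch:
# pass decoder-native transforms through, translate recognized
# third-party ones, and reject everything else.
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DecoderResize:  # stand-in for torchcodec.transforms.Resize
    size: Sequence[int]


@dataclass
class FakeTVResize:  # stand-in for torchvision.transforms.v2.Resize
    size: Sequence[int]


def convert(transforms: Sequence[object]) -> List[DecoderResize]:
    converted: List[DecoderResize] = []
    for t in transforms:
        if isinstance(t, DecoderResize):
            # Already decoder-native: keep as-is.
            converted.append(t)
        elif isinstance(t, FakeTVResize):
            if len(t.size) != 2:
                raise ValueError(
                    f"size must be a (height, width) pair, got {t.size}"
                )
            converted.append(DecoderResize(size=t.size))
        else:
            raise ValueError(f"Unsupported transform: {t}")
    return converted


print(convert([DecoderResize(size=(100, 100)), FakeTVResize(size=(50, 80))]))
```

The real function defers the TorchVision import to call time, which is what keeps TV an optional dependency.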


def _make_transform_specs(
transforms: Optional[Sequence[DecoderTransform]],
) -> str:
"""Given a sequence of transforms, turn those into the specification string
the core API expects.

Args:
transforms: Optional sequence of transform objects. The objects can be
one of two types:
1. torchcodec.transforms.DecoderTransform
2. torchvision.transforms.v2.Transform
Our type annotation only mentions the first type so that we don't
have a hard dependency on TorchVision.

Returns:
String of transforms in the format the core API expects: transform
specifications separated by semicolons.
"""
if transforms is None:
return ""

transforms = _convert_to_decoder_native_transforms(transforms)
return ";".join([t.make_params() for t in transforms])
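
The spec-string contract is easy to see in isolation. This sketch joins pre-rendered fragments; `join_specs` is a hypothetical stand-in, and the fragment text mirrors what this PR's `Resize.make_params` produces:

```python
# Sketch of the core-API spec string contract: None yields "", otherwise
# one "name, arg, arg" fragment per transform, joined by semicolons.
from typing import List, Optional


def join_specs(fragments: Optional[List[str]]) -> str:
    if fragments is None:
        return ""
    return ";".join(fragments)


print(join_specs(["resize, 270, 480", "resize, 135, 240"]))
# resize, 270, 480;resize, 135, 240
```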


def _read_custom_frame_mappings(
custom_frame_mappings: Union[str, bytes, io.RawIOBase, io.BufferedReader]
) -> tuple[Tensor, Tensor, Tensor]:
7 changes: 7 additions & 0 deletions src/torchcodec/transforms/__init__.py
@@ -0,0 +1,7 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from ._decoder_transforms import DecoderTransform, Resize # noqa
58 changes: 58 additions & 0 deletions src/torchcodec/transforms/_decoder_transforms.py
@@ -0,0 +1,58 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DecoderTransform(ABC):
"""Base class for all decoder transforms.
A DecoderTransform is a transform that is applied by the decoder before
returning the decoded frame. The implementation does not live in TorchCodec
itself, but in the underlying decoder. Applying DecoderTransforms to frames
should be both faster and more memory efficient than receiving normally
decoded frames and applying the same kind of transform.
Most DecoderTransforms have a complementary transform in TorchVision,
specifically in torchvision.transforms.v2. For such transforms, we ensure
Contributor:
Nit: add URL

that:
1. Default behaviors are the same.
2. The parameters for the DecoderTransform are a subset of the
TorchVision transform.
3. Parameters with the same name control the same behavior and accept a
subset of the same types.
4. The difference between the frames returned by a DecoderTransform and
the complementary TorchVision transform is small.
All DecoderTransforms are applied in the output pixel format and colorspace.
"""

@abstractmethod
def make_params(self) -> str:
pass


@dataclass
class Resize(DecoderTransform):
"""Resize the decoded frame to a given size.
Complementary TorchVision transform: torchvision.transforms.v2.Resize.
Contributor:

Suggested change
Complementary TorchVision transform: torchvision.transforms.v2.Resize.
Complementary TorchVision transform: :class:`~torchvision.transforms.v2.Resize`.

Interpolation is always bilinear. Anti-aliasing is always on.
Args:
size (sequence of int): Desired output size. Must be a sequence of
the form (height, width).
"""

size: Sequence[int]

def make_params(self) -> str:
assert len(self.size) == 2
return f"resize, {self.size[0]}, {self.size[1]}"
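
As a rough end-to-end sketch, the two classes in this diff can be reproduced standalone to show the spec fragment ``make_params`` emits (the size here is made up):

```python
# Standalone reproduction of the DecoderTransform/Resize dataclasses from
# this diff, showing the "resize, H, W" fragment make_params produces.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DecoderTransform(ABC):
    @abstractmethod
    def make_params(self) -> str:
        ...


@dataclass
class Resize(DecoderTransform):
    size: Sequence[int]

    def make_params(self) -> str:
        assert len(self.size) == 2
        return f"resize, {self.size[0]}, {self.size[1]}"


print(Resize(size=(270, 480)).make_params())
# resize, 270, 480
```

Per the docstring changes earlier in this PR, such objects are passed to the decoder, e.g. ``VideoDecoder(..., transforms=[Resize(size=(270, 480))])``.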