
Commit dbb299e

2024-10-02 nightly release (9832166)
Author: pytorchbot
1 parent 75690c8, commit dbb299e

5 files changed: +136 -93 lines changed

docs/source/io.rst

Lines changed: 61 additions & 46 deletions
@@ -3,33 +3,46 @@ Decoding / Encoding images and videos

 .. currentmodule:: torchvision.io

-The :mod:`torchvision.io` package provides functions for performing IO
-operations. They are currently specific to reading and writing images and
-videos.
+The :mod:`torchvision.io` module provides utilities for decoding and encoding
+images and videos.

-Images
-------
+Image Decoding
+--------------

 Torchvision currently supports decoding JPEG, PNG, WEBP and GIF images. JPEG
 decoding can also be done on CUDA GPUs.

-For encoding, JPEG (cpu and CUDA) and PNG are supported.
+The main entry point is the :func:`~torchvision.io.decode_image` function, which
+you can use as an alternative to ``PIL.Image.open()``. It will decode images
+straight into image Tensors, thus saving you the conversion and allowing you to
+run transforms/preproc natively on tensors.
+
+.. code::
+
+    from torchvision.io import decode_image
+
+    img = decode_image("path_to_image", mode="RGB")
+    img.dtype # torch.uint8
+
+    # Or
+    raw_encoded_bytes = ... # read encoded bytes from your file system
+    img = decode_image(raw_encoded_bytes, mode="RGB")
+
+
+:func:`~torchvision.io.decode_image` will automatically detect the image format,
+and call the corresponding decoder. You can also use the lower-level
+format-specific decoders which can be more powerful, e.g. if you want to
+encode/decode JPEGs on CUDA.

 .. autosummary::
     :toctree: generated/
     :template: function.rst

     decode_image
-    encode_jpeg
     decode_jpeg
-    write_jpeg
+    encode_png
     decode_gif
     decode_webp
-    encode_png
-    decode_png
-    write_png
-    read_file
-    write_file

 .. autosummary::
     :toctree: generated/
@@ -41,14 +54,47 @@ Obsolete decoding function:

 .. autosummary::
     :toctree: generated/
-    :template: class.rst
+    :template: function.rst

     read_image

+Image Encoding
+--------------
+
+For encoding, JPEG (cpu and CUDA) and PNG are supported.
+
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    encode_jpeg
+    write_jpeg
+    encode_png
+    write_png
+
+IO operations
+-------------
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    read_file
+    write_file

 Video
 -----

+.. warning::
+
+    Torchvision supports video decoding through different APIs listed below,
+    some of which are still in BETA stage. In the near future, we intend to
+    centralize PyTorch's video decoding capabilities within the `torchcodec
+    <https://github.com/pytorch/torchcodec>`_ project. We encourage you to try
+    it out and share your feedback, as the torchvision video decoders will
+    eventually be deprecated.
+
 .. autosummary::
     :toctree: generated/
     :template: function.rst
@@ -58,45 +104,14 @@ Video
     write_video


-Fine-grained video API
-^^^^^^^^^^^^^^^^^^^^^^
+**Fine-grained video API**

 In addition to the :mod:`read_video` function, we provide a high-performance
 lower-level API for more fine-grained control compared to the :mod:`read_video` function.
 It does all this whilst fully supporting torchscript.

-.. betastatus:: fine-grained video API
-
 .. autosummary::
     :toctree: generated/
     :template: class.rst

     VideoReader
-
-
-Example of inspecting a video:
-
-.. code:: python
-
-    import torchvision
-    video_path = "path to a test video"
-    # Constructor allocates memory and a threaded decoder
-    # instance per video. At the moment it takes two arguments:
-    # path to the video file, and a wanted stream.
-    reader = torchvision.io.VideoReader(video_path, "video")
-
-    # The information about the video can be retrieved using the
-    # `get_metadata()` method. It returns a dictionary for every stream, with
-    # duration and other relevant metadata (often frame rate)
-    reader_md = reader.get_metadata()
-
-    # metadata is structured as a dict of dicts with following structure
-    # {"stream_type": {"attribute": [attribute per stream]}}
-    #
-    # following would print out the list of frame rates for every present video stream
-    print(reader_md["video"]["fps"])
-
-    # we explicitly select the stream we would like to operate on. In
-    # the constructor we select a default video stream, but
-    # in practice, we can set whichever stream we would like
-    video.set_current_stream("video:0")
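
Note on the io.rst change: the new docs say `decode_image` detects the image format automatically and calls the matching decoder. A common way to do this kind of detection is to inspect the leading "magic" bytes of the encoded data. The sketch below illustrates that idea in plain Python for the four formats the docs list. It is not torchvision's implementation, and the helper name `sniff_image_format` is hypothetical.

```python
from typing import Optional

# Well-known file signatures for the four formats named in the docs.
_PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
_JPEG_SIGNATURE = b"\xff\xd8\xff"
_GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess the image format from the first bytes (hypothetical helper)."""
    if data.startswith(_PNG_SIGNATURE):
        return "png"
    if data.startswith(_JPEG_SIGNATURE):
        return "jpeg"
    if data.startswith(_GIF_SIGNATURES):
        return "gif"
    # WEBP is a RIFF container: "RIFF", four size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # Unknown or truncated input.
    return None
```

A dispatcher could map the returned string to a format-specific decoder and raise an error on `None`.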

setup.py

Lines changed: 5 additions & 5 deletions
@@ -42,7 +42,7 @@
 IS_ROCM = (torch.version.hip is not None) and (ROCM_HOME is not None)
 BUILD_CUDA_SOURCES = (torch.cuda.is_available() and ((CUDA_HOME is not None) or IS_ROCM)) or FORCE_CUDA

-PACKAGE_NAME = "torchvision"
+package_name = os.getenv("TORCHVISION_PACKAGE_NAME", "torchvision")

 print("Torchvision build configuration:")
 print(f"{FORCE_CUDA = }")
@@ -98,7 +98,7 @@ def get_dist(pkgname):
         except DistributionNotFound:
             return None

-    pytorch_dep = "torch"
+    pytorch_dep = os.getenv("TORCH_PACKAGE_NAME", "torch")
     if os.getenv("PYTORCH_VERSION"):
         pytorch_dep += "==" + os.getenv("PYTORCH_VERSION")

@@ -561,7 +561,7 @@ def run(self):
     version, sha = get_version()
     write_version_file(version, sha)

-    print(f"Building wheel {PACKAGE_NAME}-{version}")
+    print(f"Building wheel {package_name}-{version}")

     with open("README.md") as f:
         readme = f.read()
@@ -573,7 +573,7 @@ def run(self):
     ]

     setup(
-        name=PACKAGE_NAME,
+        name=package_name,
         version=version,
         author="PyTorch Core Team",
         author_email="[email protected]",
@@ -583,7 +583,7 @@ def run(self):
         long_description_content_type="text/markdown",
         license="BSD",
         packages=find_packages(exclude=("test",)),
-        package_data={PACKAGE_NAME: ["*.dll", "*.dylib", "*.so", "prototype/datasets/_builtin/*.categories"]},
+        package_data={package_name: ["*.dll", "*.dylib", "*.so", "prototype/datasets/_builtin/*.categories"]},
         zip_safe=False,
         install_requires=get_requirements(),
         extras_require={
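
Note on the setup.py change: the package name and the torch dependency name now come from environment variables, with the previous hard-coded values as defaults. The sketch below restates that lookup as a small standalone function so the behaviour is easy to check. The function name `resolve_names` and the `env` parameter are hypothetical and exist only to make the example testable; the custom names in the usage line are made up.

```python
import os
from typing import Mapping, Tuple


def resolve_names(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (package_name, pytorch_dep) using the variables from the diff."""
    # Fall back to the old hard-coded names when the variables are unset.
    package_name = env.get("TORCHVISION_PACKAGE_NAME", "torchvision")
    pytorch_dep = env.get("TORCH_PACKAGE_NAME", "torch")
    # Pin the torch dependency when PYTORCH_VERSION is set and non-empty.
    version = env.get("PYTORCH_VERSION")
    if version:
        pytorch_dep += "==" + version
    return package_name, pytorch_dep


# Illustrative call with made-up values.
print(resolve_names({"TORCH_PACKAGE_NAME": "torch-custom", "PYTORCH_VERSION": "2.5.0"}))
```

With no variables set, the result matches the values the build used before this commit.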

torchvision/io/image.py

Lines changed: 38 additions & 42 deletions
@@ -20,19 +20,25 @@


 class ImageReadMode(Enum):
-    """
-    Support for various modes while reading images.
+    """Allow automatic conversion to RGB, RGBA, etc while decoding.
+
+    .. note::
+
+        You don't need to use this struct, you can just pass strings to all
+        ``mode`` parameters, e.g. ``mode="RGB"``.

-    Use ``ImageReadMode.UNCHANGED`` for loading the image as-is,
-    ``ImageReadMode.GRAY`` for converting to grayscale,
-    ``ImageReadMode.GRAY_ALPHA`` for grayscale with transparency,
-    ``ImageReadMode.RGB`` for RGB and ``ImageReadMode.RGB_ALPHA`` for
-    RGB with transparency.
+    The different available modes are the following.
+
+    - UNCHANGED: loads the image as-is
+    - RGB: converts to RGB
+    - RGBA: converts to RGB with transparency (also aliased as RGB_ALPHA)
+    - GRAY: converts to grayscale
+    - GRAY_ALPHA: converts to grayscale with transparency

     .. note::

-        Some decoders won't support all possible values, e.g. a decoder may only
-        support "RGB" and "RGBA" mode.
+        Some decoders won't support all possible values, e.g. GRAY and
+        GRAY_ALPHA are only supported for PNG and JPEG images.
     """

     UNCHANGED = 0
@@ -45,8 +51,7 @@ class ImageReadMode(Enum):

 def read_file(path: str) -> torch.Tensor:
     """
-    Reads and outputs the bytes contents of a file as a uint8 Tensor
-    with one dimension.
+    Return the bytes contents of a file as a uint8 1D Tensor.

     Args:
         path (str or ``pathlib.Path``): the path to the file to be read
@@ -62,8 +67,7 @@ def read_file(path: str) -> torch.Tensor:

 def write_file(filename: str, data: torch.Tensor) -> None:
     """
-    Writes the contents of an uint8 tensor with one dimension to a
-    file.
+    Write the content of an uint8 1D tensor to a file.

     Args:
         filename (str or ``pathlib.Path``): the path to the file to be written
@@ -93,10 +97,9 @@ def decode_png(
     Args:
         input (Tensor[1]): a one dimensional uint8 tensor containing
             the raw bytes of the PNG image.
-        mode (str or ImageReadMode): the read mode used for optionally
-            converting the image. Default: ``ImageReadMode.UNCHANGED``.
-            See `ImageReadMode` class for more information on various
-            available modes.
+        mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
+            Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
+            for available modes.
         apply_exif_orientation (bool): apply EXIF orientation transformation to the output tensor.
             Default: False.

@@ -156,8 +159,7 @@ def decode_jpeg(
     device: Union[str, torch.device] = "cpu",
     apply_exif_orientation: bool = False,
 ) -> Union[torch.Tensor, List[torch.Tensor]]:
-    """
-    Decode JPEG image(s) into 3 dimensional RGB or grayscale Tensor(s).
+    """Decode JPEG image(s) into 3D RGB or grayscale Tensor(s), on CPU or CUDA.

     The values of the output tensor are uint8 between 0 and 255.

@@ -171,12 +173,9 @@ def decode_jpeg(
         input (Tensor[1] or list[Tensor[1]]): a (list of) one dimensional uint8 tensor(s) containing
             the raw bytes of the JPEG image. The tensor(s) must be on CPU,
             regardless of the ``device`` parameter.
-        mode (str or ImageReadMode): the read mode used for optionally
-            converting the image(s). The supported modes are: ``ImageReadMode.UNCHANGED``,
-            ``ImageReadMode.GRAY`` and ``ImageReadMode.RGB``
-            Default: ``ImageReadMode.UNCHANGED``.
-            See ``ImageReadMode`` class for more information on various
-            available modes.
+        mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
+            Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
+            for available modes.
         device (str or torch.device): The device on which the decoded image will
             be stored. If a cuda device is specified, the image will be decoded
             with `nvjpeg <https://developer.nvidia.com/nvjpeg>`_. This is only
@@ -228,9 +227,7 @@ def decode_jpeg(
 def encode_jpeg(
     input: Union[torch.Tensor, List[torch.Tensor]], quality: int = 75
 ) -> Union[torch.Tensor, List[torch.Tensor]]:
-    """
-    Takes a (list of) input tensor(s) in CHW layout and returns a (list of) buffer(s) with the contents
-    of the corresponding JPEG file(s).
+    """Encode RGB tensor(s) into raw encoded jpeg bytes, on CPU or CUDA.

     .. note::
         Passing a list of CUDA tensors is more efficient than repeated individual calls to ``encode_jpeg``.
@@ -286,7 +283,7 @@ def decode_image(
     mode: ImageReadMode = ImageReadMode.UNCHANGED,
     apply_exif_orientation: bool = False,
 ) -> torch.Tensor:
-    """Decode an image into a tensor.
+    """Decode an image into a uint8 tensor, from a path or from raw encoded bytes.

     Currently supported image formats are jpeg, png, gif and webp.

@@ -303,10 +300,9 @@ def decode_image(
         input (Tensor or str or ``pathlib.Path``): The image to decode. If a
             tensor is passed, it must be one dimensional uint8 tensor containing
             the raw bytes of the image. Otherwise, this must be a path to the image file.
-        mode (str or ImageReadMode): the read mode used for optionally converting the image.
-            Default: ``ImageReadMode.UNCHANGED``.
-            See ``ImageReadMode`` class for more information on various
-            available modes. Only applies to JPEG and PNG images.
+        mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
+            Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
+            for available modes.
         apply_exif_orientation (bool): apply EXIF orientation transformation to the output tensor.
             Only applies to JPEG and PNG images. Default: False.

@@ -367,9 +363,9 @@ def decode_webp(
     Args:
         input (Tensor[1]): a one dimensional contiguous uint8 tensor containing
             the raw bytes of the WEBP image.
-        mode (str or ImageReadMode): The read mode used for optionally
-            converting the image color space. Default: ``ImageReadMode.UNCHANGED``.
-            Other supported values are ``ImageReadMode.RGB`` and ``ImageReadMode.RGB_ALPHA``.
+        mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
+            Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
+            for available modes.

     Returns:
         Decoded image (Tensor[image_channels, image_height, image_width])
@@ -398,9 +394,9 @@ def _decode_avif(
     Args:
         input (Tensor[1]): a one dimensional contiguous uint8 tensor containing
             the raw bytes of the AVIF image.
-        mode (str or ImageReadMode): The read mode used for optionally
-            converting the image color space. Default: ``ImageReadMode.UNCHANGED``.
-            Other supported values are ``ImageReadMode.RGB`` and ``ImageReadMode.RGB_ALPHA``.
+        mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
+            Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
+            for available modes.

     Returns:
         Decoded image (Tensor[image_channels, image_height, image_width])
@@ -426,9 +422,9 @@ def _decode_heic(input: torch.Tensor, mode: ImageReadMode = ImageReadMode.UNCHAN
     Args:
         input (Tensor[1]): a one dimensional contiguous uint8 tensor containing
             the raw bytes of the HEIC image.
-        mode (str or ImageReadMode): The read mode used for optionally
-            converting the image color space. Default: ``ImageReadMode.UNCHANGED``.
-            Other supported values are ``ImageReadMode.RGB`` and ``ImageReadMode.RGB_ALPHA``.
+        mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
+            Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
+            for available modes.

     Returns:
         Decoded image (Tensor[image_channels, image_height, image_width])
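
Note on the image.py docstring changes: every `mode` parameter is now documented as accepting either a string such as "RGB" or an `ImageReadMode` member, and RGBA is described as an alias of RGB_ALPHA. The sketch below shows one way a string can be mapped onto an enum member by name. It uses a stand-in enum rather than torchvision's class: only `UNCHANGED = 0` appears in the diff, so the other numeric values, the helper name `to_read_mode`, and the case-insensitive lookup are assumptions made for illustration.

```python
from enum import Enum
from typing import Union


class ReadMode(Enum):
    """Stand-in for ImageReadMode; values other than UNCHANGED are assumed."""

    UNCHANGED = 0
    GRAY = 1
    GRAY_ALPHA = 2
    RGB = 3
    RGB_ALPHA = 4
    RGBA = RGB_ALPHA  # same value, so Enum treats RGBA as an alias


def to_read_mode(mode: Union[str, ReadMode]) -> ReadMode:
    """Accept a member or its name as a string (hypothetical helper)."""
    if isinstance(mode, str):
        # Lookup by member name; raises KeyError for unknown names.
        return ReadMode[mode.upper()]
    return mode


print(to_read_mode("RGB"))
print(to_read_mode("RGBA") is ReadMode.RGB_ALPHA)
```

Because the two names share one value, looking up either one yields the same member, which is how Python's `Enum` implements aliases.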
