@@ -3,33 +3,46 @@ Decoding / Encoding images and videos
33
44.. currentmodule:: torchvision.io
55
6- The :mod:`torchvision.io` package provides functions for performing IO
7- operations. They are currently specific to reading and writing images and
8- videos.
6+ The :mod:`torchvision.io` module provides utilities for decoding and encoding
7+ images and videos.
98
10- Images
11- ------
9+ Image Decoding
10+ --------------
1211
1312Torchvision currently supports decoding JPEG, PNG, WEBP and GIF images. JPEG
1413decoding can also be done on CUDA GPUs.
1514
16- For encoding, JPEG (cpu and CUDA) and PNG are supported.
15+ The main entry point is the :func:`~torchvision.io.decode_image` function, which
16+ you can use as an alternative to ``PIL.Image.open()``. It will decode images
17+ straight into image Tensors, thus saving you the conversion and allowing you to
18+ run transforms/preproc natively on tensors.
19+
20+ .. code::
21+
22+     from torchvision.io import decode_image
23+
24+     img = decode_image("path_to_image", mode="RGB")
25+     img.dtype  # torch.uint8
26+
27+     # Or
28+     raw_encoded_bytes = ...  # read encoded bytes from your file system
29+     img = decode_image(raw_encoded_bytes, mode="RGB")
30+
31+
32+ :func:`~torchvision.io.decode_image` will automatically detect the image format,
33+ and call the corresponding decoder. You can also use the lower-level
34+ format-specific decoders which can be more powerful, e.g. if you want to
35+ encode/decode JPEGs on CUDA.
1736
1837.. autosummary::
1938    :toctree: generated/
2039    :template: function.rst
2140
2241    decode_image
23-     encode_jpeg
2442    decode_jpeg
25-     write_jpeg
43+     encode_png
2644    decode_gif
2745    decode_webp
28-     encode_png
29-     decode_png
30-     write_png
31-     read_file
32-     write_file
3346
3447.. autosummary::
3548    :toctree: generated/
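
Note on the format auto-detection described in the added ``decode_image`` paragraph: the diff does not show how the detection is implemented. One common way to do this kind of dispatch is to inspect the leading "magic bytes" of the encoded data. The sketch below is only an illustration of that general idea; ``sniff_image_format`` is a hypothetical helper, not a torchvision function, and it makes no claim about what torchvision does internally.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess an image format from its leading bytes (illustrative helper only)."""
    # JPEG streams start with the SOI marker FF D8, followed by another FF marker.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # GIF files start with the ASCII version tag.
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WEBP is a RIFF container: "RIFF", a 4-byte size field, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

A caller could use the returned name to pick one of the format-specific decoders listed above, and fall back to an error when the result is ``None``.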
@@ -41,14 +54,47 @@ Obsolete decoding function:
4154
4255.. autosummary::
4356    :toctree: generated/
44-     :template: class.rst
57+     :template: function.rst
4558
4659    read_image
4760
61+ Image Encoding
62+ --------------
63+
64+ For encoding, JPEG (cpu and CUDA) and PNG are supported.
65+
66+
67+ .. autosummary::
68+     :toctree: generated/
69+     :template: function.rst
70+
71+     encode_jpeg
72+     write_jpeg
73+     encode_png
74+     write_png
75+
76+ IO operations
77+ -------------
78+
79+ .. autosummary::
80+     :toctree: generated/
81+     :template: function.rst
82+
83+     read_file
84+     write_file
4885
4986Video
5087-----
5188
89+ .. warning::
90+
91+     Torchvision supports video decoding through different APIs listed below,
92+     some of which are still in BETA stage. In the near future, we intend to
93+     centralize PyTorch's video decoding capabilities within the `torchcodec
94+     <https://github.com/pytorch/torchcodec>`_ project. We encourage you to try
95+     it out and share your feedback, as the torchvision video decoders will
96+     eventually be deprecated.
97+
5298.. autosummary::
5399    :toctree: generated/
54100    :template: function.rst
@@ -58,45 +104,14 @@ Video
58104    write_video
59105
60106
61- Fine-grained video API
62- ^^^^^^^^^^^^^^^^^^^^^^
107+ **Fine-grained video API**
63108
64109In addition to the :mod:`read_video` function, we provide a high-performance
65110lower-level API for more fine-grained control compared to the :mod:`read_video` function.
66111It does all this whilst fully supporting torchscript.
67112
68- .. betastatus:: fine-grained video API
69-
70113.. autosummary::
71114    :toctree: generated/
72115    :template: class.rst
73116
74117    VideoReader
75-
76-
77- Example of inspecting a video:
78-
79- .. code:: python
80-
81-     import torchvision
82-     video_path = "path to a test video"
83-     # Constructor allocates memory and a threaded decoder
84-     # instance per video. At the moment it takes two arguments:
85-     # path to the video file, and a wanted stream.
86-     reader = torchvision.io.VideoReader(video_path, "video")
87-
88-     # The information about the video can be retrieved using the
89-     # `get_metadata()` method. It returns a dictionary for every stream, with
90-     # duration and other relevant metadata (often frame rate)
91-     reader_md = reader.get_metadata()
92-
93-     # metadata is structured as a dict of dicts with following structure
94-     # {"stream_type": {"attribute": [attribute per stream]}}
95-     #
96-     # following would print out the list of frame rates for every present video stream
97-     print(reader_md["video"]["fps"])
98-
99-     # we explicitly select the stream we would like to operate on. In
100-     # the constructor we select a default video stream, but
101-     # in practice, we can set whichever stream we would like
102-     video.set_current_stream("video:0")