@@ -3,46 +3,98 @@ Decoding / Encoding images and videos
 
 .. currentmodule:: torchvision.io
 
-The :mod:`torchvision.io` package provides functions for performing IO
-operations. They are currently specific to reading and writing images and
-videos.
+The :mod:`torchvision.io` module provides utilities for decoding and encoding
+images and videos.
 
-Images
-------
+Image Decoding
+--------------
 
 Torchvision currently supports decoding JPEG, PNG, WEBP and GIF images. JPEG
 decoding can also be done on CUDA GPUs.
 
-For encoding, JPEG (cpu and CUDA) and PNG are supported.
+The main entry point is the :func:`~torchvision.io.decode_image` function, which
+you can use as an alternative to ``PIL.Image.open()``. It will decode images
+straight into image Tensors, thus saving you the conversion and allowing you to
+run transforms/preproc natively on tensors.
+
+.. code::
+
+    from torchvision.io import decode_image
+
+    img = decode_image("path_to_image", mode="RGB")
+    img.dtype  # torch.uint8
+
+    # Or
+    raw_encoded_bytes = ...  # read encoded bytes from your file system
+    img = decode_image(raw_encoded_bytes, mode="RGB")
+
+
+:func:`~torchvision.io.decode_image` will automatically detect the image format,
+and call the corresponding decoder. You can also use the lower-level
+format-specific decoders which can be more powerful, e.g. if you want to
+encode/decode JPEGs on CUDA.
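The "detect the format, then call the corresponding decoder" behaviour described above can be pictured with a small standard-library sketch. This is only an illustration of the general technique (checking the leading "magic" bytes and dispatching on the result), not torchvision's actual implementation; ``sniff_image_format``, ``decode_any`` and the ``decoders`` mapping are hypothetical names introduced here.

```python
from typing import Callable, Dict, Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess the container format from the first bytes of an encoded image."""
    if data.startswith(b"\xff\xd8\xff"):        # JPEG start-of-image marker
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):   # 8-byte PNG signature
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):      # GIF version header
        return "gif"
    # WEBP is a RIFF container: b"RIFF", a 4-byte size, then b"WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def decode_any(data: bytes, decoders: Dict[str, Callable[[bytes], object]]) -> object:
    """Dispatch to a format-specific decoder chosen by sniffing the bytes."""
    fmt = sniff_image_format(data)
    if fmt is None or fmt not in decoders:
        raise ValueError("unsupported or unrecognised image format")
    return decoders[fmt](data)


# Hypothetical stand-in decoders that only report which branch was taken.
stub_decoders = {name: (lambda d, n=name: n) for name in ("jpeg", "png", "gif", "webp")}
chosen = decode_any(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8, stub_decoders)
```

In real code the mapping would point at actual decoders; the stubs are only there so the dispatch can be exercised without any image files.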
 
 .. autosummary::
     :toctree: generated/
     :template: function.rst
 
-    read_image
     decode_image
-    encode_jpeg
     decode_jpeg
-    write_jpeg
+    encode_png
     decode_gif
     decode_webp
-    encode_png
-    decode_png
-    write_png
-    read_file
-    write_file
 
 .. autosummary::
     :toctree: generated/
     :template: class.rst
 
     ImageReadMode
 
+Obsolete decoding function:
 
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    read_image
+
+Image Encoding
+--------------
+
+For encoding, JPEG (cpu and CUDA) and PNG are supported.
+
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    encode_jpeg
+    write_jpeg
+    encode_png
+    write_png
+
+IO operations
+-------------
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    read_file
+    write_file
 
 Video
 -----
 
+.. warning::
+
+    Torchvision supports video decoding through different APIs listed below,
+    some of which are still in BETA stage. In the near future, we intend to
+    centralize PyTorch's video decoding capabilities within the `torchcodec
+    <https://github.com/pytorch/torchcodec>`_ project. We encourage you to try
+    it out and share your feedback, as the torchvision video decoders will
+    eventually be deprecated.
+
 .. autosummary::
     :toctree: generated/
     :template: function.rst
@@ -52,45 +104,14 @@ Video
     write_video
 
 
-Fine-grained video API
-^^^^^^^^^^^^^^^^^^^^^^
+**Fine-grained video API**
 
 In addition to the :mod:`read_video` function, we provide a high-performance
 lower-level API for more fine-grained control compared to the :mod:`read_video` function.
 It does all this whilst fully supporting torchscript.
 
-.. betastatus:: fine-grained video API
-
 .. autosummary::
     :toctree: generated/
     :template: class.rst
 
     VideoReader
-
-
-Example of inspecting a video:
-
-.. code:: python
-
-    import torchvision
-    video_path = "path to a test video"
-    # Constructor allocates memory and a threaded decoder
-    # instance per video. At the moment it takes two arguments:
-    # path to the video file, and a wanted stream.
-    reader = torchvision.io.VideoReader(video_path, "video")
-
-    # The information about the video can be retrieved using the
-    # `get_metadata()` method. It returns a dictionary for every stream, with
-    # duration and other relevant metadata (often frame rate)
-    reader_md = reader.get_metadata()
-
-    # metadata is structured as a dict of dicts with following structure
-    # {"stream_type": {"attribute": [attribute per stream]}}
-    #
-    # following would print out the list of frame rates for every present video stream
-    print(reader_md["video"]["fps"])
-
-    # we explicitly select the stream we would like to operate on. In
-    # the constructor we select a default video stream, but
-    # in practice, we can set whichever stream we would like
-    video.set_current_stream("video:0")
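The metadata layout mentioned in the removed example, ``{"stream_type": {"attribute": [attribute per stream]}}``, can be explored with plain Python. The sketch below uses a hand-written dictionary in that shape; the numeric values and the ``stream_attribute`` helper are hypothetical and only meant to show how a stream id such as ``"video:0"`` maps onto the nested structure.

```python
# Hypothetical metadata in the dict-of-dicts shape described above:
# one list entry per stream of the given type. Values are made up.
reader_md = {
    "video": {"fps": [29.97, 25.0], "duration": [10.0, 12.5]},
    "audio": {"framerate": [44100.0], "duration": [10.0]},
}


def stream_attribute(metadata, stream_id, attribute):
    """Return one attribute for a stream id like "video:0" (index defaults to 0)."""
    stream_type, _, index = stream_id.partition(":")
    values = metadata[stream_type][attribute]
    return values[int(index) if index else 0]


# Frame rates of every video stream, then the value for one specific stream.
all_video_fps = reader_md["video"]["fps"]
first_video_fps = stream_attribute(reader_md, "video:0", "fps")
```

Keeping per-stream values in lists means that a file with several video or audio streams needs no special casing: the stream index simply selects the list position.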