Commit 2831f11

bjuncek, Bruno Korbar, and fmassa authored
VideoAPI docs update (#2802)
* Video reader now returns dicts
* docs update
* Minor improvements

Co-authored-by: Bruno Korbar <[email protected]>
Co-authored-by: Francisco Massa <[email protected]>
1 parent b8e9308 commit 2831f11

3 files changed, +54 -21 lines changed

docs/source/io.rst

Lines changed: 7 additions & 2 deletions
@@ -25,10 +25,10 @@ lower-level API for more fine-grained control compared to the :mod:`read_video`
 It does all this whilst fully supporting torchscript.
 
 .. autoclass:: VideoReader
-    :members: next, get_metadata, set_current_stream, seek
+    :members: __next__, get_metadata, set_current_stream, seek
 
 
-Example of usage:
+Example of inspecting a video:
 
 .. code:: python
 
@@ -50,6 +50,11 @@ Example of usage:
     # following would print out the list of frame rates for every present video stream
     print(reader_md["video"]["fps"])
 
+    # we explicitly select the stream we would like to operate on. In
+    # the constructor we select a default video stream, but
+    # in practice, we can set whichever stream we would like
+    video.set_current_stream("video:0")
+
 
 Image
 -----
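
The descriptor convention that `set_current_stream` relies on (`{stream_type}:{stream_id}`, with the id optional) can be illustrated with a small standalone sketch. `parse_stream_descriptor` and `KNOWN_TYPES` are hypothetical names written for this note and are not part of torchvision; the sketch only mirrors the convention described in the updated docstrings, where passing just the type leaves the choice of stream to the decoder.

```python
from typing import Optional, Tuple

# Stream types the updated docs list as currently available.
KNOWN_TYPES = ("video", "audio")


def parse_stream_descriptor(descriptor: str) -> Tuple[str, Optional[int]]:
    """Split "video:0" into ("video", 0). Hypothetical helper, not torchvision API."""
    stream_type, sep, stream_id = descriptor.partition(":")
    if stream_type not in KNOWN_TYPES:
        raise ValueError(f"unknown stream type: {stream_type!r}")
    if not sep:
        # Only the type was given: no explicit id, so the decoder would
        # auto-detect the first stream of that type.
        return stream_type, None
    return stream_type, int(stream_id)


explicit = parse_stream_descriptor("video:0")
type_only = parse_stream_descriptor("audio")
```

Here `explicit` carries both the type and the id, while `type_only` has `None` in place of the id.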

test/test_video.py

Lines changed: 10 additions & 10 deletions
@@ -244,11 +244,11 @@ def _template_read_video(video_object, s=0, e=None):
     video_frames = torch.empty(0)
     frames = []
     video_pts = []
-    for t, pts in itertools.takewhile(lambda x: x[1] <= e, video_object):
-        if pts < s:
+    for frame in itertools.takewhile(lambda x: x['pts'] <= e, video_object):
+        if frame['pts'] < s:
             continue
-        frames.append(t)
-        video_pts.append(pts)
+        frames.append(frame['data'])
+        video_pts.append(frame['pts'])
     if len(frames) > 0:
         video_frames = torch.stack(frames, 0)
 
@@ -257,11 +257,11 @@ def _template_read_video(video_object, s=0, e=None):
     audio_frames = torch.empty(0)
     frames = []
     audio_pts = []
-    for t, pts in itertools.takewhile(lambda x: x[1] <= e, video_object):
-        if pts < s:
+    for frame in itertools.takewhile(lambda x: x['pts'] <= e, video_object):
+        if frame['pts'] < s:
             continue
-        frames.append(t)
-        audio_pts.append(pts)
+        frames.append(frame['data'])
+        audio_pts.append(frame['pts'])
     if len(frames) > 0:
         audio_frames = torch.stack(frames, 0)
 
@@ -293,8 +293,8 @@ def test_read_video_tensor(self):
             # pass 2: decode all frames using new api
             reader = VideoReader(full_path, "video")
             frames = []
-            for t, _ in reader:
-                frames.append(t)
+            for frame in reader:
+                frames.append(frame['data'])
             new_api = torch.stack(frames, 0)
             self.assertEqual(tv_result.size(), new_api.size())
 
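
The range-reading loop that the updated tests use can be tried without a video file. The following is a minimal sketch: `FakeReader` is a hypothetical stand-in that yields `{"data", "pts"}` dicts at one frame per second, and `read_range` is an illustrative name for the pattern in `_template_read_video`, not a torchvision function.

```python
import itertools


class FakeReader:
    """Hypothetical stand-in for VideoReader: yields {"data", "pts"} dicts."""

    def __init__(self, num_frames):
        # One frame per second, so pts values are 0.0, 1.0, 2.0, ...
        self._frames = [{"data": i, "pts": float(i)} for i in range(num_frames)]
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._frames):
            raise StopIteration
        frame = self._frames[self._pos]
        self._pos += 1
        return frame


def read_range(reader, s=0.0, e=float("inf")):
    """Collect frames with s <= pts <= e, mirroring the loop in the updated test."""
    data, pts = [], []
    for frame in itertools.takewhile(lambda f: f["pts"] <= e, reader):
        if frame["pts"] < s:
            continue  # skip frames that precede the requested start
        data.append(frame["data"])
        pts.append(frame["pts"])
    return data, pts


data, pts = read_range(FakeReader(10), s=2, e=5)
```

Note that `itertools.takewhile` consumes the first frame that fails the predicate, so after the loop the reader is positioned one frame past the end of the range.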

torchvision/io/__init__.py

Lines changed: 37 additions & 9 deletions
@@ -41,21 +41,48 @@ class VideoReader:
     container.
 
     Example:
-        The following examples creates :mod:`Video` object, seeks into 2s
+        The following examples creates a :mod:`VideoReader` object, seeks into 2s
         point, and returns a single frame::
             import torchvision
             video_path = "path_to_a_test_video"
 
             reader = torchvision.io.VideoReader(video_path, "video")
             reader.seek(2.0)
-            frame, timestamp = next(reader)
+            frame = next(reader)
+
+        :mod:`VideoReader` implements the iterable API, which makes it suitable to
+        using it in conjunction with :mod:`itertools` for more advanced reading.
+        As such, we can use a :mod:`VideoReader` instance inside for loops::
+            reader.seek(2)
+            for frame in reader:
+                frames.append(frame['data'])
+            # additionally, `seek` implements a fluent API, so we can do
+            for frame in reader.seek(2):
+                frames.append(frame['data'])
+        With :mod:`itertools`, we can read all frames between 2 and 5 seconds with the
+        following code::
+            for frame in itertools.takewhile(lambda x: x['pts'] <= 5, reader.seek(2)):
+                frames.append(frame['data'])
+        and similarly, reading 10 frames after the 2s timestamp can be achieved
+        as follows::
+            for frame in itertools.islice(reader.seek(2), 10):
+                frames.append(frame['data'])
+
+    .. note::
+
+        Each stream descriptor consists of two parts: stream type (e.g. 'video') and
+        a unique stream id (which are determined by the video encoding).
+        In this way, if the video contaner contains multiple
+        streams of the same type, users can acces the one they want.
+        If only stream type is passed, the decoder auto-detects first stream of that type.
 
     Args:
 
         path (string): Path to the video file in supported format
 
-        stream (string, optional): descriptor of the required stream. Defaults to "video:0"
-            Currently available options include :mod:`['video', 'audio', 'cc', 'sub']`
+        stream (string, optional): descriptor of the required stream, followed by the stream id,
+            in the format ``{stream_type}:{stream_id}``. Defaults to ``"video:0"``.
+            Currently available options include ``['video', 'audio']``
     """
 
     def __init__(self, path, stream="video"):
@@ -67,13 +94,14 @@ def __next__(self):
         """Decodes and returns the next frame of the current stream
 
         Returns:
-            ([torch.Tensor, float]): list containing decoded frame and corresponding timestamp
+            (dict): a dictionary with fields ``data`` and ``pts``
+            containing decoded frame and corresponding timestamp
 
         """
         frame, pts = self._c.next()
         if frame.numel() == 0:
             raise StopIteration
-        return frame, pts
+        return {"data": frame, "pts": pts}
 
     def __iter__(self):
         return self
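
The `__next__` change above follows a common iterator pattern: the backend signals end of stream with an empty frame, and the wrapper turns that into `StopIteration` while packaging each result as a dict. A minimal sketch of that pattern follows, assuming plain Python lists in place of tensors. `FakeBackend` and `DictReader` are hypothetical names for illustration only.

```python
class FakeBackend:
    """Hypothetical decoder: next() returns (frame, pts), empty frame at end of stream."""

    def __init__(self, frames):
        self._frames = list(frames)

    def next(self):
        if self._frames:
            return self._frames.pop(0)
        return [], 0.0  # empty frame marks the end of the stream


class DictReader:
    """Wraps a tuple-returning backend and yields {"data", "pts"} dicts."""

    def __init__(self, backend):
        self._c = backend

    def __iter__(self):
        return self

    def __next__(self):
        frame, pts = self._c.next()
        if len(frame) == 0:  # the diff above checks frame.numel() == 0 on a tensor
            raise StopIteration
        return {"data": frame, "pts": pts}


reader = DictReader(FakeBackend([([1, 2], 0.0), ([3, 4], 0.5)]))
frames = list(reader)
```

Returning a dict means callers access fields by name (`frame["data"]`, `frame["pts"]`) rather than by tuple position, which is the access pattern the updated tests switch to.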
@@ -88,7 +116,7 @@ def seek(self, time_s: float):
         Current implementation is the so-called precise seek. This
         means following seek, call to :mod:`next()` will return the
         frame with the exact timestamp if it exists or
-        the first frame with timestamp larger than time_s.
+        the first frame with timestamp larger than ``time_s``.
         """
         self._c.seek(time_s)
         return self
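
Because `seek` returns `self`, it can be chained directly into `itertools` helpers, as the new class docstring describes. The sketch below shows that fluent pattern with a hypothetical `SeekableReader` stand-in (one frame per second, string placeholders for frame data); it assumes the precise-seek semantics stated in the docstring and is not the torchvision implementation.

```python
import itertools


class SeekableReader:
    """Hypothetical stand-in for VideoReader with pts values 0.0, 1.0, 2.0, ..."""

    def __init__(self, num_frames):
        self._pts = [float(i) for i in range(num_frames)]
        self._pos = 0

    def seek(self, time_s):
        # Precise seek: land on the frame with the exact timestamp if it exists,
        # otherwise on the first frame with a larger timestamp.
        self._pos = next(
            (i for i, p in enumerate(self._pts) if p >= time_s), len(self._pts)
        )
        return self  # returning self is what makes the API fluent

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._pts):
            raise StopIteration
        pts = self._pts[self._pos]
        self._pos += 1
        return {"data": f"frame@{pts}", "pts": pts}


reader = SeekableReader(20)

# Ten frames starting at the 2s timestamp.
ten_after_two = [f["pts"] for f in itertools.islice(reader.seek(2), 10)]

# All frames between 2 and 5 seconds, inclusive.
between_two_and_five = [
    f["pts"] for f in itertools.takewhile(lambda f: f["pts"] <= 5, reader.seek(2))
]
```

Each call to `seek` repositions the same reader, so the second expression starts again from the 2s mark regardless of where the first one stopped.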
@@ -106,8 +134,8 @@ def set_current_stream(self, stream: str):
         Explicitly define the stream we are operating on.
 
         Args:
-            stream (string): descriptor of the required stream. Defaults to "video:0"
-                Currently available stream types include :mod:`['video', 'audio', 'cc', 'sub']`.
+            stream (string): descriptor of the required stream. Defaults to ``"video:0"``
+                Currently available stream types include ``['video', 'audio']``.
                 Each descriptor consists of two parts: stream type (e.g. 'video') and
                 a unique stream id (which are determined by video encoding).
                 In this way, if the video contaner contains multiple
