Commit 304fdf9

Molly Xu committed
first draft of performance tips tutorial
1 parent 408b373 commit 304fdf9

File tree

2 files changed: +160 -0 lines changed


docs/source/conf.py

Lines changed: 1 addition & 0 deletions
@@ -81,6 +81,7 @@ def __call__(self, filename):
 "approximate_mode.py",
 "sampling.py",
 "parallel_decoding.py",
+"performance_tips.py",
 "custom_frame_mappings.py",
 ]
 else:
Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
====================================
Performance Tips and Best Practices
====================================

This tutorial consolidates performance optimization techniques for video
decoding with TorchCodec. Learn when and how to apply various strategies
to increase performance.
"""


# %%
# Overview
# --------
#
# When decoding videos with TorchCodec, several techniques can significantly
# improve performance depending on your use case. This guide covers:
#
# 1. **Batch APIs** - Decode multiple frames at once
# 2. **Approximate Mode & Keyframe Mappings** - Trade accuracy for speed
# 3. **Multi-threading** - Parallelize decoding across videos or chunks
# 4. **CUDA Acceleration (BETA)** - Use GPU decoding for supported formats
#
# We'll explore each technique and when to use it.

# %%
# 1. Use Batch APIs When Possible
# --------------------------------
#
# If you need to decode multiple frames, the batch methods are faster than
# decoding frames one at a time. TorchCodec's batch APIs reduce per-call
# overhead and can leverage internal optimizations.
#
# **Key Methods:**
#
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` for specific indices
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range` for ranges
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_at` for timestamps
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_in_range` for time ranges
#
# **When to use:**
#
# - Decoding multiple frames at once, as in the sketch below

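# %%
# The snippet below is a minimal sketch of batch decoding with
# :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at`; the path
# ``"video.mp4"`` is a placeholder, so substitute a real file.

from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder("video.mp4")  # placeholder path

# A single call decodes all requested frames and returns a FrameBatch.
batch = decoder.get_frames_at([0, 10, 20, 30])
print(batch.data.shape)    # (4, C, H, W) uint8 tensor
print(batch.pts_seconds)   # presentation timestamps of the four frames
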
# %%
# .. note::
#
#    For complete examples with runnable code demonstrating batch decoding,
#    iteration, and frame retrieval, see:
#
#    - :ref:`sphx_glr_generated_examples_decoding_basic_example.py`

# %%
# 2. Approximate Mode & Keyframe Mappings
# ----------------------------------------
#
# By default, TorchCodec uses ``seek_mode="exact"``, which performs a scan when
# the decoder is created to build an accurate internal index of frames. This
# ensures frame-accurate seeking but takes longer for decoder initialization,
# especially on long videos.

# %%
# **Approximate Mode**
# ~~~~~~~~~~~~~~~~~~~~
#
# Setting ``seek_mode="approximate"`` skips the initial scan and relies on the
# video file's metadata headers. This dramatically speeds up
# :class:`~torchcodec.decoders.VideoDecoder` creation, particularly for long
# videos, but may result in slightly less accurate seeking in some cases.
#
# **Which mode should you use?**
#
# - If you care about exact frame seeking, use ``"exact"``.
# - If you can trade seeking exactness for speed, which is usually the case
#   when doing clip sampling, use ``"approximate"``.
# - If your videos don't have variable framerate and their metadata is correct,
#   then ``"approximate"`` mode is a net win: it will be just as accurate as
#   ``"exact"`` mode while still being significantly faster.
# - If your videos are small enough and you're decoding a lot of frames from
#   each, there's a chance ``"exact"`` mode is actually faster. The sketch
#   below shows how to compare the two modes on your own data.

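# %%
# The sketch below compares the two seek modes on one of your own files by
# timing decoder creation plus a spread of frame reads. The path
# ``"long_video.mp4"`` is a placeholder.

from time import perf_counter

from torchcodec.decoders import VideoDecoder

path = "long_video.mp4"  # placeholder: ideally a long video

for mode in ("exact", "approximate"):
    start = perf_counter()
    decoder = VideoDecoder(path, seek_mode=mode)
    # The scan happens during construction in exact mode, so creation time
    # alone already differs; decoding a spread of frames shows the total cost.
    indices = list(range(0, decoder.metadata.num_frames, 10))
    decoder.get_frames_at(indices)
    print(f"seek_mode={mode!r}: {perf_counter() - start:.3f}s total")
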
# %%
# **Custom Frame Mappings**
# ~~~~~~~~~~~~~~~~~~~~~~~~~
#
# For advanced use cases, you can pre-compute a custom mapping between desired
# frame indices and actual keyframe locations. This lets you speed up
# :class:`~torchcodec.decoders.VideoDecoder` instantiation while maintaining
# the frame-seeking accuracy of ``seek_mode="exact"``.
#
# **When to use:**
#
# - Frame accuracy is critical, so approximate mode cannot be used
# - Videos can be preprocessed once and then decoded many times
#
# **Performance impact:** Enables consistent, predictable performance for repeated
# random access without the overhead of exact mode's scanning.

# %%
# .. note::
#
#    For complete benchmarks showing actual speedup numbers, accuracy comparisons,
#    and implementation examples, see:
#
#    - :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`
#
#    - :ref:`sphx_glr_generated_examples_decoding_custom_frame_mappings.py`

# %%
# 3. Multi-threading for Parallel Decoding
# -----------------------------------------
#
# When decoding a large number of frames from a single video, a few
# parallelization strategies can speed things up:
#
# - FFmpeg-based parallelism: using FFmpeg's internal threading capabilities
# - Multiprocessing: distributing work across multiple processes
# - Multithreading: using multiple threads within a single process, as in the
#   sketch below

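# %%
# A minimal multi-threading sketch: each worker thread gets its own
# :class:`~torchcodec.decoders.VideoDecoder` and decodes a contiguous chunk of
# the requested frame indices. The path ``"video.mp4"`` and ``num_threads = 4``
# are placeholder assumptions; see the parallel decoding example linked below
# for the FFmpeg-threading and multi-process variants.

from concurrent.futures import ThreadPoolExecutor

from torchcodec.decoders import VideoDecoder

path = "video.mp4"  # placeholder path
num_threads = 4

# Every 5th frame of the video, split into contiguous chunks (contiguous
# ranges keep seeking cheap within each worker).
num_frames = VideoDecoder(path, seek_mode="approximate").metadata.num_frames
all_indices = list(range(0, num_frames, 5))
chunk_size = (len(all_indices) + num_threads - 1) // num_threads
chunks = [
    all_indices[i : i + chunk_size]
    for i in range(0, len(all_indices), chunk_size)
]

def decode_chunk(indices):
    # Decoder instances are created per worker rather than shared across threads.
    decoder = VideoDecoder(path, seek_mode="approximate")
    return decoder.get_frames_at(indices)

with ThreadPoolExecutor(max_workers=num_threads) as pool:
    batches = list(pool.map(decode_chunk, chunks))
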
# %%
# .. note::
#
#    For complete examples comparing sequential, FFmpeg-based parallelism,
#    multi-process, and multi-threaded approaches, see:
#
#    - :ref:`sphx_glr_generated_examples_decoding_parallel_decoding.py`

# %%
# 4. BETA: CUDA Acceleration
# ---------------------------
#
# TorchCodec supports GPU-accelerated decoding using NVIDIA's hardware decoder
# (NVDEC) on supported hardware. This keeps decoded tensors in GPU memory,
# avoiding expensive CPU-GPU transfers for downstream GPU operations.
#
# **When to use:**
#
# - Decoding high-resolution videos
# - Decoding large batches of videos that saturate the CPU
# - GPU-intensive pipelines with transforms like scaling and cropping
# - The CPU is saturated and you want to free it up for other work
#
# **When NOT to use:**
#
# - You need bit-exact results
# - Small-resolution videos, where the PCIe transfer overhead can outweigh the
#   decoding speedup
# - The GPU is already busy and the CPU is idle
#
# **Performance impact:** CUDA decoding can significantly outperform CPU decoding,
# especially for high-resolution videos and when combined with GPU-based transforms.
# Actual speedup varies by hardware, resolution, and codec. A minimal usage sketch
# follows below.

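# %%
# A minimal sketch of CUDA decoding, assuming a CUDA-enabled TorchCodec build
# and an NVDEC-capable GPU; ``"video.mp4"`` is a placeholder path.

import torch

from torchcodec.decoders import VideoDecoder

if torch.cuda.is_available():
    decoder = VideoDecoder("video.mp4", device="cuda")
    frame = decoder.get_frame_at(0)
    # The decoded frame is already a CUDA tensor, so GPU transforms can run on
    # it directly without a CPU-GPU copy.
    print(frame.data.device)
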
# %%
# .. note::
#
#    For installation instructions, detailed examples, and visual comparisons
#    between CPU and CUDA decoding, see:
#
#    - :ref:`sphx_glr_generated_examples_decoding_basic_cuda_example.py`
