
Commit 76ec888

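feat(docs): add TorchCodec video processing docs and examples
Add documentation for the TorchCodec video processing module, including tutorials on installation, basic examples, parallel decoding, custom frame mappings, approximate mode, and file streaming. Also update the docs index and CI configuration to support the new content.
- Add the core video processing doc index.md
- Add 6 video processing example notebooks
- Update the docs directory structure and CI workflow
- Add installation instructions and dependency configuration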
1 parent ffdaa6b commit 76ec888

19 files changed (+3451, -7 lines)

.github/workflows/pages.yml

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ jobs:
 apt-get update && sudo apt-get upgrade
 pip install --upgrade pip
 pip install -ve .[doc,flows,dev]
+conda install -c conda-forge torchcodec
 - name: 🔧 Build HTML
 run: |
 invoke doc

doc/TorchCodec/audio/decoding.ipynb

Lines changed: 309 additions & 0 deletions
Large diffs are not rendered by default.

doc/TorchCodec/audio/encoding.ipynb

Lines changed: 260 additions & 0 deletions
Large diffs are not rendered by default.

doc/TorchCodec/audio/index.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
# Audio Processing

```{toctree}
encoding
decoding
```

doc/TorchCodec/index.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
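# TorchCodec Tutorials

TorchCodec is a Python library for decoding video and audio data into PyTorch tensors, on CPU and CUDA GPUs. It aims to be fast, easy to use, and well integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML models on video and audio, TorchCodec is how you turn that video and audio into data:
- Pythonic APIs that mirror Python and PyTorch conventions.
- Relies on [FFmpeg](https://www.ffmpeg.org/) for decoding and encoding. TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a mature library with broad coverage on most systems. It is, however, not easy to use. TorchCodec abstracts away FFmpeg's complexity to ensure it is used correctly and efficiently.
- Returns data as PyTorch tensors, ready to be fed into PyTorch transforms or used directly to train models.

```{toctree}
install
audio/index
video/index
```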

doc/TorchCodec/install.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# Installation

If FFmpeg is not installed yet, or you need a newer version, an easy way to install it is with `conda`:

```bash
conda install "ffmpeg"
# or
conda install "ffmpeg" -c conda-forge
```

Install TorchCodec:

```bash
conda install -c conda-forge torchcodec
```
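Before running the `conda` commands above, it can help to check which FFmpeg, if any, is already on the machine. The following is a minimal Python sketch, assuming that the `ffmpeg` command-line tool on `PATH` is a reasonable proxy for the installed FFmpeg. The FFmpeg libraries that TorchCodec actually loads may come from elsewhere (for example, a conda environment). The helper names `parse_ffmpeg_major_version` and `installed_ffmpeg_major_version` are hypothetical and not part of TorchCodec.

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches release banners such as "ffmpeg version 6.1.1 ..." or "ffmpeg version n7.0 ...".
# Git snapshot builds ("ffmpeg version N-112345-g...") carry no release number -> None.
_VERSION_RE = re.compile(r"^ffmpeg version n?(\d+)\.")


def parse_ffmpeg_major_version(banner_line: str) -> Optional[int]:
    """Extract the major version from the first line of `ffmpeg -version` output."""
    match = _VERSION_RE.match(banner_line.strip())
    return int(match.group(1)) if match else None


def installed_ffmpeg_major_version() -> Optional[int]:
    """Return the major version of the `ffmpeg` CLI on PATH, or None if absent or unparsable."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    first_line = result.stdout.splitlines()[0] if result.stdout else ""
    return parse_ffmpeg_major_version(first_line)


if __name__ == "__main__":
    version = installed_ffmpeg_major_version()
    if version is None:
        print("ffmpeg not found on PATH (or its version could not be parsed)")
    else:
        print(f"ffmpeg major version on PATH: {version}")
```

If this reports nothing, or an older version than you need, the `conda install "ffmpeg"` commands above are the suggested way to get one.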

doc/TorchCodec/video/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
*.mp4
Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
{
 "cells": [
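  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exact vs. Approximate Seek Mode: Performance and Accuracy\n",
    "\n",
    "This example covers the `seek_mode` parameter of {class}`torchcodec.decoders.VideoDecoder`.\n",
    "This parameter trades off decoder creation speed against seeking accuracy: in approximate mode, for example, requesting frame `i` does not necessarily return frame `i`.\n"
   ]
  },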
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup: download a short video and build a long one\n",
    "We download a short video (about 14 seconds) from the web and use `ffmpeg` to loop it 100 times, producing a long video of about 23 minutes.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ceb2a897",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Short video duration: 13.8 s\n",
      "Long video duration: 23.0 min\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "import httpx\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "import shutil\n",
    "import subprocess\n",
    "from time import perf_counter_ns\n",
    "\n",
    "# Video source: https://www.pexels.com/video/dog-eating-854132/ License: CC0. Author: Coverr\n",
    "url = \"https://videos.pexels.com/video-files/854132/854132-sd_640_360_25fps.mp4\"\n",
    "headers = {\"User-Agent\": \"\"}\n",
    "\n",
    "temp_dir = tempfile.mkdtemp()\n",
    "short_video_path = Path(temp_dir) / \"short_video.mp4\"\n",
    "with httpx.stream(\"GET\", url, headers=headers, follow_redirects=True) as r:\n",
    "    if r.status_code != 200:\n",
    "        raise RuntimeError(f\"Failed to download video. status_code = {r.status_code}.\")\n",
    "    with open(short_video_path, 'wb') as f:\n",
    "        for chunk in r.iter_bytes():\n",
    "            if chunk:\n",
    "                f.write(chunk)\n",
    "\n",
    "long_video_path = Path(temp_dir) / \"long_video.mp4\"\n",
    "ffmpeg_command = [\n",
    "    \"ffmpeg\",\n",
    "    \"-stream_loop\", \"99\",  # loop 99 extra times, i.e. 100 plays in total\n",
    "    \"-i\", f\"{short_video_path}\",\n",
    "    \"-c\", \"copy\",\n",
    "    f\"{long_video_path}\"\n",
    "]\n",
    "subprocess.run(ffmpeg_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n",
    "\n",
    "from torchcodec.decoders import VideoDecoder\n",
    "print(f\"Short video duration: {VideoDecoder(short_video_path).metadata.duration_seconds} s\")\n",
    "print(f\"Long video duration: {VideoDecoder(long_video_path).metadata.duration_seconds / 60} min\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Performance: decoder creation time\n",
    "`seek_mode` most directly affects how long it takes to create a {class}`torchcodec.decoders.VideoDecoder`;\n",
    "the longer the video, the bigger the gain from approximate mode.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Creating a decoder with seek_mode='exact' on the short video:\n",
      "med = 4.58ms +- 0.44\n",
      "Creating a decoder with seek_mode='approximate' on the short video:\n",
      "med = 4.13ms +- 0.45\n",
      "\n",
      "Creating a decoder with seek_mode='exact' on the long video:\n",
      "med = 49.23ms +- 2.61\n",
      "Creating a decoder with seek_mode='approximate' on the long video:\n",
      "med = 5.31ms +- 0.86\n"
     ]
    }
   ],
   "source": [
    "def bench(f, average_over=50, warmup=2, **f_kwargs):\n",
    "    for _ in range(warmup):\n",
    "        f(**f_kwargs)\n",
    "    times = []\n",
    "    for _ in range(average_over):\n",
    "        start = perf_counter_ns()\n",
    "        f(**f_kwargs)\n",
    "        end = perf_counter_ns()\n",
    "        times.append(end - start)\n",
    "    times = torch.tensor(times) * 1e-6  # nanoseconds -> milliseconds\n",
    "    std = times.std().item()\n",
    "    med = times.median().item()\n",
    "    print(f\"{med = :.2f}ms +- {std:.2f}\")\n",
    "\n",
    "print(\"Creating a decoder with seek_mode='exact' on the short video:\")\n",
    "bench(VideoDecoder, source=short_video_path, seek_mode=\"exact\")\n",
    "print(\"Creating a decoder with seek_mode='approximate' on the short video:\")\n",
    "bench(VideoDecoder, source=short_video_path, seek_mode=\"approximate\")\n",
    "print()\n",
    "print(\"Creating a decoder with seek_mode='exact' on the long video:\")\n",
    "bench(VideoDecoder, source=long_video_path, seek_mode=\"exact\")\n",
    "print(\"Creating a decoder with seek_mode='approximate' on the long video:\")\n",
    "bench(VideoDecoder, source=long_video_path, seek_mode=\"approximate\")\n"
   ]
  },
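  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Performance: frame decoding and clip sampling\n",
    "Strictly speaking, `seek_mode` only affects decoder creation; it does not directly affect frame decoding or sampling.\n",
    "In practice, however, a pipeline usually creates one decoder per video, so `seek_mode` indirectly affects the total time.\n"
   ]
  },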
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Clip sampling with seek_mode='exact':\n",
      "med = 131.16ms +- 16.07\n",
      "Clip sampling with seek_mode='approximate':\n",
      "med = 88.62ms +- 23.35\n"
     ]
    }
   ],
   "source": [
    "from torchcodec import samplers\n",
    "\n",
    "def sample_clips(seek_mode):\n",
    "    return samplers.clips_at_random_indices(\n",
    "        decoder=VideoDecoder(\n",
    "            source=long_video_path,\n",
    "            seek_mode=seek_mode\n",
    "        ),\n",
    "        num_clips=5,\n",
    "        num_frames_per_clip=2,\n",
    "    )\n",
    "\n",
    "print(\"Clip sampling with seek_mode='exact':\")\n",
    "bench(sample_clips, seek_mode=\"exact\")\n",
    "print(\"Clip sampling with seek_mode='approximate':\")\n",
    "bench(sample_clips, seek_mode=\"approximate\")\n"
   ]
  },
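  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Accuracy: metadata and frame retrieval\n",
    "`seek_mode=\"approximate\"` can make decoder creation much faster, at the cost of seeking that is less accurate than in exact mode,\n",
    "and it may also make the metadata less accurate. In many cases the two modes give identical results, and approximate mode is then a \"net win\".\n"
   ]
  },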
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": [
     "hide-output"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Short video metadata (exact):\n",
      "VideoStreamMetadata:\n",
      " duration_seconds_from_header: 13.8\n",
      " begin_stream_seconds_from_header: 0.0\n",
      " bit_rate: 505790.0\n",
      " codec: h264\n",
      " stream_index: 0\n",
      " begin_stream_seconds_from_content: 0.0\n",
      " end_stream_seconds_from_content: 13.8\n",
      " width: 640\n",
      " height: 360\n",
      " num_frames_from_header: 345\n",
      " num_frames_from_content: 345\n",
      " average_fps_from_header: 25.0\n",
      " pixel_aspect_ratio: 1\n",
      " duration_seconds: 13.8\n",
      " begin_stream_seconds: 0.0\n",
      " end_stream_seconds: 13.8\n",
      " num_frames: 345\n",
      " average_fps: 25.0\n",
      "\n",
      "Short video metadata (approximate):\n",
      "VideoStreamMetadata:\n",
      " duration_seconds_from_header: 13.8\n",
      " begin_stream_seconds_from_header: 0.0\n",
      " bit_rate: 505790.0\n",
      " codec: h264\n",
      " stream_index: 0\n",
      " begin_stream_seconds_from_content: None\n",
      " end_stream_seconds_from_content: None\n",
      " width: 640\n",
      " height: 360\n",
      " num_frames_from_header: 345\n",
      " num_frames_from_content: None\n",
      " average_fps_from_header: 25.0\n",
      " pixel_aspect_ratio: 1\n",
      " duration_seconds: 13.8\n",
      " begin_stream_seconds: 0\n",
      " end_stream_seconds: 13.8\n",
      " num_frames: 345\n",
      " average_fps: 25.0\n",
      "\n",
      "For this video, frame seeking is identical in both modes!\n"
     ]
    }
   ],
   "source": [
    "print(\"Short video metadata (exact):\")\n",
    "print(VideoDecoder(short_video_path, seek_mode=\"exact\").metadata)\n",
    "print(\"Short video metadata (approximate):\")\n",
    "print(VideoDecoder(short_video_path, seek_mode=\"approximate\").metadata)\n",
    "\n",
    "exact_decoder = VideoDecoder(short_video_path, seek_mode=\"exact\")\n",
    "approx_decoder = VideoDecoder(short_video_path, seek_mode=\"approximate\")\n",
    "for i in range(len(exact_decoder)):\n",
    "    torch.testing.assert_close(\n",
    "        exact_decoder.get_frame_at(i).data,\n",
    "        approx_decoder.get_frame_at(i).data,\n",
    "        atol=0, rtol=0,\n",
    "    )\n",
    "print(\"For this video, frame seeking is identical in both modes!\")\n"
   ]
  },
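  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How it works\n",
    "With `seek_mode=\"exact\"`, the decoder performs a \"scan\" when it is created: it does not decode the file, but it processes the entire file to obtain more accurate metadata (such as the duration) and to build an internal index of frames and key frames.\n",
    "This index can be more accurate than the one in the file header, which makes seeking more accurate. Without the scan, TorchCodec relies only on the metadata stored in the file itself, which may not always be accurate.\n"
   ]
  },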
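  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Which mode to choose\n",
    "- If you need strictly exact frame seeking, use `exact`.\n",
    "- If you can trade some seeking accuracy for speed (for example, when sampling clips), use `approximate`.\n",
    "- If your videos have a constant frame rate and correct metadata, `approximate` is usually just as accurate as `exact`, but faster.\n"
   ]
  },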
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up temporary files\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "shutil.rmtree(temp_dir)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "py313",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
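The notebook's "How it works" and "Which mode to choose" cells explain that exact mode scans the file to index every frame, while approximate mode relies on header metadata, so the two agree for constant-frame-rate videos with correct headers. The following is a toy Python model of that difference, not TorchCodec's implementation: it assumes exact mode can be pictured as a lookup in a sorted list of real frame timestamps, and approximate mode as `t * average_fps` computed from the header. The function names and the variable-frame-rate timestamps are hypothetical.

```python
from bisect import bisect_right


def exact_index_for_time(pts_seconds: list[float], t: float) -> int:
    """Index of the frame shown at time t, from a scanned, sorted list of frame timestamps.

    Toy stand-in for seek_mode="exact": every frame's real timestamp is known.
    """
    # bisect_right counts frames with timestamp <= t; the last of them is on screen.
    return max(bisect_right(pts_seconds, t) - 1, 0)


def approximate_index_for_time(average_fps: float, t: float) -> int:
    """Index estimated from header metadata only, assuming a constant frame rate.

    Toy stand-in for seek_mode="approximate": no scan, just t * fps.
    """
    return int(t * average_fps)


if __name__ == "__main__":
    # Constant frame rate (25 fps, 345 frames, like the short video): both agree.
    cfr_pts = [i / 25 for i in range(345)]
    print(exact_index_for_time(cfr_pts, 7.13), approximate_index_for_time(25.0, 7.13))  # prints: 178 178

    # Hypothetical variable frame rate: 6 frames with irregular gaps over 0.48 s (12.5 fps on average).
    vfr_pts = [0.0, 0.04, 0.08, 0.2, 0.24, 0.4]
    avg_fps = len(vfr_pts) / 0.48
    print(exact_index_for_time(vfr_pts, 0.21), approximate_index_for_time(avg_fps, 0.21))  # prints: 3 2
```

On the constant-rate stream the header-based estimate lands on the same frame as the index lookup; on the irregular stream it is off by one frame, which is the kind of discrepancy the notebook warns about for variable-frame-rate videos.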
