xinetzone
diff --git a/.github/workflows/pages.yml b/.github/workflows/pages.yml
Lines changed: 1 addition & 0 deletions
diff --git a/doc/TorchCodec/audio/decoding.ipynb b/doc/TorchCodec/audio/decoding.ipynb
Lines changed: 309 additions & 0 deletions
diff --git a/doc/TorchCodec/audio/encoding.ipynb b/doc/TorchCodec/audio/encoding.ipynb
Lines changed: 260 additions & 0 deletions
diff --git a/doc/TorchCodec/audio/index.md b/doc/TorchCodec/audio/index.md
Lines changed: 6 additions & 0 deletions
diff --git a/doc/TorchCodec/index.md b/doc/TorchCodec/index.md
Lines changed: 12 additions & 0 deletions
diff --git a/doc/TorchCodec/install.md b/doc/TorchCodec/install.md
Lines changed: 15 additions & 0 deletions
diff --git a/doc/TorchCodec/video/.gitignore b/doc/TorchCodec/video/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/doc/TorchCodec/video/approximate-mode.ipynb b/doc/TorchCodec/video/approximate-mode.ipynb
Lines changed: 316 additions & 0 deletions
@@ -46,6 +46,7 @@ jobs:
           apt-get update && sudo apt-get upgrade
           pip install --upgrade pip
           pip install -ve .[doc,flows,dev]
+          conda install -c conda-forge torchcodec
       - name: 🔧 Build HTML
         run: |
           invoke doc
 
@@ -0,0 +1,6 @@
+# Audio Processing
+
+```{toctree}
+encoding
+decoding
+```
@@ -0,0 +1,12 @@
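+# TorchCodec Tutorials
+
+TorchCodec is a Python library for decoding video and audio data into PyTorch tensors, on CPU and CUDA GPU. It aims to be fast, easy to use, and well integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML models on videos and audio, TorchCodec is how you turn that media into data:
+- A Pythonic API that mirrors Python and PyTorch conventions.
+- It relies on [FFmpeg](https://www.ffmpeg.org/) for decoding and encoding. TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a mature library with broad coverage that is available on most systems. It is, however, not easy to use. TorchCodec abstracts away FFmpeg's complexity to ensure it is used correctly and efficiently.
+- It returns data as PyTorch tensors, ready to be fed into PyTorch transforms or used directly to train models.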
+
+```{toctree}
+install
+audio/index
+video/index
+```
@@ -0,0 +1,15 @@
+# Installation
+
+If FFmpeg is not installed yet, or you need a newer version, an easy way to install it is with `conda`:
+
+```bash
+conda install "ffmpeg"
+# or
+conda install "ffmpeg" -c conda-forge
+```
+
+Install TorchCodec:
+
+```bash
+conda install -c conda-forge torchcodec
+```
@@ -0,0 +1 @@
+*.mp4
@@ -0,0 +1,316 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
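+    "# Exact vs. approximate seek mode: performance and accuracy comparison\n",
+    "\n",
+    "This example covers the `seek_mode` parameter of `torchcodec.decoders.VideoDecoder`.\n",
+    "The parameter trades decoder creation speed against frame-seeking accuracy (for example, in approximate mode, requesting the `i`-th frame does not necessarily return the `i`-th frame).\n"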
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
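+    "## Setup: download a short video and generate a long one\n",
+    "We download a short video of about 14 seconds from the web and use `ffmpeg` to repeat it 100 times, producing a long video of about 23 minutes.\n"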
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "ceb2a897",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Short video duration: 13.8 seconds\n",
+      "Long video duration: 23.0 minutes\n"
+     ]
+    }
+   ],
+   "source": [
+    "import torch\n",
+    "import httpx\n",
+    "import tempfile\n",
+    "from pathlib import Path\n",
+    "import shutil\n",
+    "import subprocess\n",
+    "from time import perf_counter_ns\n",
+    "\n",
+    "# Video source: https://www.pexels.com/video/dog-eating-854132/  License: CC0  Author: Coverr\n",
+    "url = \"https://videos.pexels.com/video-files/854132/854132-sd_640_360_25fps.mp4\"\n",
+    "headers = {\"User-Agent\": \"\"}\n",
+    "\n",
+    "temp_dir = tempfile.mkdtemp()\n",
+    "short_video_path = Path(temp_dir) / \"short_video.mp4\"\n",
+    "with httpx.stream(\"GET\", url, headers=headers, follow_redirects=True) as r:\n",
+    "    if r.status_code != 200:\n",
+    "        raise RuntimeError(f\"Failed to download video. status_code = {r.status_code}.\")\n",
+    "    with open(short_video_path, 'wb') as f:\n",
+    "        for chunk in r.iter_bytes():\n",
+    "            if chunk:\n",
+    "                f.write(chunk)\n",
+    "\n",
+    "long_video_path = Path(temp_dir) / \"long_video.mp4\"\n",
+    "ffmpeg_command = [\n",
+    "    \"ffmpeg\",\n",
+    "    \"-stream_loop\", \"99\",  # 99 extra loops, so the input plays 100 times\n",
+    "    \"-i\", f\"{short_video_path}\",\n",
+    "    \"-c\", \"copy\",\n",
+    "    f\"{long_video_path}\"\n",
+    "]\n",
+    "subprocess.run(ffmpeg_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n",
+    "\n",
+    "from torchcodec.decoders import VideoDecoder\n",
+    "print(f\"Short video duration: {VideoDecoder(short_video_path).metadata.duration_seconds} seconds\")\n",
+    "print(f\"Long video duration: {VideoDecoder(long_video_path).metadata.duration_seconds / 60} minutes\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
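+    "## Performance: decoder creation time\n",
+    "What `seek_mode` affects most directly is the creation time of\n",
+    "`torchcodec.decoders.VideoDecoder`; the longer the video, the larger the gain from approximate mode.\n"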
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Creating a seek_mode='exact' decoder on the short video:\n",
+      "med = 4.58ms +- 0.44\n",
+      "Creating a seek_mode='approximate' decoder on the short video:\n",
+      "med = 4.13ms +- 0.45\n",
+      "\n",
+      "Creating a seek_mode='exact' decoder on the long video:\n",
+      "med = 49.23ms +- 2.61\n",
+      "Creating a seek_mode='approximate' decoder on the long video:\n",
+      "med = 5.31ms +- 0.86\n"
+     ]
+    }
+   ],
+   "source": [
+    "def bench(f, average_over=50, warmup=2, **f_kwargs):\n",
+    "    # Time f(**f_kwargs) and print the median and standard deviation in milliseconds.\n",
+    "    for _ in range(warmup):\n",
+    "        f(**f_kwargs)\n",
+    "    times = []\n",
+    "    for _ in range(average_over):\n",
+    "        start = perf_counter_ns()\n",
+    "        f(**f_kwargs)\n",
+    "        end = perf_counter_ns()\n",
+    "        times.append(end - start)\n",
+    "    times = torch.tensor(times) * 1e-6  # nanoseconds to milliseconds\n",
+    "    std = times.std().item()\n",
+    "    med = times.median().item()\n",
+    "    print(f\"{med = :.2f}ms +- {std:.2f}\")\n",
+    "\n",
+    "print(\"Creating a seek_mode='exact' decoder on the short video:\")\n",
+    "bench(VideoDecoder, source=short_video_path, seek_mode=\"exact\")\n",
+    "print(\"Creating a seek_mode='approximate' decoder on the short video:\")\n",
+    "bench(VideoDecoder, source=short_video_path, seek_mode=\"approximate\")\n",
+    "print()\n",
+    "print(\"Creating a seek_mode='exact' decoder on the long video:\")\n",
+    "bench(VideoDecoder, source=long_video_path, seek_mode=\"exact\")\n",
+    "print(\"Creating a seek_mode='approximate' decoder on the long video:\")\n",
+    "bench(VideoDecoder, source=long_video_path, seek_mode=\"approximate\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
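+    "## Performance: frame decoding and clip sampling\n",
+    "Strictly speaking, `seek_mode` only affects the creation of the decoder itself; it does not directly affect decoding or sampling.\n",
+    "In practice, however, a pipeline usually creates a decoder for each video first, so it indirectly affects the total time.\n"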
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Sampling clips with seek_mode='exact':\n",
+      "med = 131.16ms +- 16.07\n",
+      "Sampling clips with seek_mode='approximate':\n",
+      "med = 88.62ms +- 23.35\n"
+     ]
+    }
+   ],
+   "source": [
+    "from torchcodec import samplers\n",
+    "\n",
+    "def sample_clips(seek_mode):\n",
+    "    return samplers.clips_at_random_indices(\n",
+    "        decoder=VideoDecoder(\n",
+    "            source=long_video_path,\n",
+    "            seek_mode=seek_mode\n",
+    "        ),\n",
+    "        num_clips=5,\n",
+    "        num_frames_per_clip=2,\n",
+    "    )\n",
+    "\n",
+    "print(\"Sampling clips with seek_mode='exact':\")\n",
+    "bench(sample_clips, seek_mode=\"exact\")\n",
+    "print(\"Sampling clips with seek_mode='approximate':\")\n",
+    "bench(sample_clips, seek_mode=\"approximate\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
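+    "## Accuracy: metadata and frame retrieval\n",
+    "`seek_mode=\"approximate\"` can speed up decoder creation significantly, but at the cost of seeking that is less accurate than in exact mode,\n",
+    "and it may also affect the exactness of the metadata. In many cases the two modes show no difference, and approximate mode is then a net win.\n"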
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "tags": [
+     "hide-output"
+    ]
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Short video metadata (exact):\n",
+      "VideoStreamMetadata:\n",
+      "  duration_seconds_from_header: 13.8\n",
+      "  begin_stream_seconds_from_header: 0.0\n",
+      "  bit_rate: 505790.0\n",
+      "  codec: h264\n",
+      "  stream_index: 0\n",
+      "  begin_stream_seconds_from_content: 0.0\n",
+      "  end_stream_seconds_from_content: 13.8\n",
+      "  width: 640\n",
+      "  height: 360\n",
+      "  num_frames_from_header: 345\n",
+      "  num_frames_from_content: 345\n",
+      "  average_fps_from_header: 25.0\n",
+      "  pixel_aspect_ratio: 1\n",
+      "  duration_seconds: 13.8\n",
+      "  begin_stream_seconds: 0.0\n",
+      "  end_stream_seconds: 13.8\n",
+      "  num_frames: 345\n",
+      "  average_fps: 25.0\n",
+      "\n",
+      "Short video metadata (approximate):\n",
+      "VideoStreamMetadata:\n",
+      "  duration_seconds_from_header: 13.8\n",
+      "  begin_stream_seconds_from_header: 0.0\n",
+      "  bit_rate: 505790.0\n",
+      "  codec: h264\n",
+      "  stream_index: 0\n",
+      "  begin_stream_seconds_from_content: None\n",
+      "  end_stream_seconds_from_content: None\n",
+      "  width: 640\n",
+      "  height: 360\n",
+      "  num_frames_from_header: 345\n",
+      "  num_frames_from_content: None\n",
+      "  average_fps_from_header: 25.0\n",
+      "  pixel_aspect_ratio: 1\n",
+      "  duration_seconds: 13.8\n",
+      "  begin_stream_seconds: 0\n",
+      "  end_stream_seconds: 13.8\n",
+      "  num_frames: 345\n",
+      "  average_fps: 25.0\n",
+      "\n",
+      "Frame seeking is the same for both modes on this video!\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"Short video metadata (exact):\")\n",
+    "print(VideoDecoder(short_video_path, seek_mode=\"exact\").metadata)\n",
+    "print(\"Short video metadata (approximate):\")\n",
+    "print(VideoDecoder(short_video_path, seek_mode=\"approximate\").metadata)\n",
+    "\n",
+    "exact_decoder = VideoDecoder(short_video_path, seek_mode=\"exact\")\n",
+    "approx_decoder = VideoDecoder(short_video_path, seek_mode=\"approximate\")\n",
+    "for i in range(len(exact_decoder)):\n",
+    "    torch.testing.assert_close(\n",
+    "        exact_decoder.get_frame_at(i).data,\n",
+    "        approx_decoder.get_frame_at(i).data,\n",
+    "        atol=0, rtol=0,\n",
+    "    )\n",
+    "print(\"Frame seeking is the same for both modes on this video!\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
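+    "## How it works, in brief\n",
+    "With `seek_mode=\"exact\"`, the decoder performs a \"scan\" during initialization: it does not decode the video, but it processes the entire file to obtain more accurate metadata (such as duration) and to build an internal index of frames and key frames.\n",
+    "This index may be more accurate than the one in the file's header, which improves seeking accuracy. Without the scan, TorchCodec relies only on the metadata contained in the file itself, which may not be accurate.\n"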
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
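+    "## Which mode to use\n",
+    "- If strict frame-seeking accuracy really matters to you, use `exact`.\n",
+    "- If you can trade some seeking accuracy for speed (for example, when sampling clips), use `approximate`.\n",
+    "- If the video does not have a variable frame rate and its metadata is correct, `approximate` is usually just as accurate as `exact` but faster.\n"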
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Clean up temporary resources\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "shutil.rmtree(temp_dir)\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "py313",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}