Commit e844323

video doc improvement (#1124)
Signed-off-by: Ao Tang <[email protected]>
1 parent e484a4c commit e844323

4 files changed: +36 -30 lines

docs/about/concepts/video/abstractions.md

Lines changed: 9 additions & 7 deletions
@@ -33,13 +33,15 @@ A pipeline orchestrates stages into an end-to-end workflow. Key characteristics:
 
 ## Stages
 
-A stage represents a single step in your data curation workflow. For example, stages can:
-
-- Download videos
-- Convert video formats
-- Split videos into clips
-- Generate embeddings
-- Calculate scores
+A stage represents a single step in your data curation workflow. Video stages are organized into several functional categories:
+
+- **Input/Output**: Read video files and write processed outputs to storage ([Save & Export Documentation](video-save-export))
+- **Video Clipping**: Split videos into clips using fixed stride or scene-change detection ([Video Clipping Documentation](video-process-clipping))
+- **Frame Extraction**: Extract frames from videos or clips for analysis and embeddings ([Frame Extraction Documentation](video-process-frame-extraction))
+- **Embedding Generation**: Generate clip-level embeddings using InternVideo2 or Cosmos-Embed1 models ([Embeddings Documentation](video-process-embeddings))
+- **Filtering**: Filter clips based on motion analysis and aesthetic quality scores ([Filtering Documentation](video-process-filtering))
+- **Caption and Preview**: Generate captions and preview images from video clips ([Captions & Preview Documentation](video-process-captions-preview))
+- **Deduplication**: Remove near-duplicate clips using embedding-based clustering ([Duplicate Removal Documentation](video-process-dedup))
 
 ### Stage Architecture
 
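The new category list reads as a map of the stage catalog. Below is a minimal sketch of how such stages compose into a pipeline, combining the `VideoFrameExtractionStage` import and the `pipeline.add_stage(...)` call that appear elsewhere in this commit; the `Pipeline` import path, constructor, and `run()` entry point are assumptions, not confirmed by this diff:

```python
# Minimal sketch: wiring one documented stage into a pipeline.
# VideoFrameExtractionStage and add_stage appear in this commit;
# the Pipeline import path, constructor, and run() are assumptions.
from nemo_curator.pipeline import Pipeline  # assumed import path
from nemo_curator.stages.video.clipping.video_frame_extraction import (
    VideoFrameExtractionStage,
)

pipeline = Pipeline(name="video_curation")  # hypothetical constructor
pipeline.add_stage(
    VideoFrameExtractionStage(
        decoder_mode="pynvc",  # or "ffmpeg_gpu", "ffmpeg_cpu"
        verbose=True,
    )
)
pipeline.run()  # assumed entry point
```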
docs/curate-video/process-data/frame-extraction.md

Lines changed: 8 additions & 0 deletions
@@ -117,6 +117,8 @@ from nemo_curator.stages.video.clipping.video_frame_extraction import VideoFrame
 
 frame_extractor = VideoFrameExtractionStage(
     decoder_mode="pynvc",  # or "ffmpeg_gpu", "ffmpeg_cpu"
+    output_hw=(27, 48),  # (height, width) for frame extraction
+    pyncv_batch_size=64,  # batch size for PyNvCodec
     verbose=True,
 )
 ```
@@ -139,8 +141,14 @@ frame_extractor = VideoFrameExtractionStage(
   - Shortcut that sets default FPS for specific purposes (such as embeddings). You can still pass `target_fps` to override.
 * - `target_res`
   - Output frame resolution `(height, width)`. Use `(-1, -1)` to keep original.
+* - `num_cpus`
+  - Number of CPU cores for frame extraction. Default: `3`.
 * - `decoder_mode`
   - For full‑video extraction: `pynvc` (NVDEC), `ffmpeg_gpu`, or `ffmpeg_cpu`.
+* - `output_hw`
+  - For full‑video extraction: `(height, width)` tuple for frame dimensions. Default: `(27, 48)`.
+* - `pyncv_batch_size`
+  - For full‑video extraction: batch size for PyNvCodec processing. Default: `64`.
 ```
 
 ### LCM Sampling for Several FPS Values
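The two added parameters apply to full-video extraction with the GPU decoder. As a hedged companion sketch using only parameters from the table above (defaults per this diff), here is a CPU-only configuration, where the PyNvCodec batch size is irrelevant and is omitted:

```python
# Hedged sketch: CPU-only frame extraction built from the parameters
# documented in the table above. Values mirror the stated defaults.
from nemo_curator.stages.video.clipping.video_frame_extraction import (
    VideoFrameExtractionStage,
)

frame_extractor = VideoFrameExtractionStage(
    decoder_mode="ffmpeg_cpu",  # no NVDEC hardware required
    output_hw=(27, 48),         # default (height, width)
    num_cpus=3,                 # default CPU core count
    verbose=True,
)
```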

docs/curate-video/save-export.md

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@ pipeline.add_stage(
     generate_embeddings=True,
     generate_previews=False,
     generate_captions=False,
-    embedding_algorithm="internvideo2",  # or "cosmos-embed1"
+    embedding_algorithm="cosmos-embed1",  # or "internvideo2"
     caption_models=["qwen"],
     enhanced_caption_models=["qwen_lm"],
     verbose=True,
@@ -69,7 +69,7 @@ pipeline.add_stage(
   - The stage includes captions in metadata when upstream stages provide them.
 * - `embedding_algorithm`
   - `str`
-  - Accepted: `internvideo2` or `cosmos-embed1`.
+  - Accepted: `cosmos-embed1` or `internvideo2`. Default: `cosmos-embed1`.
 * - `caption_models`
   - `list[str] | None`
   - Ordered caption models to emit. Use `[]` when not using captions.
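Since the default flips to `cosmos-embed1`, selecting InternVideo2 is now the explicit case. A hedged sketch of that configuration follows; the stage class constructed inside `add_stage` is not named in these hunks, so `VideoSaveExportStage` is a placeholder, with keyword arguments taken verbatim from the diff above:

```python
# Hedged sketch: explicitly selecting InternVideo2 now that
# cosmos-embed1 is the default. VideoSaveExportStage is a placeholder
# name; the real class sits outside the hunks shown here.
pipeline.add_stage(
    VideoSaveExportStage(  # hypothetical class name
        generate_embeddings=True,
        generate_previews=False,
        generate_captions=False,
        embedding_algorithm="internvideo2",  # non-default choice
        caption_models=["qwen"],
        enhanced_caption_models=["qwen_lm"],
        verbose=True,
    )
)
```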

docs/get-started/video.md

Lines changed: 17 additions & 21 deletions
@@ -117,32 +117,28 @@ Embeddings convert each video clip into a numeric vector that captures visual an
 
 You can choose between two embedding models:
 
-- **Cosmos-Embed1 (default)**: Automatically downloaded to `MODEL_DIR` on first run; good general-purpose performance and lower VRAM usage.
-- **InternVideo2 (IV2)**: Open model that requires the IV2 checkpoint and BERT model files to be available locally; higher VRAM usage.
+- **Cosmos-Embed1 (default)**: Available in three variants—**cosmos-embed1-224p**, **cosmos-embed1-336p**, and **cosmos-embed1-448p**—which differ in input resolution and accuracy/VRAM tradeoff. All variants are automatically downloaded to `MODEL_DIR` on first run.
+  - [cosmos-embed1-224p on Hugging Face](https://huggingface.co/nvidia/Cosmos-Embed1-224p)
+  - [cosmos-embed1-336p on Hugging Face](https://huggingface.co/nvidia/Cosmos-Embed1-336p)
+  - [cosmos-embed1-448p on Hugging Face](https://huggingface.co/nvidia/Cosmos-Embed1-448p)
+- **InternVideo2 (IV2)**: Open model that requires the IV2 checkpoint and BERT model files to be available locally; higher VRAM usage.
+  - [InternVideo Official Github Page](https://github.com/OpenGVLab/InternVideo)
 
-For this quickstart, we're going to set up support for **IV2**.
+For this quickstart, we're going to set up support for **Cosmos-Embed1-224p**.
 
-### Prepare IV2 Model Weights
+### Prepare Model Weights
 
-Complete the following steps when you set `--embedding-algorithm` to `internvideo2` or when you pre-stage models for offline use.
+For most use cases, you only need to create a model directory. The required model files will be downloaded automatically on first run.
 
-1. Create a model directory.
+1. Create a model directory:
+   ```bash
+   mkdir -p "$MODEL_DIR"
+   ```
    :::{tip}
    You can reuse the same `<MODEL_DIR>` across runs.
    :::
-2. Download the IV2 Checkpoint from the [OpenGVLab page](https://github.com/OpenGVLab) and accept the terms.
-3. Download the BERT model files for [`google-bert/bert-large-uncased`](https://huggingface.co/google-bert/bert-large-uncased).
-
-   The directory should resemble the following:
-
-   ```text
-   <MODEL_DIR>/
-     OpenGVLab/InternVideo2-Stage2_1B-224p-f4/InternVideo2-stage2_1b-224p-f4.pt
-     google-bert/bert-large-uncased/
-       config.json
-       tokenizer.json
-       ... (standard tokenizer files)
-   ```
+
+2. No additional setup is required. The model will be downloaded automatically when first used.
 
 ## Set Up Data Directories
 
@@ -169,7 +165,7 @@ python -m nemo_curator.examples.video.video_split_clip_example \
   --output-clip-path "$OUT_DIR" \
   --splitting-algorithm fixed_stride \
   --fixed-stride-split-duration 10.0 \
-  --embedding-algorithm internvideo2 \
+  --embedding-algorithm cosmos-embed1-224p \
   --transcode-encoder libopenh264 \
   --verbose
 ```
@@ -196,7 +192,7 @@ The example script supports the following options:
 ```
 
 :::{tip}
-To use the default Cosmos-Embed1 instead, omit `--embedding-algorithm` or set `--embedding-algorithm cosmos-embed1-224p`.
+To use InternVideo2 instead, set `--embedding-algorithm internvideo2`.
 :::
 
 ## Next Steps
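
The removed steps covered manual IV2 staging; the automatic download now handles Cosmos-Embed1 weights on first run. For air-gapped machines you can still pre-stage the weights yourself. A hedged sketch using `huggingface_hub` follows; the exact directory layout NeMo Curator expects under `MODEL_DIR` is an assumption:

```python
# Optional pre-staging of Cosmos-Embed1-224p weights for offline use.
# NeMo Curator downloads the model automatically on first run, so this
# is only needed for air-gapped setups. The repo_id comes from the
# Hugging Face links above; the layout under MODEL_DIR is assumed.
import os
from huggingface_hub import snapshot_download

model_dir = os.environ["MODEL_DIR"]
snapshot_download(
    repo_id="nvidia/Cosmos-Embed1-224p",
    local_dir=os.path.join(model_dir, "nvidia/Cosmos-Embed1-224p"),
)
```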
