Commit 2ffc9bd

arhamm1 and lbliii authored
Update container-environments.md (#1262)
Signed-off-by: Arham Mehta <[email protected]>
Co-authored-by: L.B. <[email protected]>
1 parent 0821304 commit 2ffc9bd

1 file changed (+37, -4 lines)

docs/reference/infrastructure/container-environments.md

@@ -12,13 +12,46 @@ modality: "universal"
 
 # Container Environments
 
-This reference documents the default environments available in NeMo Curator containers and their configurations.
+Deploy NeMo Curator in containerized environments for reproducible, scalable data curation pipelines with pre-configured dependencies and optimized runtime settings.
 
-(reference-infrastructure-container-environments-main)=
+## Overview
 
-## Main Container Environment
+NeMo Curator provides official Docker containers with all dependencies pre-installed and optimized for production workloads. Containers offer:
 
-The primary NeMo Curator container includes a uv-managed virtual environment with all necessary dependencies.
+- **Reproducible Environments**: Consistent software stack across development, testing, and production
+- **Simplified Deployment**: No manual dependency installation or environment configuration
+- **GPU Acceleration**: Pre-configured CUDA, cuDNN, and NVIDIA libraries for optimal performance
+- **Multi-Modal Support**: Built-in support for text, image, video, and audio curation
+- **Cloud-Ready**: Compatible with Kubernetes, Docker Swarm, and cloud container orchestrators
+
+**When to use containers:**
+- Production deployments requiring consistency and reliability
+- Multi-node cluster processing with identical environments
+- CI/CD pipelines for automated data curation workflows
+- Quick prototyping without local environment setup
+- GPU-accelerated processing in cloud environments
+
+## Available Containers
+
+### Main NeMo Curator Container
+
+The primary container includes comprehensive support for all curation modalities:
+
+**Container image:** `nvcr.io/nvidia/nemo-curator:25.09`
+
+**Supported modalities:**
+- ✅ Text curation (CPU/GPU)
+- ✅ Image curation (GPU required)
+- ✅ Video curation (GPU required, FFmpeg included)
+- ✅ Audio curation (GPU required for ASR)
+
+**Pre-installed components:**
+- NeMo Curator with all optional dependencies (`[all]` extras)
+- CUDA 12.8.1 with cuDNN
+- Python 3.12 with uv package manager
+- FFmpeg 7+ with NVENC support (for video processing)
+- Ray, Dask, and distributed computing frameworks
+- NVIDIA-optimized Python packages
 
 (reference-infrastructure-container-environments-curator)=
 
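The change above lists the container image and which modalities need a GPU. As a minimal sketch, assuming Docker with the NVIDIA Container Toolkit installed on the host, the Python helper below turns that list into a `docker run` command line. The names `build_docker_run` and `GPU_REQUIRED`, and the `/workspace/data` mount point, are hypothetical illustrations, not part of NeMo Curator or its documented usage.

```python
import shlex
from typing import Optional, Sequence

# Image reference taken from the "Main NeMo Curator Container" section.
IMAGE = "nvcr.io/nvidia/nemo-curator:25.09"

# Hypothetical table derived from "Supported modalities":
# text runs on CPU or GPU, the other modalities require a GPU.
GPU_REQUIRED = {
    "text": False,
    "image": True,
    "video": True,
    "audio": True,  # GPU required for ASR
}


def build_docker_run(
    modalities: Sequence[str],
    workdir: Optional[str] = None,
    image: str = IMAGE,
) -> list:
    """Return a `docker run` argument list for the requested modalities.

    Hypothetical helper for illustration only; not a NeMo Curator API.
    """
    unknown = set(modalities) - set(GPU_REQUIRED)
    if unknown:
        raise ValueError(f"unsupported modalities: {sorted(unknown)}")

    argv = ["docker", "run", "--rm", "-it"]
    # Request GPUs only when at least one selected modality needs them.
    if any(GPU_REQUIRED[m] for m in modalities):
        argv += ["--gpus", "all"]
    # Optionally bind-mount a host data directory (container path is an assumption).
    if workdir is not None:
        argv += ["-v", f"{workdir}:/workspace/data"]
    argv.append(image)
    return argv


if __name__ == "__main__":
    cmd = build_docker_run(["text", "video"], workdir="/data/raw")
    print(shlex.join(cmd))
```

Run as a script, this prints `docker run --rm -it --gpus all -v /data/raw:/workspace/data nvcr.io/nvidia/nemo-curator:25.09`. For a text-only selection the `--gpus` flag is left out, because the change lists text curation as CPU/GPU.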
