Commit 5798cd0

feedback
Signed-off-by: Lawrence Lane <llane@nvidia.com>
1 parent a2a340d commit 5798cd0

4 files changed: +11 -4 lines changed

docs/about/release-notes/index.md

Lines changed: 1 addition & 2 deletions
@@ -116,7 +116,7 @@ New API for tracking and analyzing pipeline execution:
 
 ## Bug Fixes
 
-- Fixed fasttext predict call compatibility with numpy>2
+- Fixed fasttext predict call compatibility with numpy>2
 - Fixed broken NeMo Framework documentation links
 - Fixed MegatronTokenizerWriter to download only necessary tokenizer files
 - Fixed ID generator blocking issues for large-scale processing
@@ -147,7 +147,6 @@ New API for tracking and analyzing pipeline execution:
 - **Memory Management**: New guidance for handling CPU/GPU memory constraints
 - **AWS Integration**: Updated tutorials with correct AWS credentials setup
 
-
 ---
 
 ## What's Next

docs/curate-video/process-data/dedup.md

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ workflow = SemanticDeduplicationWorkflow(
 n_clusters=1000,
 id_field="id",
 embedding_field="embedding",
-embedding_dim=512, # 512 for InternVideo2, varies for Cosmos-Embed1
+embedding_dim=768, # Embedding dimension (768 for Cosmos-Embed1, varies by model)
 input_filetype="parquet",
 eps=0.1, # Similarity threshold: cosine_sim >= 1.0 - eps identifies duplicates
 ranking_strategy=RankingStrategy.metadata_based(
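
The `eps` comment in the hunk above states the duplicate rule as `cosine_sim >= 1.0 - eps`. A minimal standalone sketch of that comparison follows. It is not NeMo Curator's implementation: the helper names `cosine_similarity` and `is_duplicate_pair` are hypothetical, and the sketch compares a single pair of vectors rather than clustering a whole dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_duplicate_pair(a, b, eps=0.1):
    """Hypothetical helper: apply the rule cosine_sim >= 1.0 - eps to one pair."""
    return cosine_similarity(a, b) >= 1.0 - eps


# Nearly parallel vectors: similarity is about 0.995, above the 0.9 threshold.
print(is_duplicate_pair([1.0, 0.0], [1.0, 0.1]))  # True
# Orthogonal vectors: similarity is 0.0, below the threshold.
print(is_duplicate_pair([1.0, 0.0], [0.0, 1.0]))  # False
```

With `eps=0.1` the threshold is 0.9, so a smaller `eps` flags only closer matches as duplicates and a larger `eps` flags more pairs.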

docs/get-started/image.md

Lines changed: 8 additions & 0 deletions
@@ -120,8 +120,16 @@ Here's a simple example to get started with NeMo Curator's image curation pipeli
 Image loading and decoding happens in CPU memory before GPU processing. If you encounter out-of-memory errors during the `ImageReaderStage`, reduce:
 - `batch_size`: Number of images per batch (reduce to 32-50 for systems with limited RAM)
 - `num_threads`: Parallel decoding threads (reduce to 4 for systems with limited RAM)
+- `num_cpus`: Ray Client CPU allocation (reduce to 8-16 for systems with limited RAM)
 
 The example below uses conservative defaults suitable for most systems. For high-memory systems, you can increase these values for better performance.
+
+To configure Ray with limited CPU resources:
+```python
+from nemo_curator.core.client import RayClient
+ray_client = RayClient(num_cpus=8) # Adjust based on available CPU cores
+ray_client.start()
+```
 :::
 
 ```python

docs/reference/infrastructure/container-environments.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ NeMo Curator provides official Docker containers with all dependencies pre-insta
 
 The primary container includes comprehensive support for all curation modalities:
 
-**Container registry:** `nvcr.io/nvidia/nemo-curator:26.02`
+**Container registry:** `nvcr.io/nvidia/nemo-curator:{{ container_version }}`
 
 **Supported modalities:**
 - ✅ Text curation (CPU/GPU)
