Commit 2f6460a

cleanup
Signed-off-by: Lawrence Lane <llane@nvidia.com>
1 parent 36f01c6

1 file changed: +8 -20 lines changed

docs/about/release-notes/migration-faq.md

Lines changed: 8 additions & 20 deletions
@@ -122,19 +122,6 @@ A set of models assigns a quality score (for example, 0–20), bucketed into hig
 Each task is a data class. You can add whatever statistics you need (input or output counts, tokens dropped, and so on) within stage logic for detailed reporting or logging.
 ```

-```{dropdown} Does the new deduplication feature support global deduplication (across all snapshots, not just incremental)?
-
-Yes, NeMo Curator supports {ref}`global deduplication <text-process-data-dedup>` by processing multiple data sources together in a single pass. You can provide files from multiple snapshots (such as different Common Crawl releases) as input, and deduplication will identify duplicates across all of them.
-
-Three GPU-accelerated approaches are available:
-
-- **Exact deduplication**: MD5 hashing for identical documents (unlimited scale)
-- **Fuzzy deduplication**: MinHash with LSH for near-duplicates (petabyte-scale)
-- **Semantic deduplication**: Embedding-based similarity for meaning-based duplicates (terabyte-scale)
-
-For comprehensive documentation, refer to {ref}`Deduplication Concepts <about-concepts-deduplication>`.
-```
-
 ---

 ## Fault Tolerance, Checkpointing, and Observability
@@ -168,11 +155,6 @@ Yes, by design. Add new stages or modify process functions to integrate custom l
 Contributions are strongly encouraged. Submit pull requests or join community discussions to help expand NeMo Curator's capabilities for diverse regions and languages.
 ```

-```{dropdown} How do I manage different dependencies for separate stages, especially if packaging as Docker images?
-
-Each stage can specify its Conda environment, which must be present in the Docker image. Import your dependencies within stage logic to ensure proper isolation.
-```
-
 ---

 ## Deployment, Infrastructure, and Practicalities
@@ -217,7 +199,13 @@ Yes. NeMo Curator supports multiple data modalities including {ref}`text <gs-tex

 ```{dropdown} Where can I get support, ask questions, or contribute?

-Support and discussion channels are available on Slack and [GitHub](https://github.com/NVIDIA-NeMo/Curator). If you build innovative filters or features for your locale, please engage and contribute back. Regular calls and community check-ins are offered.
+All support and community engagement happens on [GitHub](https://github.com/NVIDIA-NeMo/Curator). We encourage you to:
+
+- **Open an issue** for bugs, feature requests, or questions
+- **Start a discussion** to share ideas or ask for guidance
+- **Submit a pull request** if you build innovative filters or features
+
+Regular community calls and check-ins are also offered to connect with the team and other users.
 ```

 ---
@@ -231,5 +219,5 @@ Support and discussion channels are available on Slack and [GitHub](https://gith
 ```

 ```{tip}
-If you find something missing or want to share a best practice or feature, join the NeMo Curator community or submit an issue or pull request on GitHub.
+If you find something missing or want to share a best practice or feature, please [open an issue](https://github.com/NVIDIA-NeMo/Curator/issues) or [submit a pull request](https://github.com/NVIDIA-NeMo/Curator/pulls) on GitHub.
 ```
