Llane/sdg ray docs #1347
| Diff line change |
|---|
| @@ -12,214 +12,27 @@ modality: "universal" |
| # NeMo Curator Release Notes: {{ current_release }} |
| This major release represents a fundamental architecture shift from [Dask](https://www.dask.org/) to [Ray](https://www.ray.io/), expanding NeMo Curator to support multimodal data curation with new [video](../../curate-video/index.md) and [audio](../../curate-audio/index.md) capabilities. This refactor enables unified backend processing, better heterogeneous computing support, and enhanced autoscaling for dynamic workloads. |
| ## Synthetic Data Generation |
Comment on lines 13 to +15 (Contributor):
The removed introductory paragraph provided important context about the Dask-to-Ray architecture shift and migration guide references. Consider restoring this as the opening paragraph before the SDG section to help users understand the scope of v26.02. Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| **Migrating from a previous version of NeMo Curator?** Refer to the {ref}`Migration Guide <migration-guide>` for step-by-step instructions and the {ref}`Migration FAQ <migration-faq>` for common questions. |
| New Ray-based synthetic data generation capabilities for creating and augmenting training data using LLMs: |
| ## Installation Updates |
| - **LLM Client Infrastructure**: OpenAI-compatible async/sync clients with automatic rate limiting, retry logic, and exponential backoff |
| - **Multilingual Q&A Generation**: Generate synthetic Q&A pairs across multiple languages using customizable prompts |
| - **NemotronCC Pipelines**: Advanced text transformation and knowledge extraction workflows: |
Suggested change:
| - **NemotronCC Pipelines**: Advanced text transformation and knowledge extraction workflows: |
| - **Nemotron-CC Pipelines**: Advanced text transformation and knowledge extraction workflows: |
Prefer to refer to it as Nemotron-CC everywhere.
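For reviewers unfamiliar with the behavior named in the LLM Client Infrastructure bullet, the following is a minimal sketch of retry logic with exponential backoff and jitter. It is not NeMo Curator's client code: `retry_with_backoff`, its parameters, and the injectable `sleep` argument are hypothetical names chosen for illustration only.

```python
import asyncio
import random


async def retry_with_backoff(call, max_retries=5, base_delay=0.5,
                             max_delay=30.0, sleep=asyncio.sleep):
    """Await call() and retry on failure, doubling the wait ceiling each time.

    Hypothetical helper for illustration; not part of any library API.
    """
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Exponential backoff: base_delay * 2^attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter spreads out retries from concurrent callers.
            await sleep(random.uniform(0, ceiling))
```

A real client would likely retry only on specific errors (for example rate-limit or timeout responses) rather than on every exception; the broad `except` here keeps the sketch short.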
The release notes section was reduced from 231 lines to 44 lines, removing all the comprehensive v26.02 release information. The SDG section should be added to the existing release notes, not replace them.
Missing content includes:
- Installation updates (Docker CUDA 12.8.1, UV integration, PyPI extras for all modalities)
- New modalities (Video and Audio support with detailed features)
- Text and Image modality refactors
- Deduplication improvements
- Core architecture refactors (Pipeline/Stage redesigns)
- Tutorials updates
- Known limitations
Restore the original release notes content and add the SDG section as a new bullet point under the appropriate category (likely "New Features" or "Text Modality Updates").
Code curation and math curation are some of the milestones.
The opening paragraph describing the Ray architecture shift and migration guide reference was removed. This important context helps users understand the scope of the 26.02 release. Consider restoring: