From 8d42be888e3efcc3159eed8e4d41b958b75ec458 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 2 Jan 2026 10:25:00 -0500 Subject: [PATCH 1/5] sdg ray docs init Signed-off-by: Lawrence Lane --- docs/about/release-notes/index.md | 16 +- docs/curate-text/index.md | 11 + docs/curate-text/synthetic/index.md | 156 +++++++ docs/curate-text/synthetic/llm-client.md | 301 ++++++++++++++ docs/curate-text/synthetic/multilingual-qa.md | 299 +++++++++++++ .../synthetic/nemotron-cc/index.md | 280 +++++++++++++ .../synthetic/nemotron-cc/tasks.md | 393 ++++++++++++++++++ tutorials/synthetic/README.md | 116 +++++- 8 files changed, 1560 insertions(+), 12 deletions(-) create mode 100644 docs/curate-text/synthetic/index.md create mode 100644 docs/curate-text/synthetic/llm-client.md create mode 100644 docs/curate-text/synthetic/multilingual-qa.md create mode 100644 docs/curate-text/synthetic/nemotron-cc/index.md create mode 100644 docs/curate-text/synthetic/nemotron-cc/tasks.md diff --git a/docs/about/release-notes/index.md b/docs/about/release-notes/index.md index 5dd57edfc2..7492c9e141 100644 --- a/docs/about/release-notes/index.md +++ b/docs/about/release-notes/index.md @@ -190,13 +190,27 @@ graph LR For all tutorial content, refer to the [tutorials directory](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials) in the NeMo Curator GitHub repository. +## Synthetic Data Generation + +New Ray-based synthetic data generation capabilities for creating and augmenting training data using LLMs: + +- **LLM Client Infrastructure**: OpenAI-compatible async/sync clients with automatic rate limiting, retry logic, and exponential backoff +- **Multilingual Q&A Generation**: Generate synthetic Q&A pairs across multiple languages using customizable prompts +- **NemotronCC Pipelines**: Advanced text transformation and knowledge extraction workflows: + - **Wikipedia Paraphrasing**: Improve low-quality text by rewriting in Wikipedia-style prose + - **Diverse QA**: Generate diverse question-answer pairs for reading comprehension training + - **Distill**: Create condensed, information-dense paraphrases preserving key concepts + - **Extract Knowledge**: Extract factual content as textbook-style passages + - **Knowledge List**: Extract structured fact lists from documents + +Learn more in the [Synthetic Data Generation documentation](../../curate-text/synthetic/index.md). 
+ ## Known Limitations > (Pending Refactor in Future Release) ### Generation -- **Synthetic data generation**: Synthetic text generation features are being refactored for Ray compatibility - **Hard negative mining**: Retrieval-based data generation workflows under development ### PII diff --git a/docs/curate-text/index.md b/docs/curate-text/index.md index f8c0aa9576..c9a8c46275 100644 --- a/docs/curate-text/index.md +++ b/docs/curate-text/index.md @@ -191,6 +191,17 @@ Domain-specific processing for code and advanced curation tasks {bdg-secondary}`code-processing` ::: +:::{grid-item-card} {octicon}`sparkles;1.5em;sd-mr-1` Synthetic Data Generation +:link: synthetic/index +:link-type: doc +Generate and augment training data using LLMs ++++ +{bdg-secondary}`llm` +{bdg-secondary}`augmentation` +{bdg-secondary}`multilingual` +{bdg-secondary}`nemotron-cc` +::: + :::: diff --git a/docs/curate-text/synthetic/index.md b/docs/curate-text/synthetic/index.md new file mode 100644 index 0000000000..7112b0288d --- /dev/null +++ b/docs/curate-text/synthetic/index.md @@ -0,0 +1,156 @@ +--- +description: "Generate and augment training data using LLMs with NeMo Curator's synthetic data generation pipeline" +categories: ["workflows"] +tags: ["synthetic-data", "llm", "generation", "augmentation", "multilingual"] +personas: ["data-scientist-focused", "mle-focused"] +difficulty: "intermediate" +content_type: "workflow" +modality: "text-only" +--- + +(synthetic-data-overview)= + +# Synthetic Data Generation + +NeMo Curator provides synthetic data generation (SDG) capabilities for creating and augmenting training data using Large Language Models (LLMs). These pipelines integrate with OpenAI-compatible APIs, enabling you to use NVIDIA NIM endpoints, local vLLM servers, or other inference providers. + +## Use Cases + +- **Data Augmentation**: Expand limited datasets by generating diverse variations +- **Multilingual Generation**: Create Q&A pairs and text in multiple languages +- **Knowledge Extraction**: Convert raw text into structured knowledge formats +- **Quality Improvement**: Paraphrase low-quality text into higher-quality Wikipedia-style prose +- **Training Data Creation**: Generate instruction-following data for model fine-tuning + +## Core Concepts + +Synthetic data generation in NeMo Curator operates in two primary modes: + +### Generation Mode + +Create new data from scratch without requiring input documents. The `QAMultilingualSyntheticStage` demonstrates this pattern—it generates Q&A pairs based on a prompt template without needing seed documents. + +### Transformation Mode + +Improve or restructure existing data using LLM capabilities. The NemotronCC stages exemplify this approach, taking input documents and producing: + +- Paraphrased text in Wikipedia style +- Diverse Q&A pairs derived from document content +- Condensed knowledge distillations +- Extracted factual content + +## Architecture + +The following diagram shows how SDG pipelines process data through preprocessing, LLM generation, and postprocessing stages: + +```{mermaid} +flowchart LR + A["Input Documents
(Parquet/JSONL)"] --> B["Preprocessing
(Tokenization,
Segmentation)"] + B --> C["LLM Generation
(OpenAI-compatible)"] + C --> D["Postprocessing
(Cleanup, Filtering)"] + D --> E["Output Dataset
(Parquet/JSONL)"] + + F["LLM Client
(NVIDIA API,
vLLM, TGI)"] -.->|"API Calls"| C + + classDef stage fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000 + classDef infra fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000 + classDef output fill:#e8f5e9,stroke:#2e7d32,stroke-width:3px,color:#000 + + class A,B,C,D stage + class E output + class F infra +``` + +## Prerequisites + +Before using synthetic data generation, ensure you have: + +1. **NVIDIA API Key** (for cloud endpoints) + - Obtain from [NVIDIA Build](https://build.nvidia.com/settings/api-keys) + - Set as environment variable: `export NVIDIA_API_KEY="your-key"` + +2. **NeMo Curator with text extras** + + ```bash + uv pip install --extra-index-url https://pypi.nvidia.com nemo-curator[text_cuda12] + ``` + +3. **Additional dependencies** (for NemotronCC pipelines) + + ```bash + pip install transformers # For tokenizer support + ``` + +## Available SDG Stages + +```{list-table} Synthetic Data Generation Stages +:header-rows: 1 +:widths: 30 40 30 + +* - Stage + - Purpose + - Input Type +* - `QAMultilingualSyntheticStage` + - Generate multilingual Q&A pairs + - Empty (generates from scratch) +* - `WikipediaParaphrasingStage` + - Rewrite text as Wikipedia-style prose + - Document text +* - `DiverseQAStage` + - Generate diverse Q&A pairs from documents + - Document text +* - `DistillStage` + - Create condensed, information-dense paraphrases + - Document text +* - `ExtractKnowledgeStage` + - Extract knowledge as textbook-style passages + - Document text +* - `KnowledgeListStage` + - Extract structured fact lists + - Document text +``` + +--- + +## Getting Started + +::::{grid} 1 1 2 2 +:gutter: 2 + +:::{grid-item-card} {octicon}`plug;1.5em;sd-mr-1` LLM Client Setup +:link: llm-client +:link-type: doc +Configure OpenAI-compatible clients for NVIDIA APIs and custom endpoints ++++ +{bdg-secondary}`configuration` +{bdg-secondary}`performance` +::: + +:::{grid-item-card} {octicon}`globe;1.5em;sd-mr-1` Multilingual Q&A Generation +:link: multilingual-qa +:link-type: doc +Generate synthetic Q&A pairs across multiple languages ++++ +{bdg-secondary}`quickstart` +{bdg-secondary}`tutorial` +::: + +:::{grid-item-card} {octicon}`rocket;1.5em;sd-mr-1` NemotronCC Pipelines +:link: nemotron-cc/index +:link-type: doc +Advanced text transformation and knowledge extraction workflows ++++ +{bdg-secondary}`advanced` +{bdg-secondary}`paraphrasing` +::: + +:::: + +```{toctree} +:hidden: +:maxdepth: 2 + +llm-client +multilingual-qa +nemotron-cc/index +``` diff --git a/docs/curate-text/synthetic/llm-client.md b/docs/curate-text/synthetic/llm-client.md new file mode 100644 index 0000000000..1b517e8230 --- /dev/null +++ b/docs/curate-text/synthetic/llm-client.md @@ -0,0 +1,301 @@ +--- +description: "Configure LLM clients for synthetic data generation with NVIDIA APIs or custom endpoints" +categories: ["how-to-guides"] +tags: ["llm-client", "openai", "nvidia-api", "configuration"] +personas: ["data-scientist-focused", "mle-focused"] +difficulty: "beginner" +content_type: "how-to" +modality: "text-only" +--- + +(synthetic-llm-client)= +# LLM Client Configuration + +NeMo Curator's synthetic data generation uses OpenAI-compatible clients to communicate with LLM inference servers. This guide covers client configuration, performance tuning, and integration with various endpoints. 
+ +## Overview + +Two client types are available: + +- **`AsyncOpenAIClient`**: Recommended for high-throughput batch processing with concurrent requests +- **`OpenAIClient`**: Synchronous client for simpler use cases or debugging + +For most SDG workloads, use `AsyncOpenAIClient` to maximize throughput. + +## Basic Configuration + +### NVIDIA API Endpoints + +```python +from nemo_curator.models.client.openai_client import AsyncOpenAIClient + +client = AsyncOpenAIClient( + api_key="your-nvidia-api-key", # Or use NVIDIA_API_KEY env var + base_url="https://integrate.api.nvidia.com/v1", + max_concurrent_requests=5, +) +``` + +### Environment Variables + +Set your API key as an environment variable to avoid hardcoding credentials: + +```bash +export NVIDIA_API_KEY="nvapi-..." +``` + +The client automatically uses `NVIDIA_API_KEY` or `OPENAI_API_KEY` if not explicitly provided. + +## Generation Parameters + +Configure LLM generation behavior using `GenerationConfig`: + +```python +from nemo_curator.models.client.llm_client import GenerationConfig + +config = GenerationConfig( + max_tokens=2048, + temperature=0.7, + top_p=0.95, + seed=42, # For reproducibility +) +``` + +```{list-table} Generation Parameters +:header-rows: 1 +:widths: 20 15 15 50 + +* - Parameter + - Type + - Default + - Description +* - `max_tokens` + - int + - 2048 + - Maximum tokens to generate per request +* - `temperature` + - float + - 0.0 + - Sampling temperature (0.0-2.0). Higher values increase randomness +* - `top_p` + - float + - 0.95 + - Nucleus sampling parameter (0.0-1.0) +* - `top_k` + - int + - None + - Top-k sampling (if supported by the endpoint) +* - `seed` + - int + - 0 + - Random seed for reproducibility +* - `stop` + - str/list + - None + - Stop sequences to end generation +* - `stream` + - bool + - False + - Enable streaming (not recommended for batch processing) +* - `n` + - int + - 1 + - Number of completions to generate per request +``` + +## Performance Tuning + +### Concurrency vs. Parallelism + +The `max_concurrent_requests` parameter controls how many API requests the client can have in-flight simultaneously. 
This interacts with Ray's distributed workers:

- **Client-level concurrency**: `max_concurrent_requests` limits concurrent API calls per worker
- **Worker-level parallelism**: Ray distributes tasks across multiple workers

```python
# For NVIDIA API endpoints with rate limits
client = AsyncOpenAIClient(
    base_url="https://integrate.api.nvidia.com/v1",
    max_concurrent_requests=3,  # Conservative for cloud APIs
)

# For local vLLM server with more capacity
client = AsyncOpenAIClient(
    base_url="http://localhost:8000/v1",
    max_concurrent_requests=16,  # Higher for local deployment
)
```

### Optimal Settings

```{list-table} Recommended Concurrency Settings
:header-rows: 1
:widths: 30 25 45

* - Endpoint Type
  - Recommended Setting
  - Notes
* - NVIDIA API (cloud)
  - 3-5
  - Respects rate limits; increase gradually
* - Local vLLM
  - 8-32
  - Depends on GPU memory and model size
* - Local TGI
  - 8-16
  - Adjust based on server configuration
```

### Retry Configuration

The client includes automatic retry with exponential backoff for transient errors:

```python
client = AsyncOpenAIClient(
    base_url="https://integrate.api.nvidia.com/v1",
    max_retries=3,   # Number of retry attempts
    base_delay=1.0,  # Base delay in seconds
    timeout=120,     # Request timeout
)
```

The retry logic handles:

- **Rate limit errors (429)**: Automatic backoff with jitter
- **Connection errors**: Retry with exponential delay
- **Transient failures**: Configurable retry attempts

## Using Custom Endpoints

````{tab-set}

```{tab-item} Local vLLM Server

Deploy a local vLLM server and configure the client:

**Start vLLM server:**
```bash
vllm serve meta-llama/Llama-3.3-70B-Instruct \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 4
```

**Configure client:**
```python
client = AsyncOpenAIClient(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM doesn't require API key by default
    max_concurrent_requests=16,
    timeout=300,  # Longer timeout for large models
)
```
```

```{tab-item} Text Generation Inference (TGI)

Deploy a TGI server and configure the client:

**Start TGI server:**
```bash
docker run --gpus all -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id meta-llama/Llama-3.3-70B-Instruct
```

**Configure client:**
```python
client = AsyncOpenAIClient(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",
    max_concurrent_requests=8,
)
```
```

```{tab-item} OpenAI API

Use the official OpenAI API:

```python
client = AsyncOpenAIClient(
    base_url="https://api.openai.com/v1",
    api_key="sk-...",  # Or set OPENAI_API_KEY env var
    max_concurrent_requests=5,
)
```
```

````

## Complete Example

```python
import os
from nemo_curator.models.client.openai_client import AsyncOpenAIClient
from nemo_curator.models.client.llm_client import GenerationConfig
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.synthetic.qa_multilingual_synthetic import QAMultilingualSyntheticStage

# Configure client
client = AsyncOpenAIClient(
    api_key=os.environ.get("NVIDIA_API_KEY"),
    base_url="https://integrate.api.nvidia.com/v1",
    max_concurrent_requests=5,
    max_retries=3,
    base_delay=1.0,
)

# Configure generation
config = GenerationConfig(
    temperature=0.9,
    top_p=0.95,
    max_tokens=2048,
)

# Use in a pipeline stage
pipeline = Pipeline(name="sdg_example")
pipeline.add_stage(
    QAMultilingualSyntheticStage(
        prompt="Generate a Q&A pair about science in {language}.",
        languages=["English", "French", "German"],
        client=client,
        model_name="meta/llama-3.3-70b-instruct",
        num_samples=100,
        generation_config=config,
    )
)
```

## Troubleshooting

### Rate Limit Errors

If you encounter frequent 429 errors:

1. Reduce `max_concurrent_requests`
2. Increase `base_delay` for longer backoff
3. Consider using a local deployment for high-volume workloads

### Connection Timeouts

For large models or slow networks:

```python
client = AsyncOpenAIClient(
    base_url="...",
    timeout=300,  # Increase from default 120 seconds
)
```

### Local Server Issues

If you experience connection errors with a local server:

- Check server resource utilization (GPU memory, CPU)
- Reduce concurrent requests
- Verify the server is running and accessible

---

## Next Steps

- {ref}`multilingual-qa-tutorial`: Generate multilingual Q&A pairs
- {ref}`nemotron-cc-overview`: Advanced text transformation pipelines
+
diff --git a/docs/curate-text/synthetic/multilingual-qa.md b/docs/curate-text/synthetic/multilingual-qa.md
new file mode 100644
index 0000000000..8417d63b34
--- /dev/null
+++ b/docs/curate-text/synthetic/multilingual-qa.md
@@ -0,0 +1,299 @@
---
description: "Generate multilingual Q&A pairs using LLMs with NeMo Curator's synthetic data pipeline"
categories: ["tutorials"]
tags: ["multilingual", "qa-generation", "synthetic-data", "quickstart"]
personas: ["data-scientist-focused", "mle-focused"]
difficulty: "beginner"
content_type: "tutorial"
modality: "text-only"
---

(multilingual-qa-tutorial)=
# Generate Multilingual Q&A Data

This tutorial shows how to generate synthetic Q&A pairs across multiple languages using NeMo Curator's `QAMultilingualSyntheticStage`. You'll learn to configure an LLM client, create a generation pipeline, and optionally filter the output.

**Time to complete**: ~15 minutes

## What You'll Build

A pipeline that:

1. Generates Q&A pairs in multiple languages using an LLM
2. Optionally filters results by language
3. Writes output to JSONL format

## Prerequisites

- **NVIDIA API Key**: Obtain from [NVIDIA Build](https://build.nvidia.com/settings/api-keys)
- **NeMo Curator**: Installed with text extras

```bash
export NVIDIA_API_KEY="nvapi-..."
+``` + +## Quick Start + +```python +import os +from nemo_curator.core.client import RayClient +from nemo_curator.models.client.openai_client import AsyncOpenAIClient +from nemo_curator.models.client.llm_client import GenerationConfig +from nemo_curator.pipeline import Pipeline +from nemo_curator.stages.synthetic.qa_multilingual_synthetic import QAMultilingualSyntheticStage +from nemo_curator.stages.text.io.writer.jsonl import JsonlWriter + +# Initialize Ray +client = RayClient(include_dashboard=False) +client.start() + +# Create LLM client +llm_client = AsyncOpenAIClient( + api_key=os.environ["NVIDIA_API_KEY"], + base_url="https://integrate.api.nvidia.com/v1", + max_concurrent_requests=5, +) + +# Create pipeline +pipeline = Pipeline(name="multilingual_qa") + +# Add synthetic generation stage +pipeline.add_stage( + QAMultilingualSyntheticStage( + prompt="Generate a Q&A pair about science in {language}.", + languages=["English", "French", "German", "Spanish"], + client=llm_client, + model_name="meta/llama-3.3-70b-instruct", + num_samples=50, + generation_config=GenerationConfig(temperature=0.9), + ) +) + +# Write output +pipeline.add_stage(JsonlWriter(path="./synthetic_qa/")) + +# Run pipeline +results = pipeline.run() + +client.stop() +``` + +## Step-by-Step Guide + +### Step 1: Configure the LLM Client + +The `AsyncOpenAIClient` enables concurrent API requests for efficient batch generation: + +```python +from nemo_curator.models.client.openai_client import AsyncOpenAIClient +from nemo_curator.models.client.llm_client import GenerationConfig + +llm_client = AsyncOpenAIClient( + api_key=os.environ["NVIDIA_API_KEY"], + base_url="https://integrate.api.nvidia.com/v1", + max_concurrent_requests=5, # Adjust based on rate limits + max_retries=3, # Retry on transient failures + base_delay=1.0, # Backoff delay in seconds +) + +# Configure generation parameters +generation_config = GenerationConfig( + temperature=0.9, # Higher for more diverse outputs + top_p=0.95, + max_tokens=2048, + seed=None, # None for non-deterministic generation +) +``` + +### Step 2: Define the Prompt Template + +The prompt template must include a `{language}` placeholder. The stage randomly selects a language for each sample: + +```python +# Simple Q&A prompt +prompt = "Generate a Q&A pair about science in {language}." + +# Structured prompt with language prefixes +prompt = """ +Generate a short question and a short answer in the general science domain in {language}. +Begin with the language name using the 2-letter code in square brackets, +e.g. [EN] for English, [FR] for French, [DE] for German. 
+""" +``` + +### Step 3: Create the Pipeline + +```python +from nemo_curator.pipeline import Pipeline +from nemo_curator.stages.synthetic.qa_multilingual_synthetic import QAMultilingualSyntheticStage + +pipeline = Pipeline( + name="multilingual_qa_generation", + description="Generate synthetic Q&A pairs in multiple languages", +) + +pipeline.add_stage( + QAMultilingualSyntheticStage( + prompt=prompt, + languages=["English", "French", "German", "Spanish", "Italian"], + client=llm_client, + model_name="meta/llama-3.3-70b-instruct", + num_samples=100, + generation_config=generation_config, + ) +) +``` + +### Step 4: Add Language Filtering (Optional) + +If your prompt includes language prefixes, you can filter to keep only specific languages: + +```python +from nemo_curator.stages.text.filters.doc_filter import DocumentFilter +from nemo_curator.stages.text.modules.score_filter import ScoreFilter + + +class BeginsWithLanguageFilter(DocumentFilter): + """Filter documents based on language prefix codes.""" + + def __init__(self, languages: list[str]): + self.name = "begins_with_language_filter" + self.languages = languages + + def score_document(self, text: str) -> float: + if not self.languages: + return 1.0 + return 1.0 if text.startswith(tuple(self.languages)) else 0.0 + + def keep_document(self, score: float) -> bool: + return score == 1.0 + + +# Add filter to keep only English outputs +pipeline.add_stage( + ScoreFilter( + BeginsWithLanguageFilter(languages=["[EN]"]), + text_field="text", + ), +) +``` + +### Step 5: Configure Output + +Write results to JSONL or Parquet format: + +```python +from nemo_curator.stages.text.io.writer.jsonl import JsonlWriter +from nemo_curator.stages.text.io.writer.parquet import ParquetWriter + +# JSONL output +pipeline.add_stage(JsonlWriter(path="./output/synthetic_qa/")) + +# Or Parquet output +# pipeline.add_stage(ParquetWriter(path="./output/synthetic_qa/")) +``` + +### Step 6: Run the Pipeline + +```python +from nemo_curator.core.client import RayClient + +# Initialize Ray +client = RayClient(include_dashboard=False) +client.start() + +# Execute pipeline +print(pipeline.describe()) +results = pipeline.run() + +# Print results summary +if results: + for result in results: + if hasattr(result, "data") and result.data: + for file_path in result.data: + print(f"Generated: {file_path}") + +client.stop() +``` + +## CLI Usage + +The tutorial script supports command-line arguments: + +```bash +cd tutorials/synthetic + +# Basic usage +python synthetic_data_generation_example.py --num-samples 50 + +# Custom languages and model +python synthetic_data_generation_example.py \ + --num-samples 100 \ + --languages English French German \ + --model-name meta/llama-3.3-70b-instruct \ + --temperature 0.9 + +# Skip language filtering +python synthetic_data_generation_example.py \ + --num-samples 50 \ + --no-filter-languages +``` + +### Available Arguments + +```{list-table} +:header-rows: 1 +:widths: 25 15 60 + +* - Argument + - Default + - Description +* - `--api-key` + - env var + - NVIDIA API key (or set NVIDIA_API_KEY) +* - `--base-url` + - NVIDIA API + - Base URL for the API endpoint +* - `--model-name` + - llama-3.3-70b + - Model to use for generation +* - `--languages` + - EN, FR, DE, ES, IT + - Languages to generate Q&A pairs for +* - `--num-samples` + - 100 + - Number of samples to generate +* - `--temperature` + - 0.9 + - Sampling temperature +* - `--output-path` + - ./synthetic_output + - Output directory +* - `--no-filter-languages` + - False + - Disable language 
filtering +``` + +## Sample Output + +Generated documents contain a `text` field with the LLM response: + +```json +{"text": "[EN] Question: What causes ocean tides? Answer: Ocean tides are primarily caused by the gravitational pull of the Moon and Sun on Earth's water bodies."} +{"text": "[FR] Question: Qu'est-ce que la photosynthèse? Answer: La photosynthèse est le processus par lequel les plantes convertissent la lumière du soleil en énergie."} +{"text": "[DE] Question: Was ist der größte Planet in unserem Sonnensystem? Answer: Jupiter ist der größte Planet in unserem Sonnensystem."} +``` + +## Tips for Diverse Output + +1. **Use higher temperature** (0.7-1.0) for more varied outputs +2. **Avoid fixed seeds** for non-deterministic generation +3. **Include clear instructions** in the prompt for consistent formatting +4. **Filter post-generation** to ensure quality standards + +--- + +## Next Steps + +- {ref}`synthetic-llm-client`: Advanced client configuration and performance tuning +- {ref}`nemotron-cc-overview`: Advanced pipelines for text transformation and knowledge extraction + diff --git a/docs/curate-text/synthetic/nemotron-cc/index.md b/docs/curate-text/synthetic/nemotron-cc/index.md new file mode 100644 index 0000000000..2f2665f7f4 --- /dev/null +++ b/docs/curate-text/synthetic/nemotron-cc/index.md @@ -0,0 +1,280 @@ +--- +description: "Advanced synthetic data generation using NemotronCC pipelines for text transformation and knowledge extraction" +categories: ["workflows"] +tags: ["nemotron-cc", "paraphrasing", "knowledge-extraction", "distillation"] +personas: ["data-scientist-focused", "mle-focused"] +difficulty: "advanced" +content_type: "workflow" +modality: "text-only" +--- + +(nemotron-cc-overview)= +# NemotronCC Pipelines + +NemotronCC provides advanced synthetic data generation workflows for transforming and extracting knowledge from existing text documents. Unlike simple generation, these pipelines use sophisticated preprocessing, LLM-based transformation, and postprocessing to create high-quality training data. + +## The Composable Pipeline Pattern + +NemotronCC stages follow a composable pattern with three distinct phases: + +1. **Preprocessing**: Segment documents, filter by length, and prepare inputs for the LLM +2. **Generation**: Apply task-specific prompts to transform text using the LLM +3. **Postprocessing**: Clean outputs, remove formatting artifacts, and filter low-quality results + +This separation enables fine-grained control over each phase while providing reusable helper functions for common patterns. + +## Pipeline Architecture + +```{mermaid} +flowchart TB + subgraph "Preprocessing" + A[Input Documents] --> B[Token Count Filter] + B --> C[Document Splitter] + C --> D[Segment Filter] + D --> E[Document Joiner] + end + + subgraph "LLM Generation" + E --> F[Task-Specific Stage
WikiPara/DiverseQA/Distill/etc.] + end + + subgraph "Postprocessing" + F --> G[Token Count Filter] + G --> H[Markdown Remover] + H --> I[Task-Specific Cleanup] + I --> J[Quality Filter] + end + + J --> K[Output Dataset] +``` + +## Available Tasks + +NemotronCC provides five specialized generation tasks, each designed for specific data transformation needs: + +```{list-table} NemotronCC Task Types +:header-rows: 1 +:widths: 20 25 30 25 + +* - Task + - Stage Class + - Purpose + - Use Case +* - Wikipedia Paraphrasing + - `WikipediaParaphrasingStage` + - Rewrite text as Wikipedia-style prose + - Improving noisy web data +* - Diverse QA + - `DiverseQAStage` + - Generate diverse Q&A pairs + - Reading comprehension training +* - Distill + - `DistillStage` + - Create condensed, informative paraphrases + - Knowledge distillation +* - Extract Knowledge + - `ExtractKnowledgeStage` + - Extract factual content as passages + - Knowledge base creation +* - Knowledge List + - `KnowledgeListStage` + - Extract structured fact lists + - Fact extraction +``` + +## Quality-Based Processing Strategy + +NemotronCC pipelines are designed to process data based on quality scores. The typical approach: + +### High-Quality Data Pipeline + +For documents with high quality scores, use tasks that leverage the existing quality: +- **DiverseQA**: Generate Q&A pairs from well-structured content +- **Distill**: Create condensed versions preserving key information +- **ExtractKnowledge**: Extract factual passages +- **KnowledgeList**: Extract structured facts + +```python +from nemo_curator.stages.text.modules.score_filter import Filter + +# Filter for high-quality documents (score > 11) +pipeline.add_stage( + Filter( + filter_fn=lambda x: int(x) > 11, + filter_field="quality_score", + ), +) +``` + +### Low-Quality Data Pipeline + +For documents with lower quality scores, use Wikipedia Paraphrasing to improve text quality: + +```python +# Filter for low-quality documents (score <= 11) +pipeline.add_stage( + Filter( + filter_fn=lambda x: int(x) <= 11, + filter_field="quality_score", + ), +) +``` + +## Using Helper Functions + +The recommended approach is to use the helper functions in `nemotron_cc_pipelines.py`: + +```python +from nemotron_cc_pipelines import ( + add_preprocessing_pipeline, + add_diverse_qa_postprocessing_pipeline, +) +from nemo_curator.pipeline import Pipeline +from nemo_curator.stages.synthetic.nemotron_cc.nemotron_cc import DiverseQAStage + +pipeline = Pipeline(name="diverse_qa_pipeline") + +# Add preprocessing +pipeline = add_preprocessing_pipeline( + pipeline=pipeline, + text_field="text", + system_prompt=SYSTEM_PROMPT, + user_prompt_template=PROMPT_TEMPLATE, + min_document_tokens=30, + min_segment_tokens=30, + max_input_tokens=1000, + args=args, # Contains tokenizer config +) + +# Add generation stage +pipeline.add_stage( + DiverseQAStage( + client=llm_client, + model_name="meta/llama-3.3-70b-instruct", + generation_config=generation_config, + input_field="text", + output_field="diverse_qa", + ) +) + +# Add postprocessing +pipeline = add_diverse_qa_postprocessing_pipeline( + pipeline=pipeline, + llm_response_field="diverse_qa", + args=args, +) +``` + +## Task Configuration + +Each task has specific token count and preprocessing requirements: + +```{list-table} Task Configuration Defaults +:header-rows: 1 +:widths: 25 15 15 20 25 + +* - Task + - Min Doc Tokens + - Min Segment Tokens + - Max Input Tokens + - Max Output Tokens +* - Diverse QA + - 30 + - 30 + - 1000 + - 600 +* - Distill + - 30 + - 10 + - 
2000 + - 1600 +* - Extract Knowledge + - 30 + - 30 + - 1400 + - 1400 +* - Knowledge List + - 30 + - 30 + - 1000 + - 600 +* - Wikipedia Paraphrasing + - 5 + - 5 + - 512 + - 512 +``` + +## Quick Example + +```python +import os +from transformers import AutoTokenizer +from nemo_curator.core.client import RayClient +from nemo_curator.backends.xenna import XennaExecutor +from nemo_curator.models.client.openai_client import AsyncOpenAIClient +from nemo_curator.models.client.llm_client import GenerationConfig +from nemo_curator.pipeline import Pipeline +from nemo_curator.stages.synthetic.nemotron_cc.nemotron_cc import DiverseQAStage +from nemo_curator.stages.text.io.reader.parquet import ParquetReader +from nemo_curator.stages.text.io.writer.parquet import ParquetWriter + +# Initialize +client = RayClient(include_dashboard=False) +client.start() +tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct") + +# Create LLM client +llm_client = AsyncOpenAIClient( + api_key=os.environ["NVIDIA_API_KEY"], + base_url="https://integrate.api.nvidia.com/v1", + max_concurrent_requests=5, +) + +# Build pipeline +pipeline = Pipeline(name="nemotron_cc_diverse_qa") +pipeline.add_stage(ParquetReader(file_paths=["./input_data/*.parquet"])) +# ... add preprocessing stages ... +pipeline.add_stage( + DiverseQAStage( + client=llm_client, + model_name="meta/llama-3.3-70b-instruct", + generation_config=GenerationConfig(temperature=0.5, top_p=0.9), + input_field="text", + output_field="diverse_qa", + ) +) +# ... add postprocessing stages ... +pipeline.add_stage(ParquetWriter(path="./output/")) + +# Execute +executor = XennaExecutor() +results = pipeline.run(executor) + +client.stop() +``` + +--- + +## Detailed Reference + +::::{grid} 1 +:gutter: 2 + +:::{grid-item-card} {octicon}`book;1.5em;sd-mr-1` Task Reference +:link: tasks +:link-type: doc +Detailed reference for each NemotronCC stage, prompts, and post-processing ++++ +{bdg-secondary}`reference` +{bdg-secondary}`api` +::: + +:::: + +```{toctree} +:hidden: + +tasks +``` + diff --git a/docs/curate-text/synthetic/nemotron-cc/tasks.md b/docs/curate-text/synthetic/nemotron-cc/tasks.md new file mode 100644 index 0000000000..8c00fa843b --- /dev/null +++ b/docs/curate-text/synthetic/nemotron-cc/tasks.md @@ -0,0 +1,393 @@ +--- +description: "Reference documentation for NemotronCC synthetic data generation tasks and stages" +categories: ["reference"] +tags: ["nemotron-cc", "stages", "api-reference"] +personas: ["data-scientist-focused", "mle-focused"] +difficulty: "advanced" +content_type: "reference" +modality: "text-only" +--- + +(nemotron-cc-tasks)= +# NemotronCC Task Reference + +This reference documents each NemotronCC synthetic data generation stage, including prompt templates, configuration options, and post-processing details. + +## WikipediaParaphrasingStage + +Rewrites low-quality text in Wikipedia-style prose, improving readability and structure. + +### Purpose + +Transform noisy or poorly-written web data into high-quality, encyclopedic text suitable for training language models. 
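
As a rough illustration of the intended transformation, consider this invented before/after pair (hypothetical text, not captured model output):

```text
Input:  "ok so this phone batery lasts like 2 days tops and charges fast, totally recomend"
Output: "The phone's battery lasts up to approximately two days on a single
charge, supports fast charging, and is generally recommended by reviewers."
```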
+ +### Configuration + +```python +from nemo_curator.stages.synthetic.nemotron_cc.nemotron_cc import WikipediaParaphrasingStage + +stage = WikipediaParaphrasingStage( + client=llm_client, + model_name="meta/llama-3.3-70b-instruct", + generation_config=generation_config, + input_field="text", + output_field="rephrased", +) +``` + +### Prompt Template + +The stage uses a system prompt establishing the assistant persona and a user prompt requesting paraphrasing: + +```text +System: A chat between a curious user and an artificial intelligence assistant. +The assistant gives helpful, detailed, and polite answers to the questions. + +User: For the following paragraph give me a diverse paraphrase of the same in +high quality English language as in sentences on Wikipedia. Begin your answer +on a separate line with "Here is a paraphrased version:". + +Text: {document} +``` + +See the [full prompt in source](https://github.com/NVIDIA-NeMo/Curator/blob/main/nemo_curator/stages/synthetic/nemotron_cc/prompts.py). + +### Post-Processing + +The Wikipedia post-processing pipeline: +1. Filters by token count (max 510 tokens) +2. Removes markdown formatting +3. Validates prefix "Here is a paraphrased version:" +4. Removes the prefix from output +5. Removes quotation marks +6. Joins document segments +7. Filters documents below 50 tokens + +--- + +## DiverseQAStage + +Generates diverse question-answer pairs from document content. + +### Purpose + +Create reading comprehension training data with varied question types and cognitive complexity levels. + +### Configuration + +```python +from nemo_curator.stages.synthetic.nemotron_cc.nemotron_cc import DiverseQAStage + +stage = DiverseQAStage( + client=llm_client, + model_name="meta/llama-3.3-70b-instruct", + generation_config=generation_config, + input_field="text", + output_field="diverse_qa", +) +``` + +### Prompt Template + +The stage requests up to 8 diverse Q&A pairs with specific formatting: + +```text +Task: Read the text, ask questions and answer them. + +Follow these instructions: +1. Ask diverse questions that require different cognitive skills +2. Ask questions in various forms: + - Yes/No questions + - Open-ended questions (what, how, when, where, why, who) + - Multi-choice questions with options + - Comparison questions + - Reading comprehension questions + - Problem-solving questions +3. Focus on factual information and key concepts +4. Use clear and concise language +5. Use plain text (no Markdown) +6. Format: Question: [question] Answer: [answer] + +Text: {document} +``` + +### Post-Processing with DiverseQAPostProcessingStage + +The `DiverseQAPostProcessingStage` performs specialized parsing: + +```python +from nemo_curator.stages.synthetic.nemotron_cc.nemotron_cc import DiverseQAPostProcessingStage + +post_stage = DiverseQAPostProcessingStage( + input_field="text", + qa_field="diverse_qa", + tokenizer=tokenizer, # For length-based sampling + prefix="Here are the questions and answers based on the provided text:", + max_num_pairs=10, +) +``` + +**Post-processing logic:** +1. Parse Q&A pairs from bullet-formatted output +2. Merge question and answer lines +3. Shuffle pairs randomly +4. Sample pairs based on input document length (using tokenizer) +5. 
Concatenate original document with selected Q&A pairs + +The number of Q&A pairs sampled is proportional to input length: +```python +num_pairs = random.randint(1, max(1, int(max_num_pairs * num_tokens / 150))) +``` + +--- + +## DistillStage + +Creates condensed, information-dense paraphrases while preserving key concepts. + +### Purpose + +Generate training data that captures essential knowledge in a more accessible format, suitable for knowledge distillation. + +### Configuration + +```python +from nemo_curator.stages.synthetic.nemotron_cc.nemotron_cc import DistillStage + +stage = DistillStage( + client=llm_client, + model_name="meta/llama-3.3-70b-instruct", + generation_config=generation_config, + input_field="text", + output_field="distill", +) +``` + +### Prompt Template + +```text +System: You are an artificial intelligence assistant. You carefully provide +accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning. + +User: Your task is to read and paraphrase the provided text following these instructions: +- Create a condensed but accurate and informative version +- Preserve crucial information, key concepts, important values, factual details +- Retain technical terms and specialized vocabulary +- Retain examples and explanations of reasoning +- Only include information present in the original text +- Write in plain text without formatting + +Text: {document} + +Task: Paraphrase in high-quality English. Begin with "Paraphrased Text:". +``` + +### Post-Processing + +1. Filter by token count (max 1598 tokens) +2. Remove markdown formatting +3. Validate "Paraphrased Text:" prefix +4. Remove the prefix +5. Remove quotation marks +6. Filter documents below 50 tokens + +--- + +## ExtractKnowledgeStage + +Extracts and rewrites knowledge as textbook-style passages. + +### Purpose + +Convert raw text into educational-quality passages organized by domain, suitable for building knowledge bases. + +### Configuration + +```python +from nemo_curator.stages.synthetic.nemotron_cc.nemotron_cc import ExtractKnowledgeStage + +stage = ExtractKnowledgeStage( + client=llm_client, + model_name="meta/llama-3.3-70b-instruct", + generation_config=generation_config, + input_field="text", + output_field="extract_knowledge", +) +``` + +### Prompt Template + +```text +Your task is to rewrite knowledge from the provided text following these instructions: +- Rewrite as passages using easy-to-understand, high-quality English + like sentences in textbooks and Wikipedia +- Focus on content in disciplines: humanities, social sciences, natural sciences, + technology, engineering, math, law, business, management, art, education, + agricultural sciences, politics, and history +- Disregard content without useful facts or knowledge +- Retain examples and supporting evidence +- Do not add or alter details +- Write in plain text +- Do not add titles or comments + +Text: {document} + +Task: Rewrite facts and knowledge as passages following the instructions. +``` + +### Post-Processing + +1. Filter by token count (max 1398 tokens) +2. Remove markdown formatting +3. Remove passage labels ("Passage:", "Passage 1:", etc.) +4. Filter documents below 50 tokens + +--- + +## KnowledgeListStage + +Extracts structured fact lists from documents. + +### Purpose + +Generate bullet-pointed factual content for structured knowledge extraction. 
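
For intuition, an invented sample of the list format this task targets (hypothetical output, not from the model):

```text
- The Amazon rainforest spans roughly 5.5 million square kilometers.
- About 60% of the rainforest lies within Brazil.
- It hosts an estimated 390 billion individual trees.
```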
+ +### Configuration + +```python +from nemo_curator.stages.synthetic.nemotron_cc.nemotron_cc import KnowledgeListStage + +stage = KnowledgeListStage( + client=llm_client, + model_name="meta/llama-3.3-70b-instruct", + generation_config=generation_config, + input_field="text", + output_field="knowledge_list", +) +``` + +### Prompt Template + +```text +Review the text and extract the key information. Follow these instructions: +- Provide a concise and organized list of factual information +- Include concrete details, key concepts, and important statistics +- Ensure each point is clear, specific, and supported by the original text +- Ensure extracted text is information-dense +- Do not add titles or headings + +Text: {document} + +Task: Extract factual information, concrete details, and key concepts. +``` + +### Post-Processing with KnowledgeListPostProcessingStage + +```python +from nemo_curator.stages.synthetic.nemotron_cc.nemotron_cc import KnowledgeListPostProcessingStage + +post_stage = KnowledgeListPostProcessingStage( + input_field="knowledge_list", +) +``` + +**Post-processing logic:** +1. Remove leading bullet markers ("- ") +2. Normalize indentation +3. Join lines with newlines + +--- + +## Customizing Prompts + +To use custom prompts while maintaining NemotronCC infrastructure, subclass `BaseSyntheticStage`: + +```python +from dataclasses import dataclass +from nemo_curator.stages.synthetic.nemotron_cc.base import BaseSyntheticStage + + +@dataclass +class CustomSyntheticStage(BaseSyntheticStage): + system_prompt: str = "You are a helpful assistant specialized in..." + prompt: str = """Your custom prompt template here. + +Text: {document} + +Instructions: ...""" + input_field: str = "text" + output_field: str = "custom_output" + + @property + def name(self) -> str: + return "CustomSyntheticStage" +``` + +The `{document}` placeholder is replaced with the content from `input_field`. 
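
A stage defined this way plugs into a pipeline like the built-in tasks. The following is a minimal usage sketch, assuming `llm_client`, `generation_config`, and `pipeline` are configured as in the earlier NemotronCC quick example:

```python
# Minimal sketch: a custom stage drops in like any built-in NemotronCC stage.
# `llm_client`, `generation_config`, and `pipeline` are assumed to be set up
# as shown in the earlier NemotronCC quick example.
pipeline.add_stage(
    CustomSyntheticStage(
        client=llm_client,
        model_name="meta/llama-3.3-70b-instruct",
        generation_config=generation_config,
        input_field="text",
        output_field="custom_output",
    )
)
```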
+ +--- + +## Complete Configuration Example + +```python +TASK_CONFIG = { + "diverse_qa": { + "system_prompt": NEMOTRON_CC_SYSTEM_PROMPT, + "prompt_template": DIVERSE_QA_PROMPT_TEMPLATE, + "min_document_tokens": 30, + "min_segment_tokens": 30, + "max_input_tokens": 1000, + "max_output_tokens": 600, + }, + "distill": { + "system_prompt": NEMOTRON_CC_DISTILL_SYSTEM_PROMPT, + "prompt_template": DISTILL_PROMPT_TEMPLATE, + "min_document_tokens": 30, + "min_segment_tokens": 10, + "max_input_tokens": 2000, + "max_output_tokens": 1600, + }, + "extract_knowledge": { + "system_prompt": NEMOTRON_CC_SYSTEM_PROMPT, + "prompt_template": EXTRACT_KNOWLEDGE_PROMPT_TEMPLATE, + "min_document_tokens": 30, + "min_segment_tokens": 30, + "max_input_tokens": 1400, + "max_output_tokens": 1400, + }, + "knowledge_list": { + "system_prompt": NEMOTRON_CC_SYSTEM_PROMPT, + "prompt_template": KNOWLEDGE_LIST_PROMPT_TEMPLATE, + "min_document_tokens": 30, + "min_segment_tokens": 30, + "max_input_tokens": 1000, + "max_output_tokens": 600, + }, + "wikipedia_paraphrasing": { + "system_prompt": NEMOTRON_CC_SYSTEM_PROMPT, + "prompt_template": WIKIPEDIA_REPHRASING_PROMPT_TEMPLATE, + "min_document_tokens": 5, + "min_segment_tokens": 5, + "max_input_tokens": 512, + "max_output_tokens": 512, + }, +} + +GENERATION_CONFIG = { + "MAX_INPUT_TOKENS": 2000, + "MAX_OUTPUT_TOKENS": 1600, + "TOP_K": 0, + "TOP_P": 0.9, + "TEMPERATURE": 0.5, +} +``` + +--- + +## Source Code References + +- **Prompts**: [`nemo_curator/stages/synthetic/nemotron_cc/prompts.py`](https://github.com/NVIDIA-NeMo/Curator/blob/main/nemo_curator/stages/synthetic/nemotron_cc/prompts.py) +- **Stages**: [`nemo_curator/stages/synthetic/nemotron_cc/nemotron_cc.py`](https://github.com/NVIDIA-NeMo/Curator/blob/main/nemo_curator/stages/synthetic/nemotron_cc/nemotron_cc.py) +- **Base Class**: [`nemo_curator/stages/synthetic/nemotron_cc/base.py`](https://github.com/NVIDIA-NeMo/Curator/blob/main/nemo_curator/stages/synthetic/nemotron_cc/base.py) +- **Pipeline Helpers**: [`tutorials/synthetic/nemotron_cc/nemotron_cc_pipelines.py`](https://github.com/NVIDIA-NeMo/Curator/blob/main/tutorials/synthetic/nemotron_cc/nemotron_cc_pipelines.py) + diff --git a/tutorials/synthetic/README.md b/tutorials/synthetic/README.md index 7765d169c3..c96a9b895e 100644 --- a/tutorials/synthetic/README.md +++ b/tutorials/synthetic/README.md @@ -1,36 +1,130 @@ # Synthetic Data Generation Tutorials -Hands-on tutorials for generating synthetic data with NeMo Curator. Complete working examples with detailed explanations. +Hands-on tutorials for generating synthetic data with NeMo Curator using Ray-based distributed processing. +## Documentation + +For comprehensive documentation, refer to the [Synthetic Data Generation Guide](../../docs/curate-text/synthetic/index.md). ## Getting Started ### Prerequisites -To run these tutorials, you'll need an NVIDIA API key. You can obtain one from: -- **NVIDIA Build**: https://build.nvidia.com/settings/api-keys +- **NVIDIA API Key**: Obtain from [NVIDIA Build](https://build.nvidia.com/settings/api-keys) +- **NeMo Curator**: Installed with text extras (`pip install nemo-curator[text_cuda12]`) ### Setup -Set your API key as an environment variable: - ```bash export NVIDIA_API_KEY="your-api-key-here" ``` -Alternatively, you can pass it directly using the `--api-key` argument when running the examples. 
+## Available Tutorials + +| Tutorial | Description | Difficulty | +|----------|-------------|------------| +| [Multilingual Q&A](synthetic_data_generation_example.py) | Generate Q&A pairs in multiple languages | Beginner | +| [NemotronCC High-Quality](nemotron_cc/nemotron_cc_sdg_high_quality_example_pipeline.py) | Advanced SDG for high-quality data (DiverseQA, Distill, ExtractKnowledge, KnowledgeList) | Advanced | +| [NemotronCC Low-Quality](nemotron_cc/nemotron_cc_sdg_low_quality_example_pipeline.py) | Improve low-quality data via Wikipedia-style paraphrasing | Advanced | -### Quick Example +## Quick Examples + +### Basic Multilingual Q&A ```bash # Generate 20 synthetic Q&A pairs in multiple languages python synthetic_data_generation_example.py --num-samples 20 + +# Customize languages and disable filtering +python synthetic_data_generation_example.py \ + --num-samples 50 \ + --languages English French German Spanish \ + --no-filter-languages ``` +### NemotronCC Pipelines -## Available Tutorials +```bash +# Run DiverseQA pipeline with mock data (requires tokenizer access) +python nemotron_cc/nemotron_cc_sdg_high_quality_example_pipeline.py \ + --task diverse_qa \ + --tokenizer meta-llama/Llama-3.3-70B-Instruct \ + --mock + +# Run Distill pipeline +python nemotron_cc/nemotron_cc_sdg_high_quality_example_pipeline.py \ + --task distill \ + --tokenizer meta-llama/Llama-3.3-70B-Instruct \ + --mock + +# Run Wikipedia Paraphrasing for low-quality data +python nemotron_cc/nemotron_cc_sdg_low_quality_example_pipeline.py \ + --tokenizer meta-llama/Llama-3.3-70B-Instruct \ + --mock +``` + +### Using Real Data + +```bash +# Process Parquet input files +python nemotron_cc/nemotron_cc_sdg_high_quality_example_pipeline.py \ + --task diverse_qa \ + --tokenizer meta-llama/Llama-3.3-70B-Instruct \ + --input-parquet-path ./my_data/*.parquet \ + --output-path ./synthetic_output \ + --output-format parquet +``` + +## Command-Line Arguments + +### Common Arguments + +| Argument | Default | Description | +|----------|---------|-------------| +| `--api-key` | env var | NVIDIA API key | +| `--base-url` | NVIDIA API | Base URL for API endpoint | +| `--model-name` | llama-3.3-70b | Model to use for generation | +| `--output-path` | ./synthetic_output | Output directory | +| `--max-concurrent-requests` | 3 | Concurrent API requests | +| `--temperature` | 0.9 (QA) / 0.5 (NemotronCC) | Sampling temperature | + +### NemotronCC-Specific Arguments + +| Argument | Default | Description | +|----------|---------|-------------| +| `--task` | diverse_qa | Task type (diverse_qa, distill, extract_knowledge, knowledge_list) | +| `--tokenizer` | required | HuggingFace tokenizer name | +| `--mock` | False | Use built-in test data | +| `--input-parquet-path` | None | Input Parquet file path/glob | +| `--output-format` | parquet | Output format (jsonl, parquet) | + +## Example Output + +### Multilingual Q&A + +```json +{"text": "[EN] Question: What causes ocean tides? Answer: Ocean tides are primarily caused by the gravitational pull of the Moon and Sun on Earth's water bodies."} +{"text": "[FR] Question: Qu'est-ce que la photosynthèse? Answer: La photosynthèse est le processus par lequel les plantes convertissent la lumière du soleil en énergie."} +``` + +### DiverseQA + +The output contains the original text followed by generated Q&A pairs: + +```text +The Amazon rainforest contains an unparalleled diversity of plant and animal species... + +Question: What makes the Amazon rainforest unique in terms of biodiversity? 
+Answer: The Amazon rainforest contains an unparalleled diversity of plant and animal species. + +Question: True or False: The Amazon rainforest has limited species diversity. +Answer: False. The Amazon rainforest contains an unparalleled diversity of species. +``` + +--- -| Tutorial | Description | Files | -|----------|-------------|-------| -| **[Multilingual Q&A Generation](synthetic_data_generation_example.py)** | Generate synthetic Q&A pairs in multiple languages using LLMs | `synthetic_data_generation_example.py` | +## Additional Resources +- [LLM Client Configuration](../../docs/curate-text/synthetic/llm-client.md) +- [NemotronCC Pipeline Documentation](../../docs/curate-text/synthetic/nemotron-cc/index.md) +- [Task Reference](../../docs/curate-text/synthetic/nemotron-cc/tasks.md) From bccea95616d39df573d1dcb8471e660fedc56b58 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 2 Jan 2026 10:48:58 -0500 Subject: [PATCH 2/5] header, tab fixes Signed-off-by: Lawrence Lane --- docs/curate-text/synthetic/index.md | 2 +- docs/curate-text/synthetic/llm-client.md | 16 ++++++++-------- docs/curate-text/synthetic/nemotron-cc/index.md | 4 ++++ docs/index.md | 1 + 4 files changed, 14 insertions(+), 9 deletions(-) diff --git a/docs/curate-text/synthetic/index.md b/docs/curate-text/synthetic/index.md index 7112b0288d..cf1d81e086 100644 --- a/docs/curate-text/synthetic/index.md +++ b/docs/curate-text/synthetic/index.md @@ -112,7 +112,7 @@ Before using synthetic data generation, ensure you have: --- -## Getting Started +## Topics ::::{grid} 1 1 2 2 :gutter: 2 diff --git a/docs/curate-text/synthetic/llm-client.md b/docs/curate-text/synthetic/llm-client.md index 1b517e8230..7a30e7307b 100644 --- a/docs/curate-text/synthetic/llm-client.md +++ b/docs/curate-text/synthetic/llm-client.md @@ -166,9 +166,9 @@ The retry logic handles: ## Using Custom Endpoints -````{tab-set} +::::{tab-set} -```{tab-item} Local vLLM Server +:::{tab-item} Local vLLM Server Deploy a local vLLM server and configure the client: @@ -189,9 +189,9 @@ client = AsyncOpenAIClient( timeout=300, # Longer timeout for large models ) ``` -``` +::: -```{tab-item} Text Generation Inference (TGI) +:::{tab-item} Text Generation Inference (TGI) Deploy a TGI server and configure the client: @@ -210,9 +210,9 @@ client = AsyncOpenAIClient( max_concurrent_requests=8, ) ``` -``` +::: -```{tab-item} OpenAI API +:::{tab-item} OpenAI API Use the official OpenAI API: @@ -223,9 +223,9 @@ client = AsyncOpenAIClient( max_concurrent_requests=5, ) ``` -``` +::: -```` +:::: ## Complete Example diff --git a/docs/curate-text/synthetic/nemotron-cc/index.md b/docs/curate-text/synthetic/nemotron-cc/index.md index 2f2665f7f4..ca73912d07 100644 --- a/docs/curate-text/synthetic/nemotron-cc/index.md +++ b/docs/curate-text/synthetic/nemotron-cc/index.md @@ -124,6 +124,10 @@ pipeline.add_stage( The recommended approach is to use the helper functions in `nemotron_cc_pipelines.py`: +:::{note} +The `nemotron_cc_pipelines` helper functions are provided in the [tutorials directory](https://github.com/NVIDIA-NeMo/Curator/blob/main/tutorials/synthetic/nemotron_cc/nemotron_cc_pipelines.py), not as part of the installed package. Copy this file to your project or reference the patterns when building custom pipelines. 
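
For example, you can vendor the module next to your own script (the URL below follows GitHub's raw-content pattern for the file linked above):

```bash
# Copy the tutorial helper module into the current project directory.
# The raw.githubusercontent.com URL mirrors the repository path linked above.
curl -O https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/tutorials/synthetic/nemotron_cc/nemotron_cc_pipelines.py
```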
+::: + ```python from nemotron_cc_pipelines import ( add_preprocessing_pipeline, diff --git a/docs/index.md b/docs/index.md index 25227c1ce3..fc4390ebf3 100644 --- a/docs/index.md +++ b/docs/index.md @@ -223,6 +223,7 @@ curate-text/index.md Tutorials Load Data Process Data +Synthetic Data :::: ::::{toctree} From abd22090bb8298d139f171f5345fbb49bc9655b9 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 2 Jan 2026 11:00:27 -0500 Subject: [PATCH 3/5] style guide Signed-off-by: Lawrence Lane --- docs/curate-text/synthetic/multilingual-qa.md | 2 +- docs/curate-text/synthetic/nemotron-cc/tasks.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/curate-text/synthetic/multilingual-qa.md b/docs/curate-text/synthetic/multilingual-qa.md index 8417d63b34..e1b72779f2 100644 --- a/docs/curate-text/synthetic/multilingual-qa.md +++ b/docs/curate-text/synthetic/multilingual-qa.md @@ -116,7 +116,7 @@ prompt = "Generate a Q&A pair about science in {language}." prompt = """ Generate a short question and a short answer in the general science domain in {language}. Begin with the language name using the 2-letter code in square brackets, -e.g. [EN] for English, [FR] for French, [DE] for German. +for example, [EN] for English, [FR] for French, [DE] for German. """ ``` diff --git a/docs/curate-text/synthetic/nemotron-cc/tasks.md b/docs/curate-text/synthetic/nemotron-cc/tasks.md index 8c00fa843b..cb4b0e5069 100644 --- a/docs/curate-text/synthetic/nemotron-cc/tasks.md +++ b/docs/curate-text/synthetic/nemotron-cc/tasks.md @@ -50,7 +50,7 @@ on a separate line with "Here is a paraphrased version:". Text: {document} ``` -See the [full prompt in source](https://github.com/NVIDIA-NeMo/Curator/blob/main/nemo_curator/stages/synthetic/nemotron_cc/prompts.py). +Refer to the [full prompt in source](https://github.com/NVIDIA-NeMo/Curator/blob/main/nemo_curator/stages/synthetic/nemotron_cc/prompts.py). ### Post-Processing @@ -89,7 +89,7 @@ stage = DiverseQAStage( ### Prompt Template -The stage requests up to 8 diverse Q&A pairs with specific formatting: +The stage requests up to eight diverse Q&A pairs with specific formatting: ```text Task: Read the text, ask questions and answer them. From 91f5f9ae8c942c1db648c079a01b35ad7ee9f250 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 2 Jan 2026 11:04:33 -0500 Subject: [PATCH 4/5] release notes change, bump version Signed-off-by: Lawrence Lane --- docs/about/release-notes/index.md | 203 +----------------------------- docs/conf.py | 4 +- docs/versions1.json | 5 + 3 files changed, 8 insertions(+), 204 deletions(-) diff --git a/docs/about/release-notes/index.md b/docs/about/release-notes/index.md index 7492c9e141..91487bc6f7 100644 --- a/docs/about/release-notes/index.md +++ b/docs/about/release-notes/index.md @@ -12,184 +12,6 @@ modality: "universal" # NeMo Curator Release Notes: {{ current_release }} -This major release represents a fundamental architecture shift from [Dask](https://www.dask.org/) to [Ray](https://www.ray.io/), expanding NeMo Curator to support multimodal data curation with new [video](../../curate-video/index.md) and [audio](../../curate-audio/index.md) capabilities. This refactor enables unified backend processing, better heterogeneous computing support, and enhanced autoscaling for dynamic workloads. - -**Migrating from a previous version of NeMo Curator?** Refer to the {ref}`Migration Guide ` for step-by-step instructions and the {ref}`Migration FAQ ` for common questions. 
- -## Installation Updates - -- **New Docker container**: Updated Docker infrastructure with CUDA 12.8.1 and Ubuntu 24.04 base; obtainable through the [NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) (`nvcr.io/nvidia/nemo-curator:{{ container_version }}`) -- **Docker file to build own image**: Simplified [Dockerfile](https://github.com/NVIDIA-NeMo/Curator/blob/main/docker/Dockerfile) structure for custom container builds with FFmpeg support -- **UV source installations**: Integrated UV package manager (v0.8.22) for faster dependency management -- **PyPI improvements**: Enhanced PyPI installation with modular extras for targeted functionality: - - ```{list-table} Available Installation Extras - :header-rows: 1 - :widths: 25 35 40 - - * - Extra - - Installation Command - - Description - * - **All Modalities** - - `nemo-curator[all]` - - Complete installation with all modalities and GPU support - * - **Text Curation** - - `nemo-curator[text_cuda12]` - - GPU-accelerated text processing with RAPIDS - * - **Image Curation** - - `nemo-curator[image_cuda12]` - - Image processing with NVIDIA DALI - * - **Audio Curation** - - `nemo-curator[audio_cuda12]` - - Speech recognition with NeMo ASR models - * - **Video Curation** - - `nemo-curator[video_cuda12]` - - Video processing with GPU acceleration - * - **Basic GPU** - - `nemo-curator[cuda12]` - - CUDA utilities without modality-specific dependencies - ``` - - All GPU installations require the NVIDIA PyPI index: - ```bash - uv pip install https://pypi.nvidia.com nemo-curator[EXTRA] - ``` - -## New Modalities - -### Video - -NeMo Curator now supports comprehensive [video data curation](../../curate-video/index.md) with distributed processing capabilities: - -- **Video splitting**: [Fixed-stride](../../curate-video/process-data/clipping.md) and [scene-change detection (TransNetV2)](../../curate-video/process-data/clipping.md) for clip extraction -- **Semantic deduplication**: [K-means clustering and pairwise similarity](../../curate-video/process-data/dedup.md) for near-duplicate clip removal -- **Content filtering**: [Motion-based filtering](../../curate-video/process-data/filtering.md) and [aesthetic filtering](../../curate-video/process-data/filtering.md) for quality improvement -- **Embedding generation**: InternVideo2 and Cosmos-Embed1 models for clip-level embeddings -- **Enhanced captioning**: [VL-based caption generation with optional LLM-based rewriting](../../curate-video/process-data/captions-preview.md) (Qwen-VL and Qwen-LM supported) for detailed video descriptions -- **Ray-based distributed architecture**: Scalable video processing with [autoscaling support](../concepts/video/architecture.md) - -### Audio - -New [audio curation capabilities](../../curate-audio/index.md) for speech data processing: - -- **ASR inference**: [Automatic speech recognition](../../curate-audio/process-data/asr-inference/index.md) using NeMo Framework pretrained models -- **Quality assessment**: [Word Error Rate (WER) and Character Error Rate (CER)](../../curate-audio/process-data/quality-assessment/index.md) calculation -- **Speech metrics**: [Duration analysis and speech rate metrics](../../curate-audio/process-data/audio-analysis/index.md) (words/characters per second) -- **Text integration**: Seamless integration with [text curation workflows](../../curate-audio/process-data/text-integration/index.md) via `AudioToDocumentStage` -- **Manifest support**: JSONL manifest format for audio file management - -## Modality Refactors - 
-### Text
-
-- **Ray backend migration**: Complete transition from Dask to Ray for distributed [text processing](../../curate-text/index.md)
-- **Improved model-based classifier throughput**: Better overlapping of compute between tokenization and inference through [length-based sequence sorting](../../curate-text/process-data/quality-assessment/distributed-classifier.md) for optimal GPU memory utilization
-- **Task-centric architecture**: New `Task`-based processing model for finer-grained control
-- **Pipeline redesign**: Updated `ProcessingStage` and `Pipeline` architecture with resource specification
-
-### Image
-
-- **Pipeline-based architecture**: Transitioned from legacy `ImageTextPairDataset` to modern [stage-based processing](../../curate-images/index.md) with `ImageReaderStage`, `ImageEmbeddingStage`, and filter stages
-- **DALI-based image loading**: New `ImageReaderStage` uses NVIDIA DALI for high-performance WebDataset tar shard processing with GPU/CPU fallback
-- **Modular processing stages**: Separate stages for [embedding generation](../../curate-images/process-data/embeddings/index.md), [aesthetic filtering](../../curate-images/process-data/filters/aesthetic.md), and [NSFW filtering](../../curate-images/process-data/filters/nsfw.md)
-- **Task-based data flow**: Images processed as `ImageBatch` tasks containing `ImageObject` instances with metadata, embeddings, and classification scores
-
-Learn more about [image curation](../../curate-images/index.md).
-
-## Deduplication Improvements
-
-Enhanced deduplication capabilities across all modalities with improved performance and flexibility:
-
-- **Exact and Fuzzy deduplication**: Updated [rapidsmpf-based shuffle backend](../../reference/infrastructure/gpu-processing.md) for more efficient GPU-to-GPU data transfer and better spilling capabilities
-- **Semantic deduplication**: Support for deduplicating [text](../../curate-text/process-data/deduplication/semdedup.md) and [video](../../curate-video/process-data/dedup.md) datasets using unified embedding-based workflows
-- **New ranking strategies**: Added `RankingStrategy`, which ranks elements within cluster centers to decide which points to keep during duplicate removal, supporting [metadata-based ranking](../../curate-text/process-data/deduplication/semdedup.md) to prioritize specific datasets or inputs
-
-## Core Refactors
-
-The architecture refactor introduces a layered system with unified interfaces and multiple execution backends:
-
-```{mermaid}
-graph LR
-    subgraph "User Layer"
-        P[Pipeline]
-        S1[ProcessingStage X→Y]
-        S2[ProcessingStage Y→Z]
-        S3[ProcessingStage Z→W]
-        R[Resources<br/>CPU/GPU/NVDEC/NVENC]
-    end
-
-    subgraph "Orchestration Layer"
-        BE[BaseExecutor Interface]
-    end
-
-    subgraph "Backend Layer"
-        XE[XennaExecutor<br/>Production Ready]
-        RAP[RayActorPoolExecutor<br/>Experimental]
-        RDE[RayDataExecutor<br/>Experimental]
-    end
-
-    subgraph "Adaptation Layer"
-        XA[Xenna Adapter]
-        RAPA[Ray Actor Adapter]
-        RDA[Ray Data Adapter]
-    end
-
-    subgraph "Execution Layer"
-        X[Cosmos-Xenna<br/>Streaming/Batch]
-        RAY1[Ray Actor Pool<br/>Load Balancing]
-        RAY2[Ray Data API<br/>Dataset Processing]
-    end
-
-    P --> S1
-    P --> S2
-    P --> S3
-    S1 -.-> R
-    S2 -.-> R
-    S3 -.-> R
-
-    P --> BE
-    BE --> XE
-    BE --> RAP
-    BE --> RDE
-
-    XE --> XA
-    RAP --> RAPA
-    RDE --> RDA
-
-    XA --> X
-    RAPA --> RAY1
-    RDA --> RAY2
-
-    style XE fill:#90EE90
-    style RAP fill:#FFE4B5
-    style RDE fill:#FFE4B5
-    style P fill:#E6F3FF
-    style BE fill:#F0F8FF
-```
-
-### Pipelines
-
-- **New Pipeline API**: Ray-based pipeline execution with `BaseExecutor` interface
-- **Multiple backends**: Support for [Xenna, Ray Actor Pool, and Ray Data execution backends](../../reference/infrastructure/execution-backends.md)
-- **Resource specification**: Configurable CPU and GPU memory requirements per stage
-- **Stage composition**: Improved stage validation and execution orchestration
-
-### Stages
-
-- **ProcessingStage redesign**: Generic `ProcessingStage[X, Y]` base class with type safety
-- **Resource requirements**: Built-in resource specification for CPU and GPU memory
-- **Backend adapters**: Stage adaptation layer for different Ray orchestration systems
-- **Input/output validation**: Enhanced type checking and data validation
-
-## Tutorials
-
-- **Text tutorials**: Updated all [text curation tutorials](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/text) to use new Ray-based API
-- **Image tutorials**: Migrated [image processing tutorials](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/image) to unified backend
-- **Audio tutorials**: New [audio curation tutorials](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/audio)
-- **Video tutorials**: New [video processing tutorials](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/video)
-
-For all tutorial content, refer to the [tutorials directory](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials) in the NeMo Curator GitHub repository.
-
 ## Synthetic Data Generation
 
 New Ray-based synthetic data generation capabilities for creating and augmenting training data using LLMs:
@@ -205,35 +27,12 @@ New Ray-based synthetic data generation capabilities for creating and augmenting
 
 Learn more in the [Synthetic Data Generation documentation](../../curate-text/synthetic/index.md).
 
-## Known Limitations
-
-> (Pending Refactor in Future Release)
-
-### Generation
-
-- **Hard negative mining**: Retrieval-based data generation workflows under development
-
-### PII
-
-- **PII processing**: Personally Identifiable Information removal tools are being updated for Ray backend
-- **Privacy workflows**: Enhanced privacy-preserving data curation capabilities in development
-
-### Blending & Shuffling
-
-- **Data blending**: Multi-source dataset blending functionality being refactored
-- **Dataset shuffling**: Large-scale data shuffling operations under development
-
-## Docs Refactor
-
-- **Local preview capability**: Improved documentation build system with local preview support
-- **Modality-specific guides**: Comprehensive documentation for each supported modality ([text](../../curate-text/index.md), [image](../../curate-images/index.md), [audio](../../curate-audio/index.md), [video](../../curate-video/index.md))
-- **API reference**: Complete [API documentation](../../apidocs/index.rst) with type annotations and examples
 
 ---
 
 ## What's Next
 
-The next release will focus on completing the refactor of Synthetic Data Generation, PII, and Blending & Shuffling features, along with additional performance optimizations and new modality support.
+The next release will focus on ...
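To make the `ProcessingStage[X, Y]` / `Pipeline` redesign described in the Core Refactors hunk concrete, the following is a minimal, self-contained sketch of the shape of that model. It is an illustration only, not the `nemo_curator` API: every class body, default value, and the sequential `run` loop are assumptions for exposition, with only the names `ProcessingStage`, `Pipeline`, `Resources`, and `add_stage` taken from the notes above.

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


@dataclass
class Resources:
    """Per-stage resource request (the notes also mention NVDEC/NVENC)."""
    cpus: float = 1.0
    gpus: float = 0.0


class ProcessingStage(Generic[X, Y]):
    """Typed stage: consumes a task of type X, produces a task of type Y."""
    resources: Resources = Resources()

    def process(self, task: X) -> Y:
        raise NotImplementedError


@dataclass
class DocumentBatch:
    """A 'Task': the unit of data that flows between stages."""
    texts: list[str] = field(default_factory=list)


class LowercaseStage(ProcessingStage[DocumentBatch, DocumentBatch]):
    resources = Resources(cpus=0.5)

    def process(self, task: DocumentBatch) -> DocumentBatch:
        return DocumentBatch([t.lower() for t in task.texts])


class Pipeline:
    """Composes stages; this toy runs them in-process, sequentially."""

    def __init__(self) -> None:
        self.stages: list[ProcessingStage] = []

    def add_stage(self, stage: ProcessingStage) -> "Pipeline":
        self.stages.append(stage)
        return self

    def run(self, task):
        for stage in self.stages:
            task = stage.process(task)
        return task


pipeline = Pipeline()
pipeline.add_stage(LowercaseStage())
print(pipeline.run(DocumentBatch(["Hello WORLD"])).texts)  # ['hello world']
```

In the real system, an executor backend (Xenna or one of the Ray executors) would replace the sequential loop, scheduling each stage across workers according to its declared `Resources`.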
```{toctree} :hidden: diff --git a/docs/conf.py b/docs/conf.py index 02e763b92e..a619cfc8b5 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -29,7 +29,7 @@ project = "NeMo-Curator" project_copyright = "2025, NVIDIA Corporation" author = "NVIDIA Corporation" -release = "25.09" +release = "26.02" # -- General configuration --------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration @@ -122,7 +122,7 @@ "min_python_version": "3.8", "recommended_cuda": "12.0+", "current_release": release, - "container_version": "25.09", + "container_version": "26.02", } # Enable figure numbering diff --git a/docs/versions1.json b/docs/versions1.json index 9fd5dcd52d..d9f09cf338 100644 --- a/docs/versions1.json +++ b/docs/versions1.json @@ -1,6 +1,11 @@ [ { "preferred": true, + "version": "26.02", + "url": "https://docs.nvidia.com/nemo/curator/26.02/" + }, + { + "preferred": false, "version": "25.09", "url": "https://docs.nvidia.com/nemo/curator/25.09/" }, From 9a29ce7ba55372818722791457eb25f286f5c431 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 2 Jan 2026 15:43:03 -0500 Subject: [PATCH 5/5] feedback Signed-off-by: Lawrence Lane --- docs/curate-text/synthetic/llm-client.md | 96 ++----------------- docs/curate-text/synthetic/multilingual-qa.md | 2 +- .../synthetic/nemotron-cc/index.md | 2 +- tutorials/synthetic/README.md | 38 ++------ 4 files changed, 18 insertions(+), 120 deletions(-) diff --git a/docs/curate-text/synthetic/llm-client.md b/docs/curate-text/synthetic/llm-client.md index 7a30e7307b..b626bc6529 100644 --- a/docs/curate-text/synthetic/llm-client.md +++ b/docs/curate-text/synthetic/llm-client.md @@ -118,32 +118,6 @@ client = AsyncOpenAIClient( base_url="https://integrate.api.nvidia.com/v1", max_concurrent_requests=3, # Conservative for cloud APIs ) - -# For local vLLM server with more capacity -client = AsyncOpenAIClient( - base_url="http://localhost:8000/v1", - max_concurrent_requests=16, # Higher for local deployment -) -``` - -### Optimal Settings - -```{list-table} Recommended Concurrency Settings -:header-rows: 1 -:widths: 30 25 45 - -* - Endpoint Type - - Recommended Setting - - Notes -* - NVIDIA API (cloud) - - 3-5 - - Respects rate limits; increase gradually -* - Local vLLM - - 8-32 - - Depends on GPU memory and model size -* - Local TGI - - 8-16 - - Adjust based on server configuration ``` ### Retry Configuration @@ -164,68 +138,25 @@ The retry logic handles: - **Connection errors**: Retry with exponential delay - **Transient failures**: Configurable retry attempts -## Using Custom Endpoints - -::::{tab-set} - -:::{tab-item} Local vLLM Server - -Deploy a local vLLM server and configure the client: - -**Start vLLM server:** -```bash -vllm serve meta-llama/Llama-3.3-70B-Instruct \ - --host 0.0.0.0 \ - --port 8000 \ - --tensor-parallel-size 4 -``` - -**Configure client:** -```python -client = AsyncOpenAIClient( - base_url="http://localhost:8000/v1", - api_key="not-needed", # vLLM doesn't require API key by default - max_concurrent_requests=16, - timeout=300, # Longer timeout for large models -) -``` -::: - -:::{tab-item} Text Generation Inference (TGI) - -Deploy a TGI server and configure the client: +## Using Other OpenAI-Compatible Endpoints -**Start TGI server:** -```bash -docker run --gpus all -p 8080:80 \ - ghcr.io/huggingface/text-generation-inference:latest \ - --model-id meta-llama/Llama-3.3-70B-Instruct -``` +The `AsyncOpenAIClient` works with any OpenAI-compatible API endpoint. 
Simply configure the `base_url` and `api_key` parameters: -**Configure client:** ```python +# OpenAI API client = AsyncOpenAIClient( - base_url="http://localhost:8080/v1", - api_key="not-needed", - max_concurrent_requests=8, + base_url="https://api.openai.com/v1", + api_key="sk-...", # Or set OPENAI_API_KEY env var + max_concurrent_requests=5, ) -``` -::: -:::{tab-item} OpenAI API - -Use the official OpenAI API: - -```python +# Any OpenAI-compatible endpoint client = AsyncOpenAIClient( - base_url="https://api.openai.com/v1", - api_key="sk-...", # Or set OPENAI_API_KEY env var + base_url="http://your-endpoint/v1", + api_key="your-api-key", max_concurrent_requests=5, ) ``` -::: - -:::: ## Complete Example @@ -277,7 +208,7 @@ If you encounter frequent 429 errors: ### Connection Timeouts -For large models or slow networks: +For slow networks or high-latency endpoints: ```python client = AsyncOpenAIClient( base_url="...", @@ -285,13 +216,6 @@ client = AsyncOpenAIClient( ) ``` -### Local Server Issues - -If experiencing connection errors with local servers: -- Check server resource utilization (GPU memory, CPU) -- Reduce concurrent requests -- Verify the server is running and accessible - --- ## Next Steps diff --git a/docs/curate-text/synthetic/multilingual-qa.md b/docs/curate-text/synthetic/multilingual-qa.md index e1b72779f2..fa16fb1a64 100644 --- a/docs/curate-text/synthetic/multilingual-qa.md +++ b/docs/curate-text/synthetic/multilingual-qa.md @@ -254,7 +254,7 @@ python synthetic_data_generation_example.py \ - NVIDIA API - Base URL for the API endpoint * - `--model-name` - - llama-3.3-70b + - meta/llama-3.3-70b-instruct - Model to use for generation * - `--languages` - EN, FR, DE, ES, IT diff --git a/docs/curate-text/synthetic/nemotron-cc/index.md b/docs/curate-text/synthetic/nemotron-cc/index.md index ca73912d07..d5577bc4c7 100644 --- a/docs/curate-text/synthetic/nemotron-cc/index.md +++ b/docs/curate-text/synthetic/nemotron-cc/index.md @@ -125,7 +125,7 @@ pipeline.add_stage( The recommended approach is to use the helper functions in `nemotron_cc_pipelines.py`: :::{note} -The `nemotron_cc_pipelines` helper functions are provided in the [tutorials directory](https://github.com/NVIDIA-NeMo/Curator/blob/main/tutorials/synthetic/nemotron_cc/nemotron_cc_pipelines.py), not as part of the installed package. Copy this file to your project or reference the patterns when building custom pipelines. +The `nemotron_cc_pipelines` helper functions are provided in the [tutorials directory](https://github.com/NVIDIA-NeMo/Curator/blob/main/tutorials/synthetic/nemotron_cc/nemotron_cc_pipelines.py), not as part of the installed package. Copy the `nemotron_cc_pipelines.py` file to your project or reference the patterns when building custom pipelines. 
 :::
 
 ```python
diff --git a/tutorials/synthetic/README.md b/tutorials/synthetic/README.md
index c96a9b895e..19b6b3a026 100644
--- a/tutorials/synthetic/README.md
+++ b/tutorials/synthetic/README.md
@@ -45,19 +45,13 @@ python synthetic_data_generation_example.py \
 ### NemotronCC Pipelines
 
 ```bash
-# Run DiverseQA pipeline with mock data (requires tokenizer access)
+# High-quality processing: Run any task (diverse_qa, distill, extract_knowledge, knowledge_list)
 python nemotron_cc/nemotron_cc_sdg_high_quality_example_pipeline.py \
     --task diverse_qa \
     --tokenizer meta-llama/Llama-3.3-70B-Instruct \
     --mock
 
-# Run Distill pipeline
-python nemotron_cc/nemotron_cc_sdg_high_quality_example_pipeline.py \
-    --task distill \
-    --tokenizer meta-llama/Llama-3.3-70B-Instruct \
-    --mock
-
-# Run Wikipedia Paraphrasing for low-quality data
+# Low-quality processing: Wikipedia-style paraphrasing to improve text quality
 python nemotron_cc/nemotron_cc_sdg_low_quality_example_pipeline.py \
     --tokenizer meta-llama/Llama-3.3-70B-Instruct \
     --mock
@@ -77,27 +71,17 @@ python nemotron_cc/nemotron_cc_sdg_high_quality_example_pipeline.py \
 
 ## Command-Line Arguments
 
-### Common Arguments
+Refer to each script's `--help` output for the complete list of available arguments. The most common arguments are:
 
 | Argument | Default | Description |
 |----------|---------|-------------|
 | `--api-key` | env var | NVIDIA API key |
 | `--base-url` | NVIDIA API | Base URL for API endpoint |
-| `--model-name` | llama-3.3-70b | Model to use for generation |
+| `--model-name` | meta/llama-3.3-70b-instruct | Model to use for generation |
 | `--output-path` | ./synthetic_output | Output directory |
 | `--max-concurrent-requests` | 3 | Concurrent API requests |
 | `--temperature` | 0.9 (QA) / 0.5 (NemotronCC) | Sampling temperature |
 
-### NemotronCC-Specific Arguments
-
-| Argument | Default | Description |
-|----------|---------|-------------|
-| `--task` | diverse_qa | Task type (diverse_qa, distill, extract_knowledge, knowledge_list) |
-| `--tokenizer` | required | HuggingFace tokenizer name |
-| `--mock` | False | Use built-in test data |
-| `--input-parquet-path` | None | Input Parquet file path/glob |
-| `--output-format` | parquet | Output format (jsonl, parquet) |
-
 ## Example Output
 
 ### Multilingual Q&A
 
@@ -107,19 +91,9 @@ python nemotron_cc/nemotron_cc_sdg_high_quality_example_pipeline.py \
 {"text": "[FR] Question: Qu'est-ce que la photosynthèse? Answer: La photosynthèse est le processus par lequel les plantes convertissent la lumière du soleil en énergie."}
 ```
 
-### DiverseQA
-
-The output contains the original text followed by generated Q&A pairs:
+### NemotronCC
 
-```text
-The Amazon rainforest contains an unparalleled diversity of plant and animal species...
-
-Question: What makes the Amazon rainforest unique in terms of biodiversity?
-Answer: The Amazon rainforest contains an unparalleled diversity of plant and animal species.
-
-Question: True or False: The Amazon rainforest has limited species diversity.
-Answer: False. The Amazon rainforest contains an unparalleled diversity of species.
-```
+See the [NemotronCC documentation](../../docs/curate-text/synthetic/nemotron-cc/index.md) for the output format of each task type.
 
 ---
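The retry behavior described in the `llm-client.md` changes above (429 handling with exponential backoff behind a `max_concurrent_requests` cap) follows a standard asyncio pattern. The sketch below is illustrative only, not the `AsyncOpenAIClient` implementation; `RateLimitError`, `call_with_limits`, and the retry knobs are hypothetical stand-ins.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the API."""


# Caps in-flight requests, analogous to max_concurrent_requests=3.
semaphore = asyncio.Semaphore(3)


async def call_with_limits(call, max_retries: int = 5, base_delay: float = 1.0):
    async with semaphore:
        for attempt in range(max_retries):
            try:
                return await call()
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff with jitter so concurrent workers
                # don't retry in lockstep.
                await asyncio.sleep(base_delay * 2**attempt + random.uniform(0, 0.5))


async def flaky_request():
    """Demo request that is rate-limited twice, then succeeds."""
    flaky_request.calls = getattr(flaky_request, "calls", 0) + 1
    if flaky_request.calls < 3:
        raise RateLimitError
    return "ok"


print(asyncio.run(call_with_limits(flaky_request)))  # "ok" after two backoff waits
```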