
Commit 57f9666

Download Hugging Face datasets for text classification tutorials (#1288)
* use real dataset for domain tutorial
* ruff
* update content type classifier notebook
* update fineweb notebook
* update fineweb mixtral notebook
* update fineweb nemotron notebook
* update prompt task complexity tutorial
* update multilingual domain tutorial
* update content type with text_field=text_field
* update domain with text_field=text_field
* update fineweb with text_field=text_field
* update fineweb mixtral with text_field=text_field
* update fineweb nemotron with text_field=text_field
* update quality notebook

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
1 parent c60fa3b commit 57f9666

10 files changed: +1610 additions, −622 deletions

tutorials/text/distributed-data-classification/aegis-classification.ipynb

Lines changed: 3 additions & 3 deletions
@@ -263,7 +263,7 @@
     "\n",
     "```python\n",
     "pipeline = Pipeline(name=\"full_pipeline\")\n",
-    "pipeline.add_stage(read_stage)        # reader (JSONL/S3/etc.)\n",
+    "pipeline.add_stage(read_stage)        # reader (JSONL/Parquet)\n",
     "pipeline.add_stage(lang_id_stage)     # optional: language filter\n",
     "pipeline.add_stage(classifier_stage)  # classifier\n",
     "pipeline.add_stage(write_stage)       # writer (JSONL/Parquet)\n",
@@ -288,7 +288,7 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-     "Since the pipeline ran to completion and the result was written to a JSONL file, we can shut down the Ray cluster with:"
+     "Since the pipeline ran to completion and the result was written to disk, we can shut down the Ray cluster with:"
     ]
    },
    {
@@ -388,7 +388,7 @@
    ],
    "source": [
     "# For simplicity, we take the first written file from the writer stage\n",
-    "# In real pipelines, the writer may return multiple files (shards) or objects\n",
+    "# In real pipelines, adjust as needed\n",
     "result_file = result[0].data[0]\n",
     "\n",
     "result_df = pd.read_json(result_file, lines=True)\n",

tutorials/text/distributed-data-classification/content-type-classification.ipynb
Lines changed: 205 additions & 91 deletions (large diff not rendered)

tutorials/text/distributed-data-classification/domain-classification.ipynb
Lines changed: 208 additions & 65 deletions (large diff not rendered)

tutorials/text/distributed-data-classification/fineweb-edu-classification.ipynb
Lines changed: 188 additions & 58 deletions (large diff not rendered)

tutorials/text/distributed-data-classification/fineweb-mixtral-edu-classification.ipynb
Lines changed: 196 additions & 64 deletions (large diff not rendered)

tutorials/text/distributed-data-classification/fineweb-nemotron-edu-classification.ipynb
Lines changed: 193 additions & 62 deletions (large diff not rendered)

tutorials/text/distributed-data-classification/instruction-data-guard-classification.ipynb

Lines changed: 3 additions & 3 deletions
@@ -249,7 +249,7 @@
     "\n",
     "```python\n",
     "pipeline = Pipeline(name=\"full_pipeline\")\n",
-    "pipeline.add_stage(read_stage)        # reader (JSONL/S3/etc.)\n",
+    "pipeline.add_stage(read_stage)        # reader (JSONL/Parquet)\n",
     "pipeline.add_stage(lang_id_stage)     # optional: language filter\n",
     "pipeline.add_stage(classifier_stage)  # classifier\n",
     "pipeline.add_stage(write_stage)       # writer (JSONL/Parquet)\n",
@@ -274,7 +274,7 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-     "Since the pipeline ran to completion and the result was written to a JSONL file, we can shut down the Ray cluster with:"
+     "Since the pipeline ran to completion and the result was written to disk, we can shut down the Ray cluster with:"
     ]
    },
    {
@@ -355,7 +355,7 @@
    ],
    "source": [
     "# For simplicity, we take the first written file from the writer stage\n",
-    "# In real pipelines, the writer may return multiple files (shards) or objects\n",
+    "# In real pipelines, adjust as needed\n",
     "result_file = result[0].data[0]\n",
     "\n",
     "result_df = pd.read_json(result_file, lines=True)\n",

tutorials/text/distributed-data-classification/multilingual-domain-classification.ipynb
Lines changed: 203 additions & 75 deletions (large diff not rendered)

tutorials/text/distributed-data-classification/prompt-task-complexity-classification.ipynb
Lines changed: 212 additions & 107 deletions (large diff not rendered)
