Update replace_pii_only.ipynb

alexahaushalter · web-flow · commit ee5938a80ff8 · 2025-10-06T16:35:30.000-05:00
diff --git a/nemo/NeMo-Safe-Synthesizer/advanced/replace_pii_only.ipynb b/nemo/NeMo-Safe-Synthesizer/advanced/replace_pii_only.ipynb
@@ -97,7 +97,7 @@
     "\n",
     "Safe Synthesizer processes your input dataset and returns the same rows with PII replaced. For this tutorial we load a small public sample dataset. Replace it with your own data if desired.\n",
     "\n",
-    "The dolly dataset is an open source dataset of instruction-following records. Each record contains (1) a free text prompt that could be sent to an LLM, (2) a context descriptions to help the LLM determine the answer, (3) a response that could come from the LLM, and (4) the instruction category such as classification, open QA, closed QA, information extraction, and brainstroming."
+    "The dolly dataset is an open source dataset of instruction-following records. Each record contains (1) a free text prompt that could be sent to an LLM, (2) a context descriptions to help the LLM determine the answer, (3) a response that could come from the LLM, and (4) the instruction category such as classification, open QA, closed QA, information extraction, and brainstorming. The text in each of the first three fields sometimes contains Personally Identifiable Information, such as names, birth dates, and locations."
    ]
   },
   {
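The updated cell says that PII can appear in the first three Dolly fields (prompt, context, response) but not in the category label. The sketch below makes that split concrete by modeling one record and pulling out only the free-text fields a PII replacement step would need to inspect. It assumes the column names of the public `databricks-dolly-15k` schema (`instruction`, `context`, `response`, `category`). The tutorial's sample file may name them differently, and the `DollyRecord` class, the `free_text` helper and the example record are all hypothetical illustrations, not part of the notebook or of Safe Synthesizer.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


# Assumed column names, following the public databricks-dolly-15k schema;
# the tutorial's sample dataset may use different names.
@dataclass
class DollyRecord:
    instruction: str  # (1) free-text prompt that could be sent to an LLM
    context: str      # (2) supporting text to help the LLM find the answer
    response: str     # (3) answer that could come from the LLM
    category: str     # (4) instruction category label, e.g. "closed_qa"


# The first three fields are free text and may contain PII such as names,
# birth dates and locations; the category is a fixed label.
FREE_TEXT_FIELDS = [f.name for f in fields(DollyRecord)][:3]


def free_text(record: DollyRecord) -> dict[str, str]:
    """Return only the fields a PII replacement step should inspect (hypothetical helper)."""
    return {name: getattr(record, name) for name in FREE_TEXT_FIELDS}


# Hypothetical record invented for illustration; not taken from the dataset.
example = DollyRecord(
    instruction="When was Jane Doe born?",
    context="Jane Doe was born on 1 May 1970 in Springfield.",
    response="Jane Doe was born on 1 May 1970.",
    category="closed_qa",
)

print(FREE_TEXT_FIELDS)
print(free_text(example))
```

Keeping the category out of the free-text set reflects the cell's point that only the prompt, context and response carry personal details; whether Safe Synthesizer needs columns to be selected explicitly like this is not stated in the diff.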
