
Commit f009e31

Merge pull request #356 from mckornfield/add-nss-notebooks/mck
Add NeMo Safe-Synthesizer Notebooks
2 parents 2b2bb35 + 7675a9f commit f009e31

4 files changed

+867
-0
lines changed

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# NeMo Safe Synthesizer Example Notebooks

This directory contains tutorial notebooks for getting started with NeMo Safe Synthesizer.

## 📦 Set Up the Environment

We use the `uv` Python package manager to create a virtual environment and install the necessary dependencies. If you don't have `uv` installed, follow the installation instructions in the [uv documentation](https://docs.astral.sh/uv/getting-started/installation/).

Install the SDK as follows:

```bash
uv venv
source .venv/bin/activate
# Quote the package spec so shells such as zsh don't treat the brackets as a glob pattern
uv pip install "nemo-microservices[safe-synthesizer]"
```

Be sure to select this virtual environment as your Jupyter kernel when running the notebooks.
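
To confirm that a notebook kernel is actually running from this environment, you can run a quick check like the one below. It is a minimal sketch and not part of the SDK: `in_virtualenv` and `sdk_installed` are hypothetical helpers, and the check assumes the SDK's import name is `nemo_microservices`, as used in the notebooks' import cells.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    Environments created with `uv venv` (or `python -m venv`) set sys.prefix to
    the environment directory, while sys.base_prefix still points at the base
    interpreter installation.
    """
    return sys.prefix != sys.base_prefix


def sdk_installed(module_name: str = "nemo_microservices") -> bool:
    """Return True if `module_name` can be found by this kernel's import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Running in a virtual environment: {in_virtualenv()}")
    print(f"nemo_microservices importable: {sdk_installed()}")
```

If `in_virtualenv()` returns `False` or the import check fails, the notebook is most likely using a different kernel than the `.venv` created above.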

## 🚀 Deploying the NeMo Safe Synthesizer Microservice

To run these notebooks, you need access to a deployment of the NeMo Safe Synthesizer microservice. There are two deployment options:

### 🐳 Deploy the NeMo Safe Synthesizer Microservice Locally

Follow the quickstart guide to deploy the NeMo Safe Synthesizer microservice locally with Docker Compose.

### 🚀 Deploy the NeMo Microservices Platform with Helm

Follow the Helm installation guide to deploy the microservices platform.
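
Once either deployment is up, you can check that its endpoints accept connections before opening a notebook. This is a minimal sketch, assuming the local defaults the notebook uses (`http://localhost:8080` for the platform and `http://localhost:3000/v1/hf` for the datastore); `service_reachable` is a hypothetical helper, and it only tests TCP connectivity, not whether the service behind the port is healthy.

```python
import socket
from urllib.parse import urlparse


def service_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    This only proves that something is listening on the port; it does not
    call any service API.
    """
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Assumed local defaults, taken from the notebook's client and datastore settings.
    endpoints = {
        "platform": "http://localhost:8080",
        "datastore": "http://localhost:3000/v1/hf",
    }
    for name, url in endpoints.items():
        print(f"{name:<10} {url:<30} reachable={service_reachable(url)}")
```

A `False` result for the platform URL usually means the deployment is not running yet or is exposed on a different host or port.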
Lines changed: 304 additions & 0 deletions
@@ -0,0 +1,304 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "630e3e17",
"metadata": {},
"source": [
"# 🔐 NeMo Safe Synthesizer: Advanced Privacy (Differential Privacy)\n",
"\n",
"> ⚠️ **Warning**: NeMo Safe Synthesizer is in Early Access and not recommended for production use.\n",
"\n",
"<br>\n",
"\n",
"In this notebook, we create synthetic tabular data using the NeMo Microservices Python SDK with differential privacy enabled. The notebook should take about 1.5 hours to run.\n",
"\n",
"After completing this notebook, you'll be able to:\n",
"- **Use the NeMo Microservices SDK** to interact with Safe Synthesizer\n",
"- **Enable differential privacy** to provide additional privacy protection\n",
"- **Access an evaluation report** on the quality and privacy of the synthetic data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a538526a",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "8be84f5d",
"metadata": {},
"source": [
"#### 💾 Import dependencies\n",
"\n",
"Ensure you have a NeMo Microservices Platform deployment available and that the SDK is installed in this notebook's kernel. If you're using a managed or remote deployment, have the correct base URLs and tokens ready."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f5d6f5a",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from nemo_microservices import NeMoMicroservices\n",
"from nemo_microservices.beta.safe_synthesizer.builder import SafeSynthesizerBuilder\n",
"\n",
"import logging\n",
"\n",
"logging.basicConfig(level=logging.WARNING)\n",
"logging.getLogger(\"httpx\").setLevel(logging.WARNING)"
]
},
{
"cell_type": "markdown",
"id": "7395f0c8",
"metadata": {},
"source": [
"### ⚙️ Initialize the NeMo Safe Synthesizer Client\n",
"\n",
"- The Python SDK provides a wrapper around the NeMo Microservices Platform APIs.\n",
"- `http://localhost:8080` is the default `base_url` for the quickstart deployment.\n",
"- If you're using a managed or remote deployment, make sure you use the correct base URLs and tokens."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c15ab93",
"metadata": {},
"outputs": [],
"source": [
"client = NeMoMicroservices(\n",
"    base_url=\"http://localhost:8080\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8f1cfb12",
"metadata": {},
"source": [
"NeMo DataStore is launched as one of the services. The job uses it for model storage, so configure its endpoint and token:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "426186a3",
"metadata": {},
"outputs": [],
"source": [
"datastore_config = {\n",
"    \"endpoint\": \"http://localhost:3000/v1/hf\",\n",
"    \"token\": \"\",\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "2d66c819",
"metadata": {},
"source": [
"## 📥 Load input data\n",
"\n",
"Safe Synthesizer learns the patterns and correlations in an input dataset in order to produce synthetic data with similar properties. Use the sample dataset provided, or modify the following cells to use your own data.\n",
"\n",
"The sample dataset is the UCI Default of Credit Card Clients dataset, a set of customer credit card payment records. It includes Personally Identifiable Information (PII) columns such as sex, education level, marital status, and age. It also contains several billing and payment amount columns and a binary indicator of whether the customer defaulted on the next month's payment."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c989a42",
"metadata": {},
"outputs": [],
"source": [
"%pip install ucimlrepo || uv pip install ucimlrepo"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7204f213",
"metadata": {},
"outputs": [],
"source": [
"from ucimlrepo import fetch_ucirepo\n",
"\n",
"# Fetch the Default of Credit Card Clients dataset (UCI ID 350)\n",
"default_of_credit_card_clients = fetch_ucirepo(id=350)\n",
"\n",
"# `data.original` holds the full dataset (features and target) as one DataFrame\n",
"df = default_of_credit_card_clients.data.original\n",
"\n",
"# Display the first few rows of the combined DataFrame\n",
"print(df.head())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8ca3a11",
"metadata": {},
"outputs": [],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"id": "87d72c68",
"metadata": {},
"source": [
"## 🏗️ Create a Safe Synthesizer job\n",
"\n",
"The `SafeSynthesizerBuilder` provides a fluent interface to configure and submit jobs.\n",
"\n",
"This job will:\n",
"- Initialize the builder with the NeMo Microservices client.\n",
"- Use the loaded DataFrame as the input data source.\n",
"- Configure the job to use the specified datastore for model storage.\n",
"- Enable automatic replacement of personally identifiable information (PII).\n",
"- Enable differential privacy (DP) with a configurable epsilon.\n",
"- Use structured generation to enforce the schema during data generation.\n",
"- Submit the job to the microservices platform."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85d9de56",
"metadata": {},
"outputs": [],
"source": [
"job = (\n",
"    SafeSynthesizerBuilder(client)\n",
"    .from_data_source(df)\n",
"    .with_datastore(datastore_config)\n",
"    .with_replace_pii()\n",
"    .with_differential_privacy(dp_enabled=True, epsilon=8.0)\n",
"    .with_generate(use_structured_generation=True)\n",
"    .create_job()\n",
")\n",
"\n",
"print(f\"job_id = {job.job_id}\")\n",
"job.wait_for_completion()\n",
"\n",
"print(f\"Job finished with status {job.fetch_status()}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fa2eacb2",
"metadata": {},
"outputs": [],
"source": [
"# If the notebook kernel shuts down, the job keeps running on the microservices platform.\n",
"# To reconnect to it, uncomment the following lines and replace `<job id>` with the\n",
"# job ID printed by the previous cell.\n",
"\n",
"# from nemo_microservices.beta.safe_synthesizer.sdk.job import SafeSynthesizerJob\n",
"# job = SafeSynthesizerJob(job_id=\"<job id>\", client=client)"
]
},
{
"cell_type": "markdown",
"id": "285d4a9d",
"metadata": {},
"source": [
"## 👀 View synthetic data\n",
"\n",
"After the job completes, fetch the generated synthetic dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f25574a",
"metadata": {},
"outputs": [],
"source": [
"# Fetch the synthetic data created by the job\n",
"synthetic_df = job.fetch_data()\n",
"synthetic_df\n"
]
},
{
"cell_type": "markdown",
"id": "472b4f38",
"metadata": {},
"source": [
"## 📊 View evaluation report\n",
"\n",
"An evaluation comparing the synthetic data to the input data is performed automatically.\n",
"\n",
"- Programmatically access key scores (quality and privacy).\n",
"- Download the full HTML report with charts and detailed metrics.\n",
"- Display the report inline below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b691127",
"metadata": {},
"outputs": [],
"source": [
"# Print selected information from the job summary\n",
"summary = job.fetch_summary()\n",
"print(\n",
"    f\"Synthetic data quality score (0-10, higher is better): {summary.synthetic_data_quality_score}\"\n",
")\n",
"print(f\"Data privacy score (0-10, higher is better): {summary.data_privacy_score}\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5b1030a",
"metadata": {},
"outputs": [],
"source": [
"# Download the full evaluation report to your local machine\n",
"job.save_report(\"evaluation_report.html\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "45f7e22b",
"metadata": {},
"outputs": [],
"source": [
"# Fetch and display the full evaluation report inline\n",
"job.display_report_in_notebook()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "kendrickb-notebooks",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
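
The notebook calls `with_differential_privacy(dp_enabled=True, epsilon=8.0)`. Under the standard (ε, δ)-differential privacy definition, for any two datasets D and D′ that differ in one record and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ, so e^ε bounds how much a single record can shift the probability of any outcome (ignoring δ). The sketch below only evaluates that factor for a few ε values; it illustrates the textbook definition, not how Safe Synthesizer accounts for privacy internally, and the δ the service uses is not stated here. `dp_likelihood_bound` is a hypothetical helper.

```python
import math


def dp_likelihood_bound(epsilon: float) -> float:
    """Return e**epsilon, the multiplicative factor in the (epsilon, delta)-DP definition.

    For neighboring datasets D and D' (differing in one record) and any set of
    outputs S: Pr[M(D) in S] <= e**epsilon * Pr[M(D') in S] + delta.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return math.exp(epsilon)


if __name__ == "__main__":
    # 8.0 is the value used in the notebook; the others are shown for comparison.
    for eps in (1.0, 4.0, 8.0):
        print(f"epsilon={eps:>4}: e^epsilon = {dp_likelihood_bound(eps):,.1f}")
```

A smaller ε gives a tighter bound and stronger privacy, typically at some cost to synthetic data quality; the quality and privacy scores in the evaluation report are one way to inspect that trade-off for a given setting.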
