chore: upgrade data-designer to 0.6.0 by lipikaramaswamy · Pull Request #160 · NVIDIA-NeMo/Anonymizer

lipikaramaswamy · 2026-05-18T17:03:54Z

Summary

Upgrades data-designer from 0.5.7 → 0.6.0 to pick up the native async engine and bundles a few cleanups that surfaced while verifying the upgrade by re-running the tutorial notebooks end-to-end.

Changes

- `pyproject.toml` + `uv.lock`: bump data-designer to 0.6.0. No code changes were required: the async engine is enabled by default in 0.6, so the existing `NddAdapter` paths get the benefit transparently.
- `models.yaml`: raise gpt-oss-120b `max_parallel_requests` from 2 → 16 to actually exercise the new async concurrency. The old cap gated the rest of the pipeline at 2 in-flight LLM calls; raising it cut tutorial notebook runtime roughly in half on gpt-oss-120b-heavy stages.
- `Makefile`: `make convert-notebooks` now passes `--set-kernel anonymizer-venv` to jupytext. Previously jupytext picked an arbitrary registered kernel (often a stale one from another project), causing notebooks to silently run against the wrong Python environment.
- `llm_replace_workflow.py`: replacement-map debug logs no longer emit raw PII. The previous `[DEBUG] Replacement map record …` line dumped the full original → synthetic mapping for every record, leaking the source values into log streams whenever DEBUG was enabled. Logs now summarize counts per label plus any anomalies (entities the prompt did not request, requested entities the LLM didn't fill). The warning for empty/missing maps now always fires; it previously sat under an `elif` that was skipped whenever DEBUG was enabled.
- `docs/notebook_source/*.py`: consolidate `from anonymizer.config.rewrite import PrivacyGoal` into the top-level `from anonymizer import …` line. `PrivacyGoal` is already re-exported from the package root, so the deeper import was unnecessary and inconsistent with the bundled skill template.
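
To illustrate why the `max_parallel_requests` cap matters once calls are async, here is a minimal sketch using `asyncio.Semaphore`. It is not data-designer's implementation; `run_with_cap` and `fake_llm_call` are hypothetical names, and the sleep stands in for a real LLM request.

```python
import asyncio


async def run_with_cap(num_calls: int, max_parallel_requests: int) -> int:
    """Run num_calls fake LLM calls and return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_parallel_requests)
    in_flight = 0
    peak = 0

    async def fake_llm_call() -> None:
        nonlocal in_flight, peak
        async with semaphore:  # at most max_parallel_requests tasks pass this point
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for network latency
            in_flight -= 1

    await asyncio.gather(*(fake_llm_call() for _ in range(num_calls)))
    return peak


peak_at_2 = asyncio.run(run_with_cap(num_calls=32, max_parallel_requests=2))
peak_at_16 = asyncio.run(run_with_cap(num_calls=32, max_parallel_requests=16))
print(peak_at_2, peak_at_16)
```

With the cap at 2, only two simulated calls overlap no matter how many are queued, which is the gating behavior described for the old setting.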

What's NOT in this PR

- Re-rendered notebook outputs (docs/notebooks/*.ipynb): kept as a separate step so the dependency change is reviewable on its own.
- A separate sensitivity-disposition refactor (schema simplification + prompt fixes) that surfaced during this verification. Tracking that as a follow-up branch since it's a larger change with its own design discussion.

Test plan

- make test: unit tests pass on data-designer 0.6.0
- make convert-notebooks: all five tutorial notebooks execute end-to-end against gpt-oss-120b and nemotron-30b-thinking
- Verified detection + replace + rewrite pipelines all run under the async engine
- Confirmed no raw PII in [DEBUG] Replacement map record … log lines
- CI green

greptile-apps · 2026-05-18T17:06:27Z

Greptile Summary

This PR upgrades data-designer from 0.5.7 to 0.6.0 to pick up the native async engine, and bundles several correctness fixes that surfaced during end-to-end notebook verification.

pyproject.toml / uv.lock: pin data-designer==0.6.0; the async engine is transparent to existing NddAdapter callers.
llm_replace_workflow.py: debug logs no longer emit raw PII — counts + labels only; the empty-map warning is promoted from an elif branch (silently suppressed when DEBUG was enabled) to a standalone if, so it always fires. Regression tests added to test_llm_replace_workflow.py.
models.yaml / Makefile / notebook sources: raise gpt-oss-120b concurrency cap from 2 → 16, pin the jupytext kernel to anonymizer-venv, and consolidate the PrivacyGoal import to the package root (where it is already re-exported).
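
The `elif` → `if` change can be illustrated with a small standalone sketch. The function names, logger name, and message text below are hypothetical and are not taken from `llm_replace_workflow.py`; the sketch only shows why a warning placed under `elif` disappears when DEBUG is enabled.

```python
import logging


class ListHandler(logging.Handler):
    """Collects level names so the sketch can inspect what was emitted."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.levels.append(record.levelname)


logger = logging.getLogger("elif_vs_if_sketch")
logger.setLevel(logging.DEBUG)  # DEBUG enabled: the case where the old warning vanished
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)


def log_before(filtered: dict, requested: int) -> None:
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("requested=%d filtered=%d", requested, len(filtered))
    elif not filtered and requested:  # unreachable while DEBUG is on
        logger.warning("replacement map empty after filtering (requested=%d)", requested)


def log_after(filtered: dict, requested: int) -> None:
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("requested=%d filtered=%d", requested, len(filtered))
    if not filtered and requested:  # independent check, fires at any log level
        logger.warning("replacement map empty after filtering (requested=%d)", requested)


log_before({}, requested=3)
levels_before = list(handler.levels)
handler.levels.clear()

log_after({}, requested=3)
levels_after = list(handler.levels)
print(levels_before, levels_after)
```

In the `log_before` shape the empty-map warning is only reachable when DEBUG is off, so turning on more verbose logging hides the one message that signals a problem.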

Confidence Score: 5/5

Safe to merge — all changes are additive or correctness fixes with no breaking surface area.

The dependency bump is a pin-level upgrade with no API changes required in callers. The logging fix removes raw PII from debug output and tightens a previously silenced warning — both are strictly safer than before. The concurrency bump and kernel-pin changes are config-only and exercised by end-to-end notebook runs. The new regression tests directly validate the PII-safe logging invariant.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/anonymizer/engine/replace/llm_replace_workflow.py | PII-safe debug/warning logging: raw entity values replaced with counts + label counters; `elif` warning promoted to `if` so it fires regardless of log level. |
| tests/engine/test_llm_replace_workflow.py | Three new regression tests covering happy-path debug log, anomaly summaries, and empty-map warning, all asserting no raw PII appears in log output. |
| pyproject.toml | Bumps `data-designer` pin from `0.5.7` to `0.6.0`; no other dependency changes. |
| src/anonymizer/config/default_model_configs/models.yaml | Raises `gpt-oss-120b` `max_parallel_requests` from 2 to 16 to utilize the new async engine concurrency. |
| Makefile | Adds `--set-kernel anonymizer-venv` to the `jupytext` invocation so notebooks run against the correct Python environment. |
| docs/notebook_source/04_rewriting_biographies.py | Consolidates `PrivacyGoal` into the top-level `from anonymizer import …` line; `PrivacyGoal` is in `__all__`. |
| docs/notebook_source/05_rewriting_legal_documents.py | Same `PrivacyGoal` import consolidation as notebook 04. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LLM returns replacement map] --> B[_filter_replacement_map_to_input_entities]
    B --> C{valid dict?}
    C -- no --> D[WARNING: unexpected type\nreturn empty]
    C -- yes --> E[Build allowed_pairs from\nparsed_entities]
    E --> F[Filter: keep only pairs\nin allowed_pairs, deduplicate]
    F --> G{DEBUG enabled?}
    G -- yes --> H[Log counts only:\nrequested / raw / filtered\nunrequested_by_label / unfilled_by_label\nNO raw PII]
    G -- no --> I{filtered empty\nAND allowed_pairs non-empty?}
    H --> I
    I -- yes --> J[WARNING: empty after filtering\ncounts + labels only\nNO raw PII]
    I -- no --> K[Return filtered replacements]
    J --> K
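
The flow in the chart above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code of `_filter_replacement_map_to_input_entities`: the map shape (a dict with a `"replacements"` list of `original` / `label` / `synthetic` entries), the `(value, label)` tuple form of the parsed entities, and the function and logger names are all hypothetical.

```python
import logging
from collections import Counter

logger = logging.getLogger("replace_flow_sketch")


def filter_replacement_map(raw_map, parsed_entities):
    """Keep only requested (value, label) pairs and log counts, never values."""
    if not isinstance(raw_map, dict):
        logger.warning("replacement map has unexpected type %s", type(raw_map).__name__)
        return []

    allowed_pairs = set(parsed_entities)
    raw_entries = raw_map.get("replacements", [])  # assumed key name
    filtered = []
    seen = set()
    unrequested_by_label = Counter()
    for entry in raw_entries:
        pair = (entry.get("original"), entry.get("label"))
        if pair not in allowed_pairs:
            unrequested_by_label[str(pair[1])] += 1  # count the label only
        elif pair not in seen:  # deduplicate repeated pairs
            seen.add(pair)
            filtered.append(entry)

    # Requested entities the LLM did not fill, again summarized by label.
    unfilled_by_label = Counter(label for value, label in allowed_pairs if (value, label) not in seen)

    if logger.isEnabledFor(logging.DEBUG):
        logger.debug(
            "replacement map: requested=%d raw=%d filtered=%d "
            "unrequested_by_label=%s unfilled_by_label=%s",
            len(allowed_pairs), len(raw_entries), len(filtered),
            dict(unrequested_by_label), dict(unfilled_by_label),
        )
    if not filtered and allowed_pairs:
        logger.warning(
            "replacement map empty after filtering: requested=%d raw=%d labels=%s",
            len(allowed_pairs), len(raw_entries),
            dict(Counter(label for _, label in allowed_pairs)),
        )
    return filtered


entities = [("Alice Smith", "name"), ("alice@example.com", "email")]
raw = {
    "replacements": [
        {"original": "Alice Smith", "label": "name", "synthetic": "Dana Reed"},
        {"original": "Alice Smith", "label": "name", "synthetic": "Dana Reed"},  # duplicate
        {"original": "Paris", "label": "city", "synthetic": "Lyon"},  # not requested
    ]
}
kept = filter_replacement_map(raw, entities)
print(len(kept))
```

Both log calls format only counts and label names, so neither the original nor the synthetic values reach the log stream in this sketch.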

andreatgretel

Reviewed the dependency bump, replacement-map logging change, notebook conversion update, and import cleanup. Smoke tests and targeted unit tests pass. Only left one small suggestion to add a committed regression test for the PII-safe debug logging behavior. Otherwise looks good to me!

lipikaramaswamy added 5 commits May 15, 2026 16:05

chore: upgrade data designer to 0.6.x

2db57fc

fix: replacement map debug logs were emitting pii, changed the log structure

ca490c8

chore: update imports

fa75371

Merge branch 'main' into lipikaramaswamy/chore/upgrade-data-designer-async-engine

d72ff6a

update venv in makefile, and increase max_parallel_requests for async

c8cc979

lipikaramaswamy requested review from a team as code owners May 18, 2026 17:03

andreatgretel reviewed May 18, 2026

Comment thread src/anonymizer/engine/replace/llm_replace_workflow.py

andreatgretel approved these changes May 18, 2026

test: add caplog regression tests for PII-free replacement-map logging

bd30625

lipikaramaswamy merged commit 0a94c56 into main May 18, 2026
11 checks passed

lipikaramaswamy deleted the lipikaramaswamy/chore/upgrade-dd-and-cleanup branch May 18, 2026 17:56

lipikaramaswamy merged 6 commits into main from lipikaramaswamy/chore/upgrade-dd-and-cleanup
