chore: upgrade data-designer to 0.6.0 #160
Greptile Summary
This PR upgrades
Confidence Score: 5/5
Safe to merge: all changes are additive or correctness fixes with no breaking surface area. The dependency bump is a pin-level upgrade with no API changes required in callers. The logging fix removes raw PII from debug output and tightens a previously silenced warning; both are strictly safer than before. The concurrency bump and kernel-pin changes are config-only and exercised by end-to-end notebook runs. The new regression tests directly validate the PII-safe logging invariant. No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[LLM returns replacement map] --> B[_filter_replacement_map_to_input_entities]
B --> C{valid dict?}
C -- no --> D[WARNING: unexpected type\nreturn empty]
C -- yes --> E[Build allowed_pairs from\nparsed_entities]
E --> F[Filter: keep only pairs\nin allowed_pairs, deduplicate]
F --> G{DEBUG enabled?}
G -- yes --> H[Log counts only:\nrequested / raw / filtered\nunrequested_by_label / unfilled_by_label\nNO raw PII]
G -- no --> I{filtered empty\nAND allowed_pairs non-empty?}
H --> I
I -- yes --> J[WARNING: empty after filtering\ncounts + labels only\nNO raw PII]
I -- no --> K[Return filtered replacements]
J --> K
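The flow in the chart above can be sketched roughly as follows. This is a minimal illustration, not the code from `llm_replace_workflow.py`: the function name `filter_replacement_map`, the logger name, and the dict shapes (`replacements`, `label`, `original`, `synthetic`, `value`) are all assumptions made for the example, and the sample values are invented. It assumes well-formed, hashable field values.

```python
import logging
from collections import Counter

logger = logging.getLogger("replacement_map_sketch")  # hypothetical logger name


def filter_replacement_map(raw_map, parsed_entities):
    """Keep only replacements whose (label, original) pair was requested.

    Assumed shapes (not taken from the PR):
      raw_map:         {"replacements": [{"label", "original", "synthetic"}, ...]}
      parsed_entities: [{"label", "value"}, ...]
    """
    if not isinstance(raw_map, dict):
        # Log the type name only, never the payload itself.
        logger.warning("Replacement map has unexpected type %s", type(raw_map).__name__)
        return []

    allowed_pairs = {(e["label"], e["value"]) for e in parsed_entities}
    raw_items = [i for i in (raw_map.get("replacements") or []) if isinstance(i, dict)]

    filtered, seen = [], set()
    for item in raw_items:
        pair = (item.get("label"), item.get("original"))
        if pair in allowed_pairs and pair not in seen:  # filter and deduplicate
            seen.add(pair)
            filtered.append(item)

    if logger.isEnabledFor(logging.DEBUG):
        # Counts per label only; original and synthetic values are never formatted.
        unrequested = Counter(
            str(item.get("label"))
            for item in raw_items
            if (item.get("label"), item.get("original")) not in allowed_pairs
        )
        unfilled = Counter(label for (label, value) in allowed_pairs if (label, value) not in seen)
        logger.debug(
            "Replacement map: requested=%d raw=%d filtered=%d "
            "unrequested_by_label=%s unfilled_by_label=%s",
            len(allowed_pairs), len(raw_items), len(filtered),
            dict(unrequested), dict(unfilled),
        )

    # A plain `if`, independent of the DEBUG branch, so the warning always fires.
    if not filtered and allowed_pairs:
        labels = sorted({label for (label, _) in allowed_pairs})
        logger.warning(
            "Replacement map empty after filtering: requested=%d labels=%s",
            len(allowed_pairs), labels,
        )

    return filtered


# Made-up sample data for illustration.
entities = [{"label": "NAME", "value": "Alice"}, {"label": "CITY", "value": "Paris"}]
raw = {"replacements": [
    {"label": "NAME", "original": "Alice", "synthetic": "Dana"},
    {"label": "NAME", "original": "Alice", "synthetic": "Erin"},      # duplicate pair
    {"label": "EMAIL", "original": "a@x.io", "synthetic": "d@y.io"},  # never requested
]}
result = filter_replacement_map(raw, entities)  # keeps only the first NAME/Alice entry
```

Keeping the empty-result warning outside the DEBUG branch mirrors the chart, where both the "yes" and "no" arms of the DEBUG check lead into the same emptiness test.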
Reviews (2): Last reviewed commit: "test: add caplog regression tests for PI..."
andreatgretel left a comment
Reviewed the dependency bump, replacement-map logging change, notebook conversion update, and import cleanup. Smoke tests and targeted unit tests pass. Only left one small suggestion to add a committed regression test for the PII-safe debug logging behavior. Otherwise looks good to me!
Summary
Upgrades `data-designer` from `0.5.7` → `0.6.0` to pick up the native async engine and bundles a few cleanups that surfaced while verifying the upgrade by re-running the tutorial notebooks end-to-end.
Changes
- `pyproject.toml` + `uv.lock`: bump `data-designer` to `0.6.0`. No code changes were required: the async engine is enabled by default in 0.6, so the existing `NddAdapter` paths get the benefit transparently.
- `models.yaml`: raise `gpt-oss-120b` `max_parallel_requests` from `2` → `16` to actually exercise the new async concurrency. The old limit gated the rest of the pipeline at 2 in-flight LLM calls; raising it cut tutorial notebook runtime roughly in half on `gpt-oss-120b`-heavy stages.
- `Makefile`: `make convert-notebooks` now passes `--set-kernel anonymizer-venv` to `jupytext`. Previously `jupytext` picked an arbitrary registered kernel (often a stale one from another project), causing notebooks to silently run against the wrong Python environment.
- `llm_replace_workflow.py`: replacement-map debug logs no longer emit raw PII. The previous `[DEBUG] Replacement map record …` line dumped the full `original → synthetic` mapping for every record, leaking the source values into log streams whenever DEBUG was enabled. Logs now summarize counts per label plus any anomalies (entities unrequested by the prompt, requested entities the LLM didn't fill). The warning path for empty/missing maps was also tightened to always fire (it previously sat under an `elif` that swallowed it).
- `docs/notebook_source/*.py`: consolidate `from anonymizer.config.rewrite import PrivacyGoal` into the top-level `from anonymizer import …` line. `PrivacyGoal` is already re-exported from the package root, so the deeper import was unnecessary and inconsistent with the bundled skill template.
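To picture why raising `max_parallel_requests` matters, here is a generic sketch of a cap on in-flight requests using `asyncio.Semaphore`. It is not how `data-designer` implements its async engine (the PR does not show that); `run_with_limit` and `fake_llm_call` are hypothetical names, and the sleep stands in for a real LLM call.

```python
import asyncio


async def run_with_limit(num_calls, max_parallel_requests, call_seconds=0.01):
    """Run num_calls fake LLM calls with at most max_parallel_requests in flight.

    Returns the highest number of calls observed in flight at the same time.
    """
    semaphore = asyncio.Semaphore(max_parallel_requests)
    in_flight = 0
    peak = 0

    async def fake_llm_call():
        nonlocal in_flight, peak
        async with semaphore:  # waits here once the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(call_seconds)  # stand-in for network latency
            in_flight -= 1

    await asyncio.gather(*(fake_llm_call() for _ in range(num_calls)))
    return peak


# Same workload under the old and new caps from models.yaml.
peak_at_2 = asyncio.run(run_with_limit(32, max_parallel_requests=2))
peak_at_16 = asyncio.run(run_with_limit(32, max_parallel_requests=16))
```

With a cap of 2, the 32 calls drain two at a time; with a cap of 16, the same work completes in far fewer waves. How much wall-clock time that saves in practice depends on the model endpoint and the rest of the pipeline.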
What's NOT in this PR
- `docs/notebooks/*.ipynb`): kept as a separate step so the dependency change is reviewable on its own.
- fixes) that surfaced during this verification. Tracking that as a follow-up branch since it's a larger change with its own design discussion.
Test plan
- `make test`: unit tests pass on `data-designer 0.6.0`
- `make convert-notebooks`: all five tutorial notebooks execute end-to-end against `gpt-oss-120b` and `nemotron-30b-thinking`
- `[DEBUG] Replacement map record …` log lines