
docs(devnotes): add Introducing NeMo Anonymizer post #157

Open
lipikaramaswamy wants to merge 6 commits into main from lramaswamy/docs/anonymizer-intro-devnote


@lipikaramaswamy
Collaborator

Summary

First entry in the NeMo Anonymizer devnotes blog: Introducing NeMo Anonymizer: Text Anonymization for the Reasoning Era.

The post:

  • Frames the privacy problem text anonymization actually has to solve in the LLM/agent era — quasi-identifiers + cheap inference + linkage via tool-using agents make traditional entity-masking insufficient.
  • Walks through Anonymizer's architecture: hybrid entity detection (NER + LLM contextual) plus two transformation flows — Replace (span-level, four strategies) and Rewrite (full contextual, with attack/repair/judge loop).
  • Lays out the three innovations: latent entities are the real adversary, privacy is contextual not categorical, and privacy without utility is privacy theater.
  • Closes with a Try-It CTA (preview() quick-start, Anonymizer skill via npx skills add NVIDIA-NeMo/Anonymizer) and a Resources block linking docs, concepts, and tutorials.
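
As a rough illustration of the span-level Replace flow described above, the sketch below applies one strategy to already-detected spans and reuses the same replacement whenever a value repeats. Everything here is hypothetical: `Span`, `replace_spans`, and the three stand-in strategy behaviors are assumptions made for illustration, not Anonymizer's API, and the sketch assumes non-overlapping spans.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected entity span (hypothetical shape, not Anonymizer's own type)."""
    start: int
    end: int
    label: str


def replace_spans(text, spans, strategy="redact"):
    """Apply one span-level strategy, reusing replacements for repeated values."""
    replacement_map = {}  # (label, original value) -> replacement
    counters = {}         # per-label counter for the stand-in "substitute" strategy
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        value = text[span.start:span.end]
        key = (span.label, value)
        if key not in replacement_map:
            if strategy == "redact":
                replacement_map[key] = f"[{span.label}]"
            elif strategy == "hash":
                digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
                replacement_map[key] = f"<{span.label}:{digest}>"
            elif strategy == "substitute":
                # Assumption: a numbered placeholder stands in for a realistic surrogate.
                counters[span.label] = counters.get(span.label, 0) + 1
                replacement_map[key] = f"{span.label}_{counters[span.label]}"
            else:
                raise ValueError(f"unknown strategy: {strategy}")
        pieces.append(text[cursor:span.start])  # untouched text before the span
        pieces.append(replacement_map[key])     # consistent replacement
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Alice met Bob. Alice left."
spans = [Span(0, 5, "PERSON"), Span(10, 13, "PERSON"), Span(15, 20, "PERSON")]
print(replace_spans(text, spans, strategy="substitute"))
# prints "PERSON_1 met PERSON_2. PERSON_1 left."
```

Keying the map on the label and the original value is what keeps both mentions of the same person aligned after replacement.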

Changes

  • docs/devnotes/posts/anonymizer-intro.md — the post
  • docs/devnotes/posts/assets/anonymizer-intro-hero.png — hero image
  • docs/devnotes/.authors.yml — adds lipikaramaswamy and asteier2026 (both Researcher at NVIDIA), with avatar + GitHub profile URL
  • docs/css/style.css — Mermaid horizontal-scroll fix so the Rewrite flowchart renders cleanly on smaller screens

Review notes

Drafted as a follow-up to a formal PDF intro; restyled in the punchier DataDesigner devnote voice. Has gone through one round of team feedback already (incorporated). Looking for:

  • Voice/clarity nits
  • Accuracy of pipeline description (especially the Rewrite flowchart steps and the LLM-as-Judge framing)
  • Citation/link correctness

Test plan

  • make docs-serve — render locally; verify Mermaid diagram, author cards, hero image, and footnotes
  • Light pass on dark/light mode rendering
  • Confirm internal links (Replace/Rewrite concepts, Tutorials index) resolve

First entry in the devnotes blog. Covers why traditional entity-masking
falls short in the LLM/agent era, walks through Anonymizer's hybrid
detection plus Replace and Rewrite flows, and lays out the three
innovations (latent entities, contextual privacy, utility-aware QA).

Also wires up the blog plugin authors (Lipika Ramaswamy, Amy Steier)
and adds CSS to allow horizontal scrolling on Mermaid diagrams so the
Rewrite flowchart renders cleanly on smaller screens.
@lipikaramaswamy marked this pull request as ready for review on May 13, 2026 22:40
@lipikaramaswamy requested a review from a team as a code owner on May 13, 2026 22:40
Comment thread docs/devnotes/posts/anonymizer-intro.md Outdated
@@ -0,0 +1,159 @@
---
date:
  created: 2026-05-12
Collaborator Author

will have to update this to release day

@greptile-apps
Contributor

greptile-apps Bot commented May 13, 2026

Greptile Summary

This PR introduces the first NeMo Anonymizer devnotes post, covering the library's hybrid entity detection and Replace/Rewrite transformation flows, along with accompanying author entries, a hero image, and a CSS fix for Mermaid horizontal scrolling.

  • docs/devnotes/posts/anonymizer-intro.md — New 161-line blog post framing the privacy reasoning problem, walking through both pipelines, and closing with a Try-It CTA and footnoted references.
  • docs/css/style.css — Global Mermaid scrolling fix (overflow-x: auto + max-width: none !important on the SVG) to prevent the Rewrite flowchart from overflowing on narrow viewports; since this is currently the only Mermaid diagram in the docs, the global scope has no unintended side effects.
  • docs/devnotes/.authors.yml — Adds authors: top-level key (absent before this PR) with entries for both post authors in the format expected by the Material blog plugin.

Confidence Score: 4/5

Safe to merge once the open nav-entry issue from the prior review thread is resolved; all code changes are correct.

The code changes are clean and well-reasoned. The nav entry in mkdocs.yml flagged in a prior review thread has not been addressed — it adds the blog post as an explicit nav page, producing a second URL that differs from the canonical slug the blog plugin generates.

mkdocs.yml — the nav entry for the blog post has not been updated following prior review feedback.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/devnotes/posts/anonymizer-intro.md | New blog post; citation year/venue mismatches flagged in prior review threads remain open. Content and pipeline description are accurate. |
| docs/css/style.css | Adds Mermaid horizontal-scroll fix; !important is intentional and necessary to override Mermaid inline width styles; safe since this is the only Mermaid diagram in the docs. |
| docs/devnotes/.authors.yml | Correctly adds the top-level authors: key and two author entries in the format expected by the Material blog plugin. |
| mkdocs.yml | Nav entry for the blog post creates a second page at a raw path differing from the canonical slug-based URL generated by the blog plugin (flagged in prior review thread). |
| docs/devnotes/posts/assets/anonymizer-intro-hero.png | Binary hero image added; referenced correctly via relative path in the post. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Source Text] --> B[Hybrid Entity Detection\nNER + LLM Contextual]
    B --> C{Flow Choice}
    C -->|Replace| D[Select Strategy\nSubstitute / Redact / Annotate / Hash]
    D --> E[Apply to Detected Spans\nwith Consistent Replacement Map]
    E --> F[Anonymized Text]
    C -->|Rewrite| G[Plan Replacements +\nFind Latent Clues]
    G --> H[Build Privacy QA +\nQuality QA Harnesses]
    H --> I[Generate Rewrite]
    I --> J{Attack Pass?}
    J -->|fail| K[Repair Using\nAdversarial Explanation]
    K --> J
    J -->|pass| L[LLM-as-Judge\nFluency + Coherence + Leak Check]
    L --> F

Reviews (4): Last reviewed commit: "update header, spacing"
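
The Rewrite branch of the flowchart above (generate, attack, repair, judge) can be read as a bounded loop. The sketch below is one hedged interpretation: the callables are toy stand-ins for LLM-backed steps, and `rewrite_with_repair` and `max_rounds` are assumptions introduced here, since the flowchart itself shows no iteration cap.

```python
from typing import Callable, Optional


def rewrite_with_repair(
    source: str,
    generate: Callable[[str], str],
    attack: Callable[[str], Optional[str]],
    repair: Callable[[str, str], str],
    judge: Callable[[str], bool],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return an accepted rewrite, or None if it still leaks or the judge rejects it."""
    candidate = generate(source)
    for round_index in range(max_rounds + 1):
        explanation = attack(candidate)  # None means the attack found no leak
        if explanation is None:
            return candidate if judge(candidate) else None
        if round_index < max_rounds:
            # Repair is guided by the attacker's explanation of what leaked.
            candidate = repair(candidate, explanation)
    return None


# Toy stand-ins so the loop can be exercised without any model calls.
def toy_generate(text: str) -> str:
    return text.replace("Dr. Rao", "A physician")


def toy_attack(text: str) -> Optional[str]:
    return "Springfield" if "Springfield" in text else None


def toy_repair(text: str, explanation: str) -> str:
    return text.replace(explanation, "a mid-sized city")


def toy_judge(text: str) -> bool:
    return bool(text.strip())


result = rewrite_with_repair(
    "Dr. Rao practices in Springfield.",
    toy_generate, toy_attack, toy_repair, toy_judge,
)
print(result)  # prints "A physician practices in a mid-sized city."
```

Capping the rounds is a design choice of this sketch: it guarantees termination when the attacker keeps finding leaks, at the cost of sometimes returning no rewrite at all.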

Comment on lines +149 to +151
[^staab2023]: Staab et al., 2023. [*Beyond Memorization: Violating Privacy via Inference with Large Language Models*](https://arxiv.org/abs/2310.07298). ICLR 2024.
[^ma2025]: Ma et al., 2025. [*SoK: Semantic Privacy in Large Language Models*](https://arxiv.org/abs/2506.23603).
[^golle2006]: Golle, 2006. [*Revisiting the Uniqueness of Simple Demographics in the US Population*](https://crypto.stanford.edu/~pgolle/papers/census.html). WPES 2006.
Contributor

P2 Three footnotes mix the arxiv preprint year with a later publication venue, making the citation year inconsistent with the listed conference/journal. The most visible case is nap2-2024, where the key, the author-date, and the arxiv ID all say 2024, but the venue is "EMNLP 2025 Findings". The same pattern occurs in staab2023 (paper dated 2023, venue ICLR 2024) and pilan2024 (paper dated 2024, venue Applied Soft Computing 2025). When a venue is listed, the citation year should match the publication year.

Suggested change
- [^staab2023]: Staab et al., 2023. [*Beyond Memorization: Violating Privacy via Inference with Large Language Models*](https://arxiv.org/abs/2310.07298). ICLR 2024.
- [^ma2025]: Ma et al., 2025. [*SoK: Semantic Privacy in Large Language Models*](https://arxiv.org/abs/2506.23603).
- [^golle2006]: Golle, 2006. [*Revisiting the Uniqueness of Simple Demographics in the US Population*](https://crypto.stanford.edu/~pgolle/papers/census.html). WPES 2006.
+ [^staab2023]: Staab et al., 2024. [*Beyond Memorization: Violating Privacy via Inference with Large Language Models*](https://arxiv.org/abs/2310.07298). ICLR 2024.
+ [^ma2025]: Ma et al., 2025. [*SoK: Semantic Privacy in Large Language Models*](https://arxiv.org/abs/2506.23603).
+ [^golle2006]: Golle, 2006. [*Revisiting the Uniqueness of Simple Demographics in the US Population*](https://crypto.stanford.edu/~pgolle/papers/census.html). WPES 2006.
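
The year/venue check described in this comment could be automated with a small script. The sketch below assumes footnotes follow the exact `[^key]: Authors, YEAR. [*Title*](url). Venue.` shape used in this post; `FOOTNOTE` and `year_mismatches` are hypothetical names, and the regex is not a general citation parser.

```python
import re

# Matches: [^key]: Authors, YEAR. [*Title*](url). Optional venue text
FOOTNOTE = re.compile(
    r"^\[\^(?P<key>[^\]]+)\]:\s+(?P<authors>.+?),\s+(?P<year>\d{4})\.\s+"
    r"\[.*?\]\(.*?\)\.(?P<venue>.*)$"
)
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def year_mismatches(lines):
    """Return (key, cited_year, venue_year) for footnotes whose years disagree."""
    issues = []
    for line in lines:
        match = FOOTNOTE.match(line.strip())
        if match is None:
            continue  # not a footnote in the expected shape
        venue_years = YEAR.findall(match["venue"])
        if venue_years and match["year"] not in venue_years:
            issues.append((match["key"], match["year"], venue_years[-1]))
    return issues


footnotes = [
    "[^staab2023]: Staab et al., 2023. [*Beyond Memorization: Violating Privacy via Inference with Large Language Models*](https://arxiv.org/abs/2310.07298). ICLR 2024.",
    "[^golle2006]: Golle, 2006. [*Revisiting the Uniqueness of Simple Demographics in the US Population*](https://crypto.stanford.edu/~pgolle/papers/census.html). WPES 2006.",
    "[^nap2-2024]: Huang et al., 2024. [*NAP²: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human*](https://arxiv.org/abs/2406.03749). EMNLP 2025 Findings.",
]
print(year_mismatches(footnotes))
# prints [('staab2023', '2023', '2024'), ('nap2-2024', '2024', '2025')]
```

Footnotes with no venue, or a venue without a four-digit year, are skipped rather than flagged, so preprint-only citations pass through untouched.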

[^ma2025]: Ma et al., 2025. [*SoK: Semantic Privacy in Large Language Models*](https://arxiv.org/abs/2506.23603).
[^golle2006]: Golle, 2006. [*Revisiting the Uniqueness of Simple Demographics in the US Population*](https://crypto.stanford.edu/~pgolle/papers/census.html). WPES 2006.
[^tab2022]: Pilán et al., 2022. [*The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization*](https://aclanthology.org/2022.cl-4.19/). Computational Linguistics 48(4).
[^nap2-2024]: Huang et al., 2024. [*NAP²: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human*](https://arxiv.org/abs/2406.03749). EMNLP 2025 Findings.
Contributor

P2 The footnote key nap2-2024 and the author-date both say 2024, but the venue is "EMNLP 2025 Findings". If citing the conference paper, the year should be 2025 to match the venue. (The footnote key can stay as-is to avoid breaking any in-text references.)

Suggested change
- [^nap2-2024]: Huang et al., 2024. [*NAP²: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human*](https://arxiv.org/abs/2406.03749). EMNLP 2025 Findings.
+ [^nap2-2024]: Huang et al., 2025. [*NAP²: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human*](https://arxiv.org/abs/2406.03749). EMNLP 2025 Findings.

[^nap2-2024]: Huang et al., 2024. [*NAP²: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human*](https://arxiv.org/abs/2406.03749). EMNLP 2025 Findings.
[^synthpai2024]: Yukhymenko et al., 2024. [*A Synthetic Dataset for Personal Attribute Inference*](https://arxiv.org/abs/2406.07217). NeurIPS 2024.
[^ratbench2026]: Krco et al., 2026. [*RAT-Bench: A Comprehensive Benchmark for Text Anonymization*](https://arxiv.org/abs/2602.12806).
[^pilan2024]: Pilán et al., 2024. [*Truthful Text Sanitization Guided by Inference Attacks*](https://arxiv.org/abs/2412.12928). Applied Soft Computing 2025.
Contributor

P2 Same citation-year/venue mismatch as nap2-2024: pilan2024 is dated 2024 in both the key and the author-date, but the venue is "Applied Soft Computing 2025". If citing the journal paper, the year should be 2025.

Suggested change
- [^pilan2024]: Pilán et al., 2024. [*Truthful Text Sanitization Guided by Inference Attacks*](https://arxiv.org/abs/2412.12928). Applied Soft Computing 2025.
+ [^pilan2024]: Pilán et al., 2025. [*Truthful Text Sanitization Guided by Inference Attacks*](https://arxiv.org/abs/2412.12928). Applied Soft Computing 2025.

@lipikaramaswamy requested a review from a team as a code owner on May 13, 2026 22:53
Comment thread mkdocs.yml
  - API Reference: reference/
  - Developer Notes:
    - devnotes/index.md
    - "Introducing NeMo Anonymizer": devnotes/posts/anonymizer-intro.md
Contributor

P1 Blog post added directly to nav — URL mismatch with blog plugin

The Material for MkDocs blog plugin (configured with blog_dir: devnotes and post_url_format: "{slug}") processes every file under devnotes/posts/ and publishes each post at a slug-based URL — in this case /devnotes/anonymizer-intro/. Adding the same source file to the nav explicitly creates a second page at /devnotes/posts/anonymizer-intro/. Nav visitors clicking "Introducing NeMo Anonymizer" will land on a raw path that differs from the canonical blog post URL generated by the plugin. The Material docs recommend only listing the blog index (devnotes/index.md) in the nav and letting the plugin attach post navigation automatically underneath it.
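
To make the two URLs concrete, the sketch below computes both paths for this post. It is a simplified model, not the plugin's code: `blog_plugin_url` and `nav_page_url` are hypothetical helpers, the slug is passed in rather than derived, and the nav path assumes MkDocs' directory-style URLs.

```python
from pathlib import PurePosixPath


def blog_plugin_url(blog_dir: str, slug: str, post_url_format: str = "{slug}") -> str:
    """URL the blog plugin would publish for a post, given its slug (simplified)."""
    return f"/{blog_dir}/{post_url_format.format(slug=slug)}/"


def nav_page_url(source_path: str) -> str:
    """URL of a Markdown file listed directly in nav, assuming directory-style URLs."""
    return "/" + str(PurePosixPath(source_path).with_suffix("")) + "/"


canonical = blog_plugin_url("devnotes", "anonymizer-intro")
raw = nav_page_url("devnotes/posts/anonymizer-intro.md")
print(canonical)  # prints "/devnotes/anonymizer-intro/"
print(raw)        # prints "/devnotes/posts/anonymizer-intro/"
```

The extra `posts/` segment in the second path is the mismatch: the same source file ends up reachable at two different URLs.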
