docs(devnotes): add Introducing NeMo Anonymizer post #157
lipikaramaswamy wants to merge 6 commits into
First entry in the devnotes blog. Covers why traditional entity-masking falls short in the LLM/agent era, walks through Anonymizer's hybrid detection plus Replace and Rewrite flows, and lays out the three innovations (latent entities, contextual privacy, utility-aware QA). Also wires up the blog plugin authors (Lipika Ramaswamy, Amy Steier) and adds CSS to allow horizontal scrolling on Mermaid diagrams so the Rewrite flowchart renders cleanly on smaller screens.
@@ -0,0 +1,159 @@
---
date:
  created: 2026-05-12
will have to update this to release day
Greptile Summary
This PR introduces the first NeMo Anonymizer devnotes post, covering the library's hybrid entity detection and Replace/Rewrite transformation flows, along with accompanying author entries, a hero image, and a CSS fix for Mermaid horizontal scrolling.
Confidence Score: 4/5
Safe to merge once the open nav-entry issue from the prior review thread is resolved; all code changes are correct.
The code changes are clean and well-reasoned. The nav entry in mkdocs.yml flagged in a prior review thread has not been addressed: it adds the blog post as an explicit nav page, producing a second URL that differs from the canonical slug the blog plugin generates.
mkdocs.yml: the nav entry for the blog post has not been updated following prior review feedback.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Source Text] --> B[Hybrid Entity Detection\nNER + LLM Contextual]
B --> C{Flow Choice}
C -->|Replace| D[Select Strategy\nSubstitute / Redact / Annotate / Hash]
D --> E[Apply to Detected Spans\nwith Consistent Replacement Map]
E --> F[Anonymized Text]
C -->|Rewrite| G[Plan Replacements +\nFind Latent Clues]
G --> H[Build Privacy QA +\nQuality QA Harnesses]
H --> I[Generate Rewrite]
I --> J{Attack Pass?}
J -->|fail| K[Repair Using\nAdversarial Explanation]
K --> J
J -->|pass| L[LLM-as-Judge\nFluency + Coherence + Leak Check]
L --> F
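The Rewrite branch of the flowchart can be read as a bounded generate, attack, repair loop followed by a judge step. The Python sketch below only illustrates that control flow. `generate`, `attack`, `repair`, and `judge` are hypothetical callables supplied by the caller, `max_rounds` is an added safeguard that the flowchart does not show, and none of these names are taken from the Anonymizer API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttackResult:
    passed: bool           # True when the adversarial pass finds no leak
    explanation: str = ""  # why the attack succeeded; fed to the repair step


def rewrite_flow(
    source: str,
    generate: Callable[[str], str],
    attack: Callable[[str], AttackResult],
    repair: Callable[[str, str], str],
    judge: Callable[[str], bool],
    max_rounds: int = 3,  # assumption: cap on repair attempts
) -> Optional[str]:
    """Hypothetical sketch of the Rewrite branch, not the library's implementation."""
    candidate = generate(source)
    result = attack(candidate)
    rounds = 0
    # Repair with the adversarial explanation until the attack pass succeeds
    # or the repair budget is exhausted.
    while not result.passed and rounds < max_rounds:
        candidate = repair(candidate, result.explanation)
        result = attack(candidate)
        rounds += 1
    if not result.passed:
        return None
    # Only candidates that survive the attack reach the judge.
    return candidate if judge(candidate) else None
```

Passing the steps in as callables keeps the sketch independent of any particular model or prompt, so each one can be stubbed out when checking the loop logic.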
Reviews (4): Last reviewed commit: "update header, spacing"
[^staab2023]: Staab et al., 2023. [*Beyond Memorization: Violating Privacy via Inference with Large Language Models*](https://arxiv.org/abs/2310.07298). ICLR 2024.
[^ma2025]: Ma et al., 2025. [*SoK: Semantic Privacy in Large Language Models*](https://arxiv.org/abs/2506.23603).
[^golle2006]: Golle, 2006. [*Revisiting the Uniqueness of Simple Demographics in the US Population*](https://crypto.stanford.edu/~pgolle/papers/census.html). WPES 2006.
Three footnotes mix the arXiv preprint year with a later publication venue, making the citation year inconsistent with the listed conference/journal. The most visible case is `nap2-2024`, where the key, the author-date, and the arXiv ID all say 2024, but the venue is "EMNLP 2025 Findings". The same pattern occurs in `staab2023` (paper dated 2023, venue ICLR 2024) and `pilan2024` (paper dated 2024, venue Applied Soft Computing 2025). When a venue is listed, the citation year should match the publication year.
Suggested change:
- [^staab2023]: Staab et al., 2023. [*Beyond Memorization: Violating Privacy via Inference with Large Language Models*](https://arxiv.org/abs/2310.07298). ICLR 2024.
- [^ma2025]: Ma et al., 2025. [*SoK: Semantic Privacy in Large Language Models*](https://arxiv.org/abs/2506.23603).
- [^golle2006]: Golle, 2006. [*Revisiting the Uniqueness of Simple Demographics in the US Population*](https://crypto.stanford.edu/~pgolle/papers/census.html). WPES 2006.
+ [^staab2023]: Staab et al., 2024. [*Beyond Memorization: Violating Privacy via Inference with Large Language Models*](https://arxiv.org/abs/2310.07298). ICLR 2024.
+ [^ma2025]: Ma et al., 2025. [*SoK: Semantic Privacy in Large Language Models*](https://arxiv.org/abs/2506.23603).
+ [^golle2006]: Golle, 2006. [*Revisiting the Uniqueness of Simple Demographics in the US Population*](https://crypto.stanford.edu/~pgolle/papers/census.html). WPES 2006.
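A year/venue mismatch like the one flagged above could be caught mechanically. The following is a minimal Python sketch, assuming every footnote follows the shape used in this post (`[^key]: Authors, YEAR. [*Title*](url). Venue YEAR.`). The function name `find_year_mismatches` and both regular expressions are illustrative and not part of the repository.

```python
import re

# Assumed footnote shape: "[^key]: Authors, YEAR. [*Title*](url). Venue YEAR."
FOOTNOTE = re.compile(
    r"^\[\^(?P<key>[^\]]+)\]:\s*[^\[]*?,\s*(?P<cited>\d{4})\.\s*"
    r"\[.*?\]\(.*?\)\.\s*(?P<venue>.*)$"
)
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def find_year_mismatches(markdown: str) -> list[tuple[str, int, int]]:
    """Return (key, cited_year, venue_year) for footnotes whose years disagree."""
    mismatches = []
    for line in markdown.splitlines():
        match = FOOTNOTE.match(line.strip())
        if not match:
            continue  # not a footnote in the assumed shape
        venue_years = YEAR.findall(match["venue"])
        if not venue_years:
            continue  # no venue year listed, so there is nothing to compare
        cited, venue = int(match["cited"]), int(venue_years[-1])
        if cited != venue:
            mismatches.append((match["key"], cited, venue))
    return mismatches


sample = """\
[^staab2023]: Staab et al., 2023. [*Beyond Memorization: Violating Privacy via Inference with Large Language Models*](https://arxiv.org/abs/2310.07298). ICLR 2024.
[^golle2006]: Golle, 2006. [*Revisiting the Uniqueness of Simple Demographics in the US Population*](https://crypto.stanford.edu/~pgolle/papers/census.html). WPES 2006.
[^nap2-2024]: Huang et al., 2024. [*NAP²: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human*](https://arxiv.org/abs/2406.03749). EMNLP 2025 Findings.
"""
print(find_year_mismatches(sample))
```

Footnotes without a venue year, such as preprint-only entries or a volume/issue reference, are skipped rather than reported, since there is no second year to compare against.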
[^ma2025]: Ma et al., 2025. [*SoK: Semantic Privacy in Large Language Models*](https://arxiv.org/abs/2506.23603).
[^golle2006]: Golle, 2006. [*Revisiting the Uniqueness of Simple Demographics in the US Population*](https://crypto.stanford.edu/~pgolle/papers/census.html). WPES 2006.
[^tab2022]: Pilán et al., 2022. [*The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization*](https://aclanthology.org/2022.cl-4.19/). Computational Linguistics 48(4).
[^nap2-2024]: Huang et al., 2024. [*NAP²: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human*](https://arxiv.org/abs/2406.03749). EMNLP 2025 Findings.
The footnote key `nap2-2024` and the author-date both say 2024, but the venue is "EMNLP 2025 Findings". If citing the conference paper, the year should be 2025 to match the venue. (The footnote key can stay as-is to avoid breaking any in-text references.)
Suggested change:
- [^nap2-2024]: Huang et al., 2024. [*NAP²: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human*](https://arxiv.org/abs/2406.03749). EMNLP 2025 Findings.
+ [^nap2-2024]: Huang et al., 2025. [*NAP²: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human*](https://arxiv.org/abs/2406.03749). EMNLP 2025 Findings.
[^nap2-2024]: Huang et al., 2024. [*NAP²: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human*](https://arxiv.org/abs/2406.03749). EMNLP 2025 Findings.
[^synthpai2024]: Yukhymenko et al., 2024. [*A Synthetic Dataset for Personal Attribute Inference*](https://arxiv.org/abs/2406.07217). NeurIPS 2024.
[^ratbench2026]: Krco et al., 2026. [*RAT-Bench: A Comprehensive Benchmark for Text Anonymization*](https://arxiv.org/abs/2602.12806).
[^pilan2024]: Pilán et al., 2024. [*Truthful Text Sanitization Guided by Inference Attacks*](https://arxiv.org/abs/2412.12928). Applied Soft Computing 2025.
Same citation-year/venue mismatch as `nap2-2024`: `pilan2024` is dated 2024 in both the key and the author-date, but the venue is "Applied Soft Computing 2025". If citing the journal paper, the year should be 2025.
Suggested change:
- [^pilan2024]: Pilán et al., 2024. [*Truthful Text Sanitization Guided by Inference Attacks*](https://arxiv.org/abs/2412.12928). Applied Soft Computing 2025.
+ [^pilan2024]: Pilán et al., 2025. [*Truthful Text Sanitization Guided by Inference Attacks*](https://arxiv.org/abs/2412.12928). Applied Soft Computing 2025.
- API Reference: reference/
- Developer Notes:
  - devnotes/index.md
  - "Introducing NeMo Anonymizer": devnotes/posts/anonymizer-intro.md
Blog post added directly to nav: URL mismatch with blog plugin
The Material for MkDocs blog plugin (configured with `blog_dir: devnotes` and `post_url_format: "{slug}"`) processes every file under `devnotes/posts/` and publishes each post at a slug-based URL, in this case `/devnotes/anonymizer-intro/`. Adding the same source file to the nav explicitly creates a second page at `/devnotes/posts/anonymizer-intro/`. Nav visitors clicking "Introducing NeMo Anonymizer" will land on a raw path that differs from the canonical blog post URL generated by the plugin. The Material docs recommend only listing the blog index (`devnotes/index.md`) in the nav and letting the plugin attach post navigation automatically underneath it.
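The two URLs in this comment can be reproduced with a small Python sketch. It assumes MkDocs' default `use_directory_urls: true` for regular pages and takes the post slug as an explicit argument, since how the slug is derived is not shown here. `nav_url` and `blog_post_url` are hypothetical helpers for illustration, not MkDocs or plugin functions.

```python
from pathlib import PurePosixPath


def nav_url(src_path: str) -> str:
    """URL of a regular page, assuming the default use_directory_urls: true."""
    path = PurePosixPath(src_path)
    if path.name == "index.md":
        parent = path.parent.as_posix()
        return "/" if parent == "." else f"/{parent}/"
    return f"/{path.with_suffix('').as_posix()}/"


def blog_post_url(blog_dir: str, slug: str, post_url_format: str = "{slug}") -> str:
    """URL of a blog post as described in the review comment (slug is assumed)."""
    return f"/{blog_dir}/{post_url_format.format(slug=slug)}/"


# The explicit nav entry and the plugin-generated post do not share a URL.
print(nav_url("devnotes/posts/anonymizer-intro.md"))
print(blog_post_url("devnotes", "anonymizer-intro"))
```

Listing only `devnotes/index.md` in the nav, as the comment recommends, leaves the plugin-generated URL as the single address for the post.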
Summary
First entry in the NeMo Anonymizer devnotes blog: Introducing NeMo Anonymizer: Text Anonymization for the Reasoning Era.
The post:
`preview()` quick-start, Anonymizer skill via `npx skills add NVIDIA-NeMo/Anonymizer`) and a Resources block linking docs, concepts, and tutorials.
Changes
- `docs/devnotes/posts/anonymizer-intro.md`: the post
- `docs/devnotes/posts/assets/anonymizer-intro-hero.png`: hero image
- `docs/devnotes/.authors.yml`: adds `lipikaramaswamy` and `asteier2026` (both Researcher at NVIDIA), with avatar + GitHub profile URL
- `docs/css/style.css`: Mermaid horizontal-scroll fix so the Rewrite flowchart renders cleanly on smaller screens
Review notes
Drafted as a follow-up to a formal PDF intro; restyled in the punchier DataDesigner devnote voice. Has gone through one round of team feedback already (incorporated). Looking for:
Test plan
- `make docs-serve`: render locally; verify Mermaid diagram, author cards, hero image, and footnotes
`Replace`/`Rewrite` concepts, Tutorials index) resolve