fix(classifiers): PII NER allowlist for false positives and metal feature documentation#2541

Merged
bug-ops merged 1 commit into main from 2538-pii-ner-metal-full on Mar 31, 2026

bug-ops (Owner) commented Mar 31, 2026

Summary

  • Add configurable pii_ner_allowlist to ClassifiersConfig to suppress piiranha NER false positives (e.g. "Zeph" → [PII:CITY])
  • Document that macOS Apple Silicon requires --features full,metal for piiranha GPU acceleration

Changes

  • crates/zeph-config/src/classifiers.rs: adds a pii_ner_allowlist: Vec<String> field with default entries ["Zeph", "Rust", "OpenAI", "Ollama", "Claude"]; configurable and may be set to an empty list; 6 new tests
  • crates/zeph-sanitizer/src/lib.rs: allowlist field on ContentSanitizer, with_pii_ner_allowlist() builder, case-insensitive exact-match filtering in detect_pii(); 6 new unit tests covering the edge cases
  • crates/zeph-core/src/agent/builder.rs: with_pii_ner_allowlist() on AgentBuilder
  • src/agent_setup.rs: wires the allowlist from config into the agent setup pipeline
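
For readers who want a feel for the filtering described in the list above, here is a minimal sketch of case-insensitive exact-match allowlisting behind a builder-style setter. It is not the code from this PR: `Sanitizer`, `PiiSpan`, and `filter_spans` are hypothetical names invented for illustration, and only the `with_pii_ner_allowlist()` method name is taken from the description.

```rust
use std::collections::HashSet;

/// One span flagged by the NER model (hypothetical shape, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct PiiSpan {
    pub text: String,
    pub label: String,
}

/// Hypothetical stand-in for the sanitizer; only the allowlist logic is modeled.
#[derive(Debug, Default)]
pub struct Sanitizer {
    // Entries are stored lowercased so that lookups ignore case.
    allowlist: HashSet<String>,
}

impl Sanitizer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter; an empty input leaves the allowlist empty (disabled).
    pub fn with_pii_ner_allowlist<I, S>(mut self, entries: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: AsRef<str>,
    {
        self.allowlist = entries
            .into_iter()
            .map(|e| e.as_ref().to_lowercase())
            .collect();
        self
    }

    /// Keeps every span except those whose full text equals an allowlist
    /// entry when compared case-insensitively. Substrings do not match.
    pub fn filter_spans(&self, spans: Vec<PiiSpan>) -> Vec<PiiSpan> {
        spans
            .into_iter()
            .filter(|s| !self.allowlist.contains(&s.text.to_lowercase()))
            .collect()
    }
}
```

With an allowlist of "Zeph" and "Rust", a span with text "zeph" would be dropped, while "Zephyr" would still be reported because the match is exact rather than prefix-based.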

Test plan

  • cargo nextest run -p zeph-config --features classifiers --lib — 182 passed
  • cargo nextest run -p zeph-sanitizer --features classifiers --lib — 217 passed (6 new)
  • Full workspace: 7667 passed, 0 failed
  • cargo clippy --workspace --features full -- -D warnings — clean
  • cargo +nightly fmt --check — clean

Closes #2537
Closes #2538

…r macOS

Add configurable pii_ner_allowlist to ClassifiersConfig that prevents
tokens matching an allowlist entry (case-insensitive) from being redacted
by the piiranha NER model. Suppresses common false positives such as
"Zeph" being misclassified as [PII:CITY] by piiranha-v1.

Default allowlist entries: ["Zeph", "Rust", "OpenAI", "Ollama", "Claude"].
Configurable via [classifiers] pii_ner_allowlist in config.toml.
Set to [] to disable the allowlist entirely.
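
As a sketch of what that setting might look like in config.toml, assuming the [classifiers] table needs no other keys for this to take effect (the description does not say either way):

```toml
[classifiers]
# Tokens matching one of these entries (case-insensitive) are not redacted by the NER model.
pii_ner_allowlist = ["Zeph", "Rust", "OpenAI", "Ollama", "Claude"]

# To disable the allowlist entirely, use an empty list instead:
# pii_ner_allowlist = []
```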

Also document that on macOS Apple Silicon, --features full,metal is
required for piiranha NER GPU acceleration. Without metal, the 1.1 GB
model times out after 30s on CPU and falls back to regex-only detection.
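
The timeout-then-fallback behavior mentioned above can be pictured with a generic sketch like the following. This is not the project's implementation: `Detection`, `detect_with_timeout`, and the closure-based interface are assumptions made for illustration, using only the standard library.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Which detector produced the result (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Detection {
    Ner(Vec<String>),
    RegexOnly(Vec<String>),
}

/// Runs `ner` on a worker thread and waits up to `timeout` for its result.
/// If the deadline passes (or the worker dies), the `regex` closure is used instead.
pub fn detect_with_timeout<N, R>(ner: N, regex: R, timeout: Duration) -> Detection
where
    N: FnOnce() -> Vec<String> + Send + 'static,
    R: FnOnce() -> Vec<String>,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails harmlessly if the receiver already gave up waiting.
        let _ = tx.send(ner());
    });
    match rx.recv_timeout(timeout) {
        Ok(found) => Detection::Ner(found),
        // Covers both a timeout and a disconnected (panicked) worker.
        Err(_) => Detection::RegexOnly(regex()),
    }
}
```

Note that in this sketch the slow worker thread keeps running after the timeout; a real pipeline would likely want a way to cancel or reuse it.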

Closes #2537, closes #2538
github-actions bot added labels on Mar 31, 2026: documentation (Improvements or additions to documentation), rust (Rust code changes), core (zeph-core crate), bug (Something isn't working), size/L (Large PR, 201-500 lines)
bug-ops enabled auto-merge (squash) March 31, 2026 18:15
bug-ops merged commit 55b4265 into main Mar 31, 2026
27 checks passed
bug-ops deleted the 2538-pii-ner-metal-full branch March 31, 2026 18:22