Releases: Sushegaad/Semantic-Privacy-Guard
v1.4.0
Changelog — v1.4.0
New Features
Custom Pattern Registry
Callers can now register organisation-specific regex patterns directly on SPGConfig via Builder.addPattern(). Patterns are applied by HeuristicDetector after all built-in rules, so built-in detections always win for overlapping spans. Token counters are document-scoped, producing sequentially numbered tokens ([PII_1], [PII_2], …) across the full input — consistent with the existing behaviour for all other PII types.
SPGConfig.builder()
.addPattern(PIIType.GENERIC_PII, "EMP-\\d{6}", 0.99, "Employee ID")
.addPattern(PIIType.GENERIC_PII, "MRN-[A-Z0-9]{8}", 0.98, "Medical Record Number")
.build();
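The ordering rule above can be sketched in plain Java: built-in rules run first, a custom pattern is kept only where no built-in match already covers the span, and tokens are numbered across the whole document. This is a minimal illustration, not the library's HeuristicDetector. The class `CustomPatternSketch`, its `Span` record, the `detect`/`redact`/`demo` helpers, the stand-in e-mail rule and its `[EMAIL_n]` label are all hypothetical, and the sketch assumes one counter per token label.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomPatternSketch {

    // Hypothetical match record: span [start, end) plus the token label.
    record Span(int start, int end, String label) {}

    // Built-in rules are scanned first; custom patterns are scanned afterwards and
    // any custom match overlapping an already accepted span is discarded.
    static List<Span> detect(String text, Map<Pattern, String> builtIns, Map<Pattern, String> custom) {
        List<Span> accepted = new ArrayList<>();
        List<Map<Pattern, String>> layers = List.of(builtIns, custom);
        for (Map<Pattern, String> layer : layers) {
            for (Map.Entry<Pattern, String> e : layer.entrySet()) {
                Matcher m = e.getKey().matcher(text);
                while (m.find()) {
                    Span candidate = new Span(m.start(), m.end(), e.getValue());
                    boolean overlaps = accepted.stream()
                        .anyMatch(s -> candidate.start() < s.end() && s.start() < candidate.end());
                    if (!overlaps) {
                        accepted.add(candidate);
                    }
                }
            }
        }
        accepted.sort((a, b) -> Integer.compare(a.start(), b.start()));
        return accepted;
    }

    // Replaces spans left to right; each label keeps its own document-wide counter.
    static String redact(String text, List<Span> spans) {
        Map<String, Integer> counters = new LinkedHashMap<>();
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : spans) {
            int n = counters.merge(s.label(), 1, Integer::sum);
            out.append(text, cursor, s.start())
               .append('[').append(s.label()).append('_').append(n).append(']');
            cursor = s.end();
        }
        out.append(text.substring(cursor));
        return out.toString();
    }

    static String demo() {
        Map<Pattern, String> builtIns = new LinkedHashMap<>();
        builtIns.put(Pattern.compile("[\\w.]+@[\\w.]+\\.[a-z]{2,}"), "EMAIL"); // stand-in built-in rule
        Map<Pattern, String> custom = new LinkedHashMap<>();
        custom.put(Pattern.compile("EMP-\\d{6}"), "PII");       // Employee ID
        custom.put(Pattern.compile("MRN-[A-Z0-9]{8}"), "PII");  // Medical Record Number
        String input = "Task EMP-042731 and record MRN-AB12CD34 for jo@example.com";
        return redact(input, detect(input, builtIns, custom));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Task [PII_1] and record [PII_2] for [EMAIL_1]
    }
}
```

Because both custom patterns share the `PII` label, they draw from one counter, which is why the second ID becomes `[PII_2]` rather than restarting at 1.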
JSON Redaction (redactJson)
New SemanticPrivacyGuard.redactJson(String) method redacts PII inside JSON documents using Jackson. String values are replaced in-place; keys, numbers, booleans, and arrays are untouched. Returns a StructuredRedactionOutput with the redacted document, a token→original reverse map, and the match count. Jackson (jackson-databind) is declared as an optional dependency; if it is absent from the classpath, redactJson() throws an UnsupportedOperationException with an explanatory message at runtime.
XML Redaction (redactXml)
New SemanticPrivacyGuard.redactXml(String) method redacts PII inside XML documents using the JDK built-in javax.xml — no additional dependency required. Text nodes and attribute values are replaced in-place; element names and structure are preserved. The document builder is hardened against XXE injection (DOCTYPE declarations and external entity loading disabled).
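The approach described above (parse with javax.xml, rewrite text nodes and attribute values, keep element names, refuse DOCTYPE declarations) can be sketched with JDK classes only. This is an illustration under assumptions, not the library's code: `XmlRedactSketch` and its helpers are hypothetical, and a single e-mail regex stands in for the full detection pipeline. The parser features set below are the standard Xerces/SAX feature URIs commonly used for XXE hardening.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlRedactSketch {
    // Stand-in detector: e-mail addresses only (hypothetical; the real pipeline runs all layers).
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    private final Map<String, String> reverseMap = new LinkedHashMap<>();
    private int counter = 0; // document-scoped token counter

    public Map<String, String> reverseMap() { return reverseMap; }

    public String redact(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // XXE hardening: refuse any DOCTYPE and never resolve external entities.
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            f.setFeature("http://xml.org/sax/features/external-general-entities", false);
            f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            f.setXIncludeAware(false);
            f.setExpandEntityReferences(false);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            walk(doc.getDocumentElement());

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("XML redaction failed: " + e.getMessage(), e);
        }
    }

    // Depth-first walk: attribute values and text/CDATA nodes are rewritten; names are untouched.
    private void walk(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            node.setNodeValue(redactText(node.getNodeValue()));
        }
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                a.setValue(redactText(a.getValue()));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i));
        }
    }

    // Replaces each match with [EMAIL_n] and records token -> original.
    private String redactText(String s) {
        Matcher m = EMAIL.matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String token = "[EMAIL_" + (++counter) + "]";
            reverseMap.put(token, m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(token));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        XmlRedactSketch r = new XmlRedactSketch();
        System.out.println(r.redact("<user id=\"jo@example.com\"><email>jo@example.com</email></user>"));
        System.out.println(r.reverseMap());
    }
}
```

Rejecting the DOCTYPE outright is the bluntest of the hardening options; it also blocks harmless internal DTDs, which is usually an acceptable trade-off for untrusted prompt content.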
StructuredRedactionOutput
New value type returned by both structured redaction methods. Exposes getRedactedContent(), getReverseMap(), getMatchCount(), and hasPII().
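A value type exposing the four accessors listed above might look roughly like this. The class name `RedactionOutputSketch`, the `restore` helper, and deriving getMatchCount() from the reverse-map size are assumptions for illustration; the real StructuredRedactionOutput may count matches differently (for example, if the same value occurs twice).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in mirroring the listed accessors; not the library's source.
public final class RedactionOutputSketch {
    private final String redactedContent;
    private final Map<String, String> reverseMap; // token -> original value

    public RedactionOutputSketch(String redactedContent, Map<String, String> reverseMap) {
        this.redactedContent = redactedContent;
        this.reverseMap = Collections.unmodifiableMap(new LinkedHashMap<>(reverseMap));
    }

    public String getRedactedContent() { return redactedContent; }
    public Map<String, String> getReverseMap() { return reverseMap; }
    public int getMatchCount() { return reverseMap.size(); } // assumption: one token per match
    public boolean hasPII() { return getMatchCount() > 0; }

    // Hypothetical helper: puts original values back, e.g. into an LLM response.
    public String restore(String text) {
        String result = text;
        for (Map.Entry<String, String> e : reverseMap.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        RedactionOutputSketch out = new RedactionOutputSketch(
            "{\"name\":\"[PERSON_NAME_1]\"}", Map.of("[PERSON_NAME_1]", "Alice"));
        System.out.println(out.hasPII() + " " + out.restore(out.getRedactedContent()));
    }
}
```

The closing bracket on each token matters for a plain string replace: `[PII_1]` cannot match inside `[PII_10]`, so restore order does not cause prefix collisions.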
Improvements
Three-layer pipeline playground
The live playground (docs/index.html) now reflects the full Regex + Naive Bayes + OpenNLP NER pipeline: updated hero text, new Layer 3 pipeline step, NLP sample button, and updated Names (ML+NLP) / Orgs (ML+NLP) pills.
README expanded
New dedicated sections for Custom Pattern Registry and JSON/XML Redaction with full usage examples. API Reference updated with redactJson(), redactXml(), and addPattern(). GENERIC_PII added to the PII types table. Configuration example updated to show .addPattern().
Fixes
XmlRedactorTest.nestedElements_allTextNodesScanned
Test input phone number (555) 123-4567 failed NANP validation because exchange 123 starts with 1 (NANP requires [2-9]XX for both area code and exchange). Updated to (555) 867-5309.
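The rule quoted in this fix (area code and exchange must both match `[2-9]XX`) is easy to check directly. The sketch below encodes only that rule; the class `NanpCheckSketch`, its regex, and `isPlausibleNanp` are hypothetical, and the library's own phone validator may apply further NANP restrictions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NanpCheckSketch {
    // (AAA) EEE-SSSS with optional punctuation; groups: area code, exchange, subscriber.
    private static final Pattern US_PHONE =
        Pattern.compile("\\(?(\\d{3})\\)?[ .-]?(\\d{3})[ .-]?(\\d{4})");

    // Area code and exchange must both start with a digit from 2 to 9.
    static boolean isPlausibleNanp(String phone) {
        Matcher m = US_PHONE.matcher(phone.trim());
        if (!m.matches()) {
            return false;
        }
        return m.group(1).charAt(0) >= '2' && m.group(2).charAt(0) >= '2';
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleNanp("(555) 123-4567")); // false: exchange starts with 1
        System.out.println(isPlausibleNanp("(555) 867-5309")); // true
    }
}
```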
CustomPatternTest.customPattern_analyse_returnsMatches
Input "Contact EMP-555001" caused the ML layer to fire on "EMP" as PERSON_NAME (preceding word "Contact" is a PII context keyword), creating a higher-severity match that overrode the custom GENERIC_PII pattern. Updated to a neutral context phrase that does not trigger the ML gate.
CustomPatternTest.customPattern_employeeId_detected
Input "Assigned to EMP-042731 for review." — the word "to" is a PII context keyword (PII_PREV_KEYWORDS), causing MLDetector to classify EMP-042731 as PERSON_NAME (severity 6), which won the CompositeDetector merge over GENERIC_PII (severity 5), producing [PERSON_NAME_1] instead of [PII_1]. Updated to "Task EMP-042731 is pending.".
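Both custom-pattern fixes come down to the same merge rule: when detectors report overlapping spans, the higher-severity match survives. A minimal sketch of that rule follows. `SeverityMergeSketch`, its `Match` record, and the earlier-start tie-break are assumptions; only the severities 6 (PERSON_NAME) and 5 (GENERIC_PII) come from the note above, and CompositeDetector's real tie-breaking is not described here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SeverityMergeSketch {
    // Hypothetical match shape: span [start, end), PII type, severity.
    record Match(int start, int end, String type, int severity) {}

    // Highest severity claims its span first; among equal severities the earlier start wins.
    static List<Match> merge(List<Match> candidates) {
        List<Match> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt(Match::severity).reversed()
            .thenComparingInt(Match::start));
        List<Match> kept = new ArrayList<>();
        for (Match c : sorted) {
            boolean overlaps = kept.stream()
                .anyMatch(k -> c.start() < k.end() && k.start() < c.end());
            if (!overlaps) {
                kept.add(c);
            }
        }
        kept.sort(Comparator.comparingInt(Match::start));
        return kept;
    }

    public static void main(String[] args) {
        // "Assigned to EMP-042731 for review.": ML and custom pattern both cover chars 12..22.
        System.out.println(merge(List.of(
            new Match(12, 22, "GENERIC_PII", 5),
            new Match(12, 22, "PERSON_NAME", 6))));
    }
}
```

With the neutral input "Task EMP-042731 is pending." only the GENERIC_PII candidate exists, so it survives the merge and becomes `[PII_1]`.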
Dependencies
| Dependency | Scope | Notes |
|---|---|---|
| jackson-databind:2.17.0 | optional | Required only for redactJson() |
| javax.xml (JDK built-in) | — | Used by redactXml(), no extra dep |
v1.2.0
v1.1.0
v1.0.0 — Initial release
Semantic Privacy Guard: A lightweight, zero-dependency Java middleware that intercepts LLM prompts, identifies PII using a hybrid Regex + Naive Bayes approach, and redacts it before it leaves the corporate network.
v0.1.0-alpha
- new file: .DS_Store
- modified: README.md
- modified: docs/index.html
- new file: docs/playground-screenshot.png
- modified: target/jacoco.exec
- modified: target/site/jacoco/jacoco-sessions.html
- modified: target/site/jacoco/jacoco.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$ApiKeyTests.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$CreditCardTests.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$EmailTests.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$IBANTests.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$IPTests.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$PasswordTests.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$PhoneTests.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$SSNTests.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.MLDetectorTest.xml
- modified: target/surefire-reports/TEST-com.semanticprivacyguard.SemanticPrivacyGuardTest.xml
- modified: target/surefire-reports/com.semanticprivacyguard.HeuristicDetectorTest$ApiKeyTests.txt
- modified: target/surefire-reports/com.semanticprivacyguard.HeuristicDetectorTest$IPTests.txt
- modified: target/surefire-reports/com.semanticprivacyguard.HeuristicDetectorTest.txt
- modified: target/surefire-reports/com.semanticprivacyguard.MLDetectorTest.txt
- modified: target/surefire-reports/com.semanticprivacyguard.SemanticPrivacyGuardTest.txt