Releases: Sushegaad/Semantic-Privacy-Guard

v1.4.0

12 Apr 03:20

Changelog — v1.4.0

New Features

Custom Pattern Registry

Callers can now register organisation-specific regex patterns directly on SPGConfig via Builder.addPattern(). Patterns are applied by HeuristicDetector after all built-in rules, so built-in detections always win for overlapping spans. Token counters are document-scoped, producing sequentially numbered tokens ([PII_1], [PII_2], …) across the full input, consistent with the existing behaviour for all other PII types.

SPGConfig.builder()
    .addPattern(PIIType.GENERIC_PII, "EMP-\\d{6}",      0.99, "Employee ID")
    .addPattern(PIIType.GENERIC_PII, "MRN-[A-Z0-9]{8}", 0.98, "Medical Record Number")
    .build();
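
The ordering and numbering rules described above can be illustrated with a small standalone sketch. It does not use the library: PatternOrderSketch, Span, detect, and redact are hypothetical names, and the overlap check is an assumption about how "built-in detections always win" might be implemented, not the actual HeuristicDetector code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: these names are hypothetical and are not the library's internals.
final class PatternOrderSketch {

    /** A detected span [start, end) and the token label to emit for it. */
    static final class Span {
        final int start;
        final int end;
        final String label;

        Span(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }
    }

    /**
     * Runs the rules in list order. A match from a later rule is dropped when it
     * overlaps a span that an earlier rule already claimed, so rules listed first win.
     */
    static List<Span> detect(String text, List<Map.Entry<String, Pattern>> rulesInPriorityOrder) {
        List<Span> accepted = new ArrayList<>();
        for (Map.Entry<String, Pattern> rule : rulesInPriorityOrder) {
            Matcher m = rule.getValue().matcher(text);
            while (m.find()) {
                Span candidate = new Span(m.start(), m.end(), rule.getKey());
                boolean overlaps = accepted.stream()
                        .anyMatch(s -> candidate.start < s.end && s.start < candidate.end);
                if (!overlaps) {
                    accepted.add(candidate);
                }
            }
        }
        accepted.sort(Comparator.comparingInt((Span s) -> s.start));
        return accepted;
    }

    /** Replaces spans left to right; counters are kept per label for the whole document. */
    static String redact(String text, List<Span> spans) {
        Map<String, Integer> counters = new HashMap<>();
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : spans) {
            int n = counters.merge(s.label, 1, Integer::sum);
            out.append(text, cursor, s.start)
               .append('[').append(s.label).append('_').append(n).append(']');
            cursor = s.end;
        }
        return out.append(text.substring(cursor)).toString();
    }
}
```

With a hypothetical email rule listed before the EMP-\d{6} rule, the input "EMP-123456 and EMP-654321 mail a@b.com" becomes "[PII_1] and [PII_2] mail [EMAIL_1]" in this sketch.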

JSON Redaction (redactJson)

New SemanticPrivacyGuard.redactJson(String) method redacts PII inside JSON documents using Jackson. String values are replaced in place; keys, numbers, booleans, and arrays are untouched. Returns a StructuredRedactionOutput with the redacted document, a token→original reverse map, and match count. Jackson (jackson-databind) is declared as an optional dependency; a helpful UnsupportedOperationException is thrown at runtime if it is absent.
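
One common way to guard an optional dependency is to probe the classpath before touching any of its types. The sketch below shows that general technique only; OptionalDependencySketch and its methods are hypothetical, and the release notes do not say how the library performs its own check.

```java
// Illustrative only: class and method names are hypothetical, not the library's.
final class OptionalDependencySketch {

    /** Returns true if the named class can be located without initialising it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OptionalDependencySketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Fails fast with an actionable message when the optional library is missing. */
    static void requireClass(String className, String feature, String artifact) {
        if (!isPresent(className)) {
            throw new UnsupportedOperationException(
                    feature + " requires " + artifact + " on the classpath; add it as a dependency.");
        }
    }
}
```

A JSON entry point could call requireClass("com.fasterxml.jackson.databind.ObjectMapper", "redactJson()", "jackson-databind") as its first statement, so callers without Jackson see a clear message instead of a NoClassDefFoundError.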

XML Redaction (redactXml)

New SemanticPrivacyGuard.redactXml(String) method redacts PII inside XML documents using the JDK built-in javax.xml; no additional dependency is required. Text nodes and attribute values are replaced in place; element names and structure are preserved. The document builder is hardened against XXE injection (DOCTYPE declarations and external entity loading disabled).
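
For readers unfamiliar with XXE hardening, the following is a minimal sketch of the usual JDK configuration together with an in-place walk over text nodes and attributes. XmlHardeningSketch and the redactor callback are hypothetical; the exact feature set the library enables is an assumption based on the description above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.function.UnaryOperator;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative only: not the library's XML redactor.
final class XmlHardeningSketch {

    /** Parses XML with DOCTYPE declarations and external entities disabled. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Rejecting DOCTYPE outright blocks most XXE vectors.
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            f.setFeature("http://xml.org/sax/features/external-general-entities", false);
            f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            f.setXIncludeAware(false);
            f.setExpandEntityReferences(false);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("XML rejected or unparsable: " + e.getMessage(), e);
        }
    }

    /** Applies the redactor to text nodes and attribute values; names and structure stay as-is. */
    static void redact(Node node, UnaryOperator<String> redactor) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(redactor.apply(node.getNodeValue()));
        }
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                attr.setNodeValue(redactor.apply(attr.getNodeValue()));
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            redact(child, redactor);
        }
    }
}
```

In this sketch a document that contains a DOCTYPE declaration is rejected at parse time, before any entity can be resolved. CDATA sections are not visited because they have their own node type; a fuller version would need to decide how to treat them.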

StructuredRedactionOutput

New value type returned by both structured redaction methods. Exposes getRedactedContent(), getReverseMap(), getMatchCount(), and hasPII().
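
A token→original reverse map makes it possible to restore redacted content later, for example after an LLM response comes back. The sketch below assumes the map is a Map<String, String> keyed by tokens such as [PII_1]; that type, and the ReverseMapSketch helper itself, are assumptions and not part of the documented API.

```java
import java.util.Map;

// Illustrative only: a hypothetical helper, not shipped with the library.
final class ReverseMapSketch {

    /** Substitutes every token found in the text with the original value it stands for. */
    static String restore(String redacted, Map<String, String> reverseMap) {
        String out = redacted;
        for (Map.Entry<String, String> entry : reverseMap.entrySet()) {
            // Tokens include the closing bracket, so "[PII_1]" never matches inside "[PII_10]".
            out = out.replace(entry.getKey(), entry.getValue());
        }
        return out;
    }
}
```

Under that assumption, a caller could pass getRedactedContent() and getReverseMap() from the returned output straight into restore(...).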


Improvements

Three-layer pipeline playground

The live playground (docs/index.html) now reflects the full Regex + Naive Bayes + OpenNLP NER pipeline: updated hero text, new Layer 3 pipeline step, NLP sample button, and updated Names (ML+NLP) / Orgs (ML+NLP) pills.

README expanded

New dedicated sections for Custom Pattern Registry and JSON/XML Redaction with full usage examples. API Reference updated with redactJson(), redactXml(), and addPattern(). GENERIC_PII added to the PII types table. Configuration example updated to show .addPattern().


Fixes

XmlRedactorTest.nestedElements_allTextNodesScanned

Test input phone number (555) 123-4567 failed NANP validation because exchange 123 starts with 1 (NANP requires [2-9]XX for both area code and exchange). Updated to (555) 867-5309.
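
The NANP rule behind this fix can be expressed as a short regex check. This is a sketch of the rule as stated above, not the library's phone detector; NanpSketch and the accepted separators are assumptions.

```java
import java.util.regex.Pattern;

// Illustrative only: a hypothetical validator, not the library's phone rule.
final class NanpSketch {

    // Area code and exchange must both match [2-9]XX; the line number is any four digits.
    private static final Pattern NANP =
            Pattern.compile("\\(?([2-9]\\d{2})\\)?[-. ]?([2-9]\\d{2})[-. ]?(\\d{4})");

    /** Returns true when the whole string is a NANP-shaped number. */
    static boolean isValid(String phone) {
        return NANP.matcher(phone.trim()).matches();
    }
}
```

Here isValid("(555) 123-4567") is false because the exchange starts with 1, while isValid("(555) 867-5309") is true.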

CustomPatternTest.customPattern_analyse_returnsMatches

Input "Contact EMP-555001" caused the ML layer to fire on "EMP" as PERSON_NAME (preceding word "Contact" is a PII context keyword), creating a higher-severity match that overrode the custom GENERIC_PII pattern. Updated to a neutral context phrase that does not trigger the ML gate.

CustomPatternTest.customPattern_employeeId_detected

Input "Assigned to EMP-042731 for review.": the word "to" is a PII context keyword (PII_PREV_KEYWORDS), causing MLDetector to classify EMP-042731 as PERSON_NAME (severity 6), which won the CompositeDetector merge over GENERIC_PII (severity 5), producing [PERSON_NAME_1] instead of [PII_1]. Updated to "Task EMP-042731 is pending."
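
The severity rule behind both CustomPatternTest fixes can be sketched as a greedy merge: sort candidates by severity, then keep each one only if it does not overlap a span already kept. SeverityMergeSketch and Match are hypothetical names, and the greedy strategy is an assumption about how CompositeDetector resolves overlaps; only the severity values 6 and 5 come from the notes above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: not the library's CompositeDetector.
final class SeverityMergeSketch {

    /** A candidate detection over [start, end) with a PII type and its severity. */
    static final class Match {
        final int start;
        final int end;
        final String type;
        final int severity;

        Match(int start, int end, String type, int severity) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.severity = severity;
        }
    }

    /** Keeps the highest-severity match for every group of overlapping spans. */
    static List<Match> merge(List<Match> candidates) {
        List<Match> bySeverity = new ArrayList<>(candidates);
        bySeverity.sort(Comparator.comparingInt((Match m) -> m.severity).reversed());

        List<Match> kept = new ArrayList<>();
        for (Match c : bySeverity) {
            boolean overlaps = kept.stream().anyMatch(k -> c.start < k.end && k.start < c.end);
            if (!overlaps) {
                kept.add(c);
            }
        }
        kept.sort(Comparator.comparingInt((Match m) -> m.start));
        return kept;
    }
}
```

For "Assigned to EMP-042731 for review.", both detectors report the span 12 to 22. Given PERSON_NAME at severity 6 and GENERIC_PII at severity 5, merge(...) keeps only PERSON_NAME, which is the behaviour the original test input ran into.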

Dependencies

Dependency                Scope           Notes
jackson-databind:2.17.0   optional        Required only for redactJson()
javax.xml                 JDK built-in    Used by redactXml(); no extra dependency

v1.2.0

10 Mar 21:02

Adds NLP support for better PII redaction and stream-based processing support.
Corrects the Maven release details.

v1.1.0

10 Mar 19:26

Adds NLP support for better PII redaction and stream-based processing support.

v1.0.0 — Initial release

04 Mar 16:56

Semantic Privacy Guard: A lightweight, zero-dependency Java middleware that intercepts LLM prompts, identifies PII using a hybrid Regex + Naive Bayes approach, and redacts it before it leaves the corporate network.

v0.1.0-alpha

04 Mar 00:49

Pre-release
	new file:   .DS_Store

	modified:   README.md
	modified:   docs/index.html
	new file:   docs/playground-screenshot.png
	modified:   target/jacoco.exec
	modified:   target/site/jacoco/jacoco-sessions.html
	modified:   target/site/jacoco/jacoco.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$ApiKeyTests.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$CreditCardTests.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$EmailTests.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$IBANTests.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$IPTests.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$PasswordTests.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$PhoneTests.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest$SSNTests.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.HeuristicDetectorTest.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.MLDetectorTest.xml
	modified:   target/surefire-reports/TEST-com.semanticprivacyguard.SemanticPrivacyGuardTest.xml
	modified:   target/surefire-reports/com.semanticprivacyguard.HeuristicDetectorTest$ApiKeyTests.txt
	modified:   target/surefire-reports/com.semanticprivacyguard.HeuristicDetectorTest$IPTests.txt
	modified:   target/surefire-reports/com.semanticprivacyguard.HeuristicDetectorTest.txt
	modified:   target/surefire-reports/com.semanticprivacyguard.MLDetectorTest.txt
	modified:   target/surefire-reports/com.semanticprivacyguard.SemanticPrivacyGuardTest.txt