docs/ref/checks/pii.md: 36 additions & 1 deletion
@@ -24,10 +24,45 @@ Detects personally identifiable information (PII) such as SSNs, phone numbers, c
 ### Parameters

-- **`entities`** (required): List of PII entity types to detect. See the `PIIEntity` enum in `src/checks/pii.ts` for the full list, including custom entities such as `CVV` (credit card security codes) and `BIC_SWIFT` (bank identification codes).
+- **`entities`** (optional): List of PII entity types to detect. Defaults to all entities except `NRP` and `PERSON` (see note below). See the `PIIEntity` enum in `src/checks/pii.ts` for the full list, including custom entities such as `CVV` (credit card security codes) and `BIC_SWIFT` (bank identification codes).
 - **`block`** (optional): Whether to block content or just mask PII (default: `false`)
 - **`detect_encoded_pii`** (optional): If `true`, detects PII in Base64/URL-encoded/hex strings (default: `false`)

+### Important: NRP and PERSON Entity Deprecation
+
+**As of v0.1.8**, the `NRP` and `PERSON` entities have been **removed from the default entity list** due to their high false positive rates. These patterns are overly broad and cause issues in production:
+
+- **`NRP`** matches any two consecutive words (e.g., "nuevo cliente", "crea un", "the user")
+- **`PERSON`** matches any two capitalized words (e.g., "New York", "The User", "European Union")
+
+**Impact:**
+- ❌ Causes false positives in natural language conversation
+- ❌ Particularly problematic for non-English languages (Spanish, French, etc.)
+- ❌ Breaks normal text in pre-flight masking mode
+
+**Migration Path:**
+
+If you need to detect person names or national registration numbers, consider these alternatives:
+
+1. **For National Registration Numbers**: Use region-specific patterns instead:
+   - `SG_NRIC_FIN` (Singapore)
+   - `UK_NINO` (UK National Insurance Number)
+   - `FI_PERSONAL_IDENTITY_CODE` (Finland)
+   - `KR_RRN` (Korea Resident Registration Number)
+
+2. **For Person Names**: Consider using a dedicated NER (Named Entity Recognition) service or LLM-based detection for more accurate results.
+
+3. **If you still need these patterns**: You can explicitly include them in your configuration, but be aware of the false positives:
+   ```json
+   {
+     "entities": ["NRP", "PERSON", "EMAIL_ADDRESS"],
+     "block": false
+   }
+   ```
+   A deprecation warning will be logged when these entities are used.
 Under the hood, the TypeScript guardrail normalizes text (Unicode NFKC), strips zero-width characters, and runs curated regex patterns for each configured entity. When `detect_encoded_pii` is enabled, the check also decodes Base64, URL-encoded, and hexadecimal substrings, rescans the decoded text for matches, and maps any findings back to the original encoded content.
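
The pipeline described in that last context line can be sketched roughly as follows. This is a minimal illustration, not the code in `src/checks/pii.ts`: the names `normalize`, `scan`, `decodeHex`, and `detect`, the `Finding` shape, the `US_SSN` entity name, and both regex patterns are assumptions made for the example. Base64 decoding is left out to keep the sketch short, and "remapping" is reduced to recording the encoded substring that produced each match.

```typescript
// Illustrative only: names and patterns are assumptions, not the contents of src/checks/pii.ts.

// Zero-width characters that can split a token: ZWSP, ZWNJ, ZWJ, word joiner, BOM.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

// Deliberately simple stand-ins for the curated per-entity patterns.
const PATTERNS: Record<string, RegExp> = {
  EMAIL_ADDRESS: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  US_SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
};

interface Finding {
  entity: string;
  match: string;
  encoded?: string; // original encoded substring, set only for decoded matches
}

// Step 1: Unicode NFKC normalization, then zero-width stripping.
function normalize(text: string): string {
  return text.normalize("NFKC").replace(ZERO_WIDTH, "");
}

// Step 2: run the pattern for each configured entity.
function scan(text: string, entities: string[]): Finding[] {
  const findings: Finding[] = [];
  for (const entity of entities) {
    const pattern = PATTERNS[entity];
    if (!pattern) continue; // entity not covered by this sketch
    for (const match of text.match(pattern) || []) {
      findings.push({ entity, match });
    }
  }
  return findings;
}

// Decode a string of hex byte pairs into characters.
function decodeHex(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i += 2) {
    out += String.fromCharCode(parseInt(s.slice(i, i + 2), 16));
  }
  return out;
}

// Step 3 (optional): decode hex and URL-encoded substrings and rescan them.
function detect(text: string, entities: string[], detectEncoded = false): Finding[] {
  const clean = normalize(text);
  const findings = scan(clean, entities);
  if (!detectEncoded) return findings;

  const candidates: Array<[string, (s: string) => string]> = [];
  for (const hex of clean.match(/\b(?:[0-9a-fA-F]{2}){8,}\b/g) || []) {
    candidates.push([hex, decodeHex]);
  }
  for (const url of clean.match(/\S*%[0-9a-fA-F]{2}\S*/g) || []) {
    candidates.push([url, decodeURIComponent]);
  }

  for (const [encoded, decode] of candidates) {
    let decoded: string;
    try {
      decoded = decode(encoded);
    } catch {
      continue; // malformed escape sequence, skip this candidate
    }
    for (const f of scan(normalize(decoded), entities)) {
      findings.push({ ...f, encoded });
    }
  }
  return findings;
}
```

With this sketch, `detect("q=user%40example.com", ["EMAIL_ADDRESS"])` finds nothing, while passing `true` as the third argument reports `user@example.com` together with the encoded token it came from.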
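
The `entities` default introduced by this change (every entity except `NRP` and `PERSON`, plus a logged warning when either is requested explicitly) could be resolved along these lines. This is a hedged sketch: `resolveEntities`, `PIIConfig`, and `DEPRECATED` are hypothetical names, the entity list is a small subset taken from the names mentioned in the diff rather than the full `PIIEntity` enum, and the warning text is invented for the example.

```typescript
// Hypothetical sketch: a small subset of entity names from the diff, not the real PIIEntity enum.
const ALL_ENTITIES = [
  "EMAIL_ADDRESS",
  "CVV",
  "BIC_SWIFT",
  "SG_NRIC_FIN",
  "UK_NINO",
  "NRP",
  "PERSON",
] as const;
type Entity = (typeof ALL_ENTITIES)[number];

// Entities removed from the defaults because of their false positive rates.
const DEPRECATED: ReadonlySet<Entity> = new Set<Entity>(["NRP", "PERSON"]);

interface PIIConfig {
  entities?: Entity[];
  block?: boolean;
  detect_encoded_pii?: boolean;
}

// Returns the entities to scan for; `warn` is injectable so the warning can be captured.
function resolveEntities(
  config: PIIConfig,
  warn: (msg: string) => void = console.warn,
): Entity[] {
  if (config.entities === undefined) {
    // No explicit list: use everything except the deprecated entities.
    return ALL_ENTITIES.filter((e) => !DEPRECATED.has(e));
  }
  const deprecatedInUse = config.entities.filter((e) => DEPRECATED.has(e));
  if (deprecatedInUse.length > 0) {
    warn(`Deprecated PII entities in use: ${deprecatedInUse.join(", ")}`);
  }
  return [...config.entities];
}
```

Calling `resolveEntities({})` returns the list without `NRP` and `PERSON`, while the JSON configuration from the migration notes still works but triggers the warning callback once.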