Commit 7ad9d85

docs: Document NRP/PERSON deprecation
1 parent 5a1e171 commit 7ad9d85

1 file changed: +36 -1 lines changed
docs/ref/checks/pii.md

Lines changed: 36 additions & 1 deletion
@@ -24,10 +24,45 @@ Detects personally identifiable information (PII) such as SSNs, phone numbers, c

 ### Parameters

-- **`entities`** (required): List of PII entity types to detect. See the `PIIEntity` enum in `src/checks/pii.ts` for the full list, including custom entities such as `CVV` (credit card security codes) and `BIC_SWIFT` (bank identification codes).
+- **`entities`** (optional): List of PII entity types to detect. Defaults to all entities except `NRP` and `PERSON` (see note below). See the `PIIEntity` enum in `src/checks/pii.ts` for the full list, including custom entities such as `CVV` (credit card security codes) and `BIC_SWIFT` (bank identification codes).
 - **`block`** (optional): Whether to block content or just mask PII (default: `false`)
 - **`detect_encoded_pii`** (optional): If `true`, detects PII in Base64/URL-encoded/hex strings (default: `false`)

+### Important: NRP and PERSON Entity Deprecation
+
+**As of v0.1.8**, the `NRP` and `PERSON` entities have been **removed from the default entity list** due to their high false positive rates. These patterns are overly broad and cause issues in production:
+
+- **`NRP`** matches any two consecutive words (e.g., "nuevo cliente", "crea un", "the user")
+- **`PERSON`** matches any two capitalized words (e.g., "New York", "The User", "European Union")
+
+**Impact:**
+- ❌ Causes false positives in natural language conversation
+- ❌ Particularly problematic for non-English languages (Spanish, French, etc.)
+- ❌ Breaks normal text in pre-flight masking mode
+
+**Migration Path:**
+
+If you need to detect person names or national registration numbers, consider these alternatives:
+
+1. **For National Registration Numbers**: Use region-specific patterns instead:
+   - `SG_NRIC_FIN` (Singapore)
+   - `UK_NINO` (UK National Insurance Number)
+   - `FI_PERSONAL_IDENTITY_CODE` (Finland)
+   - `KR_RRN` (Korea Resident Registration Number)
+
+2. **For Person Names**: Consider using a dedicated NER (Named Entity Recognition) service or LLM-based detection for more accurate results.
+
+3. **If you still need these patterns**: You can explicitly include them in your configuration, but be aware of the false positives:
+   ```json
+   {
+     "entities": ["NRP", "PERSON", "EMAIL_ADDRESS"],
+     "block": false
+   }
+   ```
+   A deprecation warning will be logged when these entities are used.
+
+**Reference:** [Issue #47](https://github.com/openai/openai-guardrails-js/issues/47)
+
 ## Implementation Notes

 Under the hood the TypeScript guardrail normalizes text (Unicode NFKC), strips zero-width characters, and runs curated regex patterns for each configured entity. When `detect_encoded_pii` is enabled the check also decodes Base64, URL-encoded, and hexadecimal substrings before rescanning them for matches, remapping any findings back to the original encoded content.
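
The pipeline described in the Implementation Notes can be illustrated with a minimal sketch. This is not the code in `src/checks/pii.ts`: the names `normalizeText`, `scanText`, `detectPII`, and `Finding` are hypothetical, the two regexes are simplified stand-ins for the curated patterns, and only Base64 decoding is shown (URL and hex decoding are omitted). Instead of remapping character offsets, the sketch records the encoded substring that produced each finding.

```typescript
// Hypothetical sketch of normalize -> strip zero-width -> regex scan -> decode and rescan.
type Finding = { entity: string; match: string; encoded?: string };

// Simplified example patterns (assumptions, much looser than production rules).
const PATTERNS: Record<string, RegExp> = {
  EMAIL_ADDRESS: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/,
  US_SSN: /\b\d{3}-\d{2}-\d{4}\b/,
};

// Zero-width space, non-joiner, joiner, word joiner, and BOM.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Runs of 16+ Base64 alphabet characters, optionally padded.
const BASE64_CANDIDATE = /[A-Za-z0-9+/]{16,}={0,2}/;

function normalizeText(text: string): string {
  // NFKC folds compatibility forms (e.g. fullwidth digits) into canonical ones.
  return text.normalize("NFKC").replace(ZERO_WIDTH, "");
}

function scanText(text: string, entities: string[]): Finding[] {
  const findings: Finding[] = [];
  for (const entity of entities) {
    const base = PATTERNS[entity];
    if (!base) continue; // unknown entity: nothing to scan for
    // Fresh global regex per scan so lastIndex state is never shared.
    const pattern = new RegExp(base.source, "g");
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(text)) !== null) {
      findings.push({ entity, match: m[0] });
    }
  }
  return findings;
}

function detectPII(text: string, entities: string[], detectEncoded = false): Finding[] {
  const clean = normalizeText(text);
  const findings = scanText(clean, entities);
  if (!detectEncoded) return findings;

  const candidates = new RegExp(BASE64_CANDIDATE.source, "g");
  let c: RegExpExecArray | null;
  while ((c = candidates.exec(clean)) !== null) {
    let decoded: string;
    try {
      decoded = atob(c[0]); // throws on malformed Base64
    } catch {
      continue;
    }
    // Rescan the decoded text and tie each hit back to its encoded source.
    for (const f of scanText(normalizeText(decoded), entities)) {
      findings.push({ entity: f.entity, match: f.match, encoded: c[0] });
    }
  }
  return findings;
}
```

Calling `detectPII("data: " + btoa("alice@example.com"), ["EMAIL_ADDRESS"], true)` yields one finding whose `encoded` field holds the Base64 string, while the same call with the third argument omitted yields none, mirroring the `detect_encoded_pii` default of `false`.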
