Skip to content

Commit d2076a9

Browse files
fix: Remove overly broad NRP and PERSON entities from default PII detection (#47) (#51)
* fix: Remove NRP and PERSON from default PII entities * test: Add tests for NRP/PERSON deprecation * docs: Document NRP/PERSON deprecation * Update version number, typing, md comment * Shorten warning message
1 parent d77409c commit d2076a9

File tree

3 files changed

+244
-4
lines changed

3 files changed

+244
-4
lines changed

docs/ref/checks/pii.md

Lines changed: 39 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,10 +24,48 @@ Detects personally identifiable information (PII) such as SSNs, phone numbers, c
2424

2525
### Parameters
2626

27-
- **`entities`** (required): List of PII entity types to detect. See the `PIIEntity` enum in `src/checks/pii.ts` for the full list, including custom entities such as `CVV` (credit card security codes) and `BIC_SWIFT` (bank identification codes).
27+
- **`entities`** (optional): List of PII entity types to detect. Defaults to all entities except `NRP` and `PERSON` (see note below). See the `PIIEntity` enum in `src/checks/pii.ts` for the full list, including custom entities such as `CVV` (credit card security codes) and `BIC_SWIFT` (bank identification codes).
2828
- **`block`** (optional): Whether to block content or just mask PII (default: `false`)
2929
- **`detect_encoded_pii`** (optional): If `true`, detects PII in Base64/URL-encoded/hex strings (default: `false`)
3030

31+
### Important: NRP and PERSON Entity Deprecation
32+
33+
**As of v0.2.0**, the `NRP` and `PERSON` entities have been **removed from the default entity list** due to their high false positive rates. These patterns are overly broad and cause issues in production:
34+
35+
- **`NRP`** matches any two consecutive words (e.g., "nuevo cliente", "crea un", "the user")
36+
- **`PERSON`** matches any two capitalized words (e.g., "New York", "The User", "European Union")
37+
38+
**Impact:**
39+
40+
- ❌ Causes false positives in natural language conversation
41+
- ❌ Particularly problematic for non-English languages (Spanish, French, etc.)
42+
- ❌ Breaks normal text in pre-flight masking mode
43+
44+
> **Future Improvement:** More robust implementations of `NRP` and `PERSON` detection are planned for a future release. Stay tuned for updates.
45+
46+
**Migration Path:**
47+
48+
If you need to detect person names or national registration numbers, consider these alternatives:
49+
50+
1. **For National Registration Numbers**: Use region-specific patterns instead:
51+
- `SG_NRIC_FIN` (Singapore)
52+
- `UK_NINO` (UK National Insurance Number)
53+
- `FI_PERSONAL_IDENTITY_CODE` (Finland)
54+
- `KR_RRN` (Korea Resident Registration Number)
55+
56+
2. **For Person Names**: Consider using a dedicated NER (Named Entity Recognition) service or LLM-based detection for more accurate results.
57+
58+
3. **If you still need these patterns**: You can explicitly include them in your configuration, but be aware of the false positives:
59+
```json
60+
{
61+
"entities": ["NRP", "PERSON", "EMAIL_ADDRESS"],
62+
"block": false
63+
}
64+
```
65+
A deprecation warning will be logged when these entities are used.
66+
67+
**Reference:** [Issue #47](https://github.com/openai/openai-guardrails-js/issues/47)
68+
3169
## Implementation Notes
3270

3371
Under the hood the TypeScript guardrail normalizes text (Unicode NFKC), strips zero-width characters, and runs curated regex patterns for each configured entity. When `detect_encoded_pii` is enabled the check also decodes Base64, URL-encoded, and hexadecimal substrings before rescanning them for matches, remapping any findings back to the original encoded content.

src/__tests__/unit/checks/pii.test.ts

Lines changed: 139 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,8 +2,8 @@
22
* Unit tests for the PII guardrail functionality.
33
*/
44

5-
import { describe, it, expect } from 'vitest';
6-
import { pii, PIIConfig, PIIEntity } from '../../../checks/pii';
5+
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
6+
import { pii, PIIConfig, PIIEntity, _clearDeprecationWarnings } from '../../../checks/pii';
77

88
describe('pii guardrail', () => {
99
it('masks detected PII when block=false', async () => {
@@ -286,4 +286,141 @@ describe('pii guardrail', () => {
286286
);
287287
expect(result.info?.checked_text).toBe('Ship to <LOCATION> for delivery.');
288288
});
289+
290+
describe('NRP and PERSON deprecation (Issue #47)', () => {
291+
beforeEach(() => {
292+
// Clear deprecation warnings before each test to ensure clean state
293+
_clearDeprecationWarnings();
294+
});
295+
296+
afterEach(() => {
297+
// Restore all mocks to prevent leaking between tests
298+
vi.restoreAllMocks();
299+
});
300+
301+
it('excludes NRP and PERSON from default entities', () => {
302+
const config = PIIConfig.parse({});
303+
304+
expect(config.entities).not.toContain(PIIEntity.NRP);
305+
expect(config.entities).not.toContain(PIIEntity.PERSON);
306+
});
307+
308+
it('does not mask common two-word phrases when using defaults', async () => {
309+
const config = PIIConfig.parse({
310+
block: false,
311+
});
312+
const text = 'crea un nuevo cliente con email [email protected]';
313+
314+
const result = await pii({}, text, config);
315+
316+
// Should only mask the email, not "crea un" or "nuevo cliente"
317+
expect(result.info?.checked_text).toBe('crea un nuevo cliente con email <EMAIL_ADDRESS>');
318+
expect((result.info?.detected_entities as Record<string, string[]>)?.NRP).toBeUndefined();
319+
});
320+
321+
it('does not mask capitalized phrases when using defaults', async () => {
322+
const config = PIIConfig.parse({
323+
block: false,
324+
});
325+
const text = 'Welcome to New York, The User can access the system.';
326+
327+
const result = await pii({}, text, config);
328+
329+
// Should not mask "New York" or "The User"
330+
expect(result.info?.checked_text).toBe('Welcome to New York, The User can access the system.');
331+
expect((result.info?.detected_entities as Record<string, string[]>)?.PERSON).toBeUndefined();
332+
});
333+
334+
it('still detects NRP when explicitly configured', async () => {
335+
const consoleWarnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
336+
337+
const config = PIIConfig.parse({
338+
entities: [PIIEntity.NRP],
339+
block: false,
340+
});
341+
const text = 'hello world';
342+
343+
const result = await pii({}, text, config);
344+
345+
expect((result.info?.detected_entities as Record<string, string[]>)?.NRP).toEqual(['hello world']);
346+
expect(result.info?.checked_text).toBe('<NRP>');
347+
348+
consoleWarnSpy.mockRestore();
349+
});
350+
351+
it('still detects PERSON when explicitly configured', async () => {
352+
const consoleWarnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
353+
354+
const config = PIIConfig.parse({
355+
entities: [PIIEntity.PERSON],
356+
block: false,
357+
});
358+
const text = 'John Smith lives in New York';
359+
360+
const result = await pii({}, text, config);
361+
362+
expect((result.info?.detected_entities as Record<string, string[]>)?.PERSON).toContain('John Smith');
363+
expect((result.info?.detected_entities as Record<string, string[]>)?.PERSON).toContain('New York');
364+
365+
consoleWarnSpy.mockRestore();
366+
});
367+
368+
it('shows deprecation warning for NRP', async () => {
369+
const consoleWarnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
370+
371+
const config = PIIConfig.parse({
372+
entities: [PIIEntity.NRP],
373+
block: false,
374+
});
375+
376+
await pii({}, 'test data', config);
377+
378+
expect(consoleWarnSpy).toHaveBeenCalledWith(
379+
expect.stringContaining('DEPRECATION: PIIEntity.NRP')
380+
);
381+
expect(consoleWarnSpy).toHaveBeenCalledWith(
382+
expect.stringContaining('more robust implementation')
383+
);
384+
385+
consoleWarnSpy.mockRestore();
386+
});
387+
388+
it('shows deprecation warning for PERSON', async () => {
389+
const consoleWarnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
390+
391+
const config = PIIConfig.parse({
392+
entities: [PIIEntity.PERSON],
393+
block: false,
394+
});
395+
396+
await pii({}, 'test data', config);
397+
398+
expect(consoleWarnSpy).toHaveBeenCalledWith(
399+
expect.stringContaining('DEPRECATION: PIIEntity.PERSON')
400+
);
401+
expect(consoleWarnSpy).toHaveBeenCalledWith(
402+
expect.stringContaining('more robust implementation')
403+
);
404+
405+
consoleWarnSpy.mockRestore();
406+
});
407+
408+
it('only shows deprecation warning once per entity', async () => {
409+
const consoleWarnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
410+
411+
const config = PIIConfig.parse({
412+
entities: [PIIEntity.NRP, PIIEntity.PERSON],
413+
block: false,
414+
});
415+
416+
await pii({}, 'test data', config);
417+
await pii({}, 'more test data', config);
418+
await pii({}, 'even more data', config);
419+
420+
// Should only be called once for each entity (2 total)
421+
expect(consoleWarnSpy).toHaveBeenCalledTimes(2);
422+
423+
consoleWarnSpy.mockRestore();
424+
});
425+
});
289426
});

src/checks/pii.ts

Lines changed: 66 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -145,8 +145,24 @@ export enum PIIEntity {
145145
*
146146
* Used to control which entity types are checked and the behavior mode.
147147
*/
148+
/**
149+
* Default PII entities to check.
150+
*
151+
* **IMPORTANT:** NRP and PERSON are excluded from defaults due to high false positive rates.
152+
* These patterns match overly broad text patterns:
153+
* - NRP: Matches any two consecutive words (e.g., "nuevo cliente", "crea un")
154+
* - PERSON: Matches any two capitalized words (e.g., "New York", "The User")
155+
*
156+
* If you need to detect person names or national registration numbers, explicitly
157+
* include these entities in your configuration, or use more specific region-based
158+
* patterns like SG_NRIC_FIN, UK_NINO, etc.
159+
*/
160+
const DEFAULT_PII_ENTITIES: PIIEntity[] = Object.values(PIIEntity).filter(
161+
(entity) => entity !== PIIEntity.NRP && entity !== PIIEntity.PERSON
162+
);
163+
148164
export const PIIConfig = z.object({
149-
entities: z.array(z.nativeEnum(PIIEntity)).default(() => Object.values(PIIEntity)),
165+
entities: z.array(z.nativeEnum(PIIEntity)).default(() => DEFAULT_PII_ENTITIES),
150166
block: z
151167
.boolean()
152168
.default(false)
@@ -844,6 +860,52 @@ function _asResult(
844860
};
845861
}
846862

863+
/**
864+
* Deprecated PII entities that have high false positive rates.
865+
*/
866+
const DEPRECATED_ENTITIES = new Set([PIIEntity.NRP, PIIEntity.PERSON]);
867+
868+
/**
869+
* Track which deprecation warnings have been shown to avoid spam.
870+
*/
871+
const shownDeprecationWarnings = new Set<string>();
872+
873+
/**
874+
* Clear deprecation warning cache. FOR TESTING ONLY.
875+
* @internal
876+
*/
877+
export function _clearDeprecationWarnings(): void {
878+
shownDeprecationWarnings.clear();
879+
}
880+
881+
/**
882+
* Warn users about deprecated PII entities with high false positive rates.
883+
*
884+
* @param entities The list of entities being checked
885+
*/
886+
function _warnDeprecatedEntities(entities: PIIEntity[]): void {
887+
const deprecated = entities.filter((entity) => DEPRECATED_ENTITIES.has(entity));
888+
889+
for (const entity of deprecated) {
890+
if (shownDeprecationWarnings.has(entity)) {
891+
continue;
892+
}
893+
894+
shownDeprecationWarnings.add(entity);
895+
896+
const description =
897+
entity === PIIEntity.NRP
898+
? 'NRP matches any two consecutive words'
899+
: 'PERSON matches any two capitalized words';
900+
901+
console.warn(
902+
`[openai-guardrails-js] DEPRECATION: PIIEntity.${entity} removed from defaults (${description}).\n` +
903+
` A more robust implementation will be released in a future version.\n` +
904+
` To suppress: remove PIIEntity.${entity} from config. See: https://github.com/openai/openai-guardrails-js/issues/47`
905+
);
906+
}
907+
}
908+
847909
/**
848910
* Async guardrail check_fn for PII entity detection in text.
849911
*
@@ -861,6 +923,9 @@ export const pii: CheckFn<Record<string, unknown>, string, PIIConfig> = async (
861923
data,
862924
config
863925
): Promise<GuardrailResult> => {
926+
// Warn about deprecated entities
927+
_warnDeprecatedEntities(config.entities);
928+
864929
const result = _detectPii(data, config);
865930
return _asResult(result, config, 'Contains PII', data);
866931
};

0 commit comments

Comments
 (0)