Commit 2e7ba89

Update PII to handle encoded content
1 parent fb54217 commit 2e7ba89

4 files changed: +892 -106 lines changed

docs/ref/checks/pii.md

Lines changed: 45 additions & 9 deletions
@@ -1,26 +1,37 @@
 # Contains PII
 
-Detects personally identifiable information (PII) such as SSNs, phone numbers, credit card numbers, and email addresses using Microsoft's [Presidio library](https://microsoft.github.io/presidio/). Will automatically mask detected PII or block content based on configuration.
+Detects personally identifiable information (PII) such as SSNs, phone numbers, credit card numbers, and email addresses using Guardrails' built-in TypeScript regex engine. The check can automatically mask detected spans or block the request based on configuration.
+
+**Advanced Security Features:**
+
+- **Unicode normalization**: Prevents bypasses using fullwidth characters (＠) or zero-width spaces
+- **Encoded PII detection**: Optionally detects PII hidden in Base64, URL-encoded, or hex strings
+- **URL context awareness**: Detects emails in query parameters (e.g., `GET /[email protected]`)
+- **Custom patterns**: Extends the default entity list with CVV/CVC codes, BIC/SWIFT identifiers, and other global formats
 
 ## Configuration
 
 ```json
 {
   "name": "Contains PII",
   "config": {
-    "entities": ["EMAIL_ADDRESS", "US_SSN", "CREDIT_CARD", "PHONE_NUMBER"],
-    "block": false
+    "entities": ["EMAIL_ADDRESS", "US_SSN", "CREDIT_CARD", "PHONE_NUMBER", "CVV", "BIC_SWIFT"],
+    "block": false,
+    "detect_encoded_pii": false
   }
 }
 ```
 
 ### Parameters
 
-- **`entities`** (required): List of PII entity types to detect. See the full list of [supported entities](https://microsoft.github.io/presidio/supported_entities/).
+- **`entities`** (required): List of PII entity types to detect. See the `PIIEntity` enum in `src/checks/pii.ts` for the full list, including custom entities such as `CVV` (credit card security codes) and `BIC_SWIFT` (bank identification codes).
 - **`block`** (optional): Whether to block content or just mask PII (default: `false`)
+- **`detect_encoded_pii`** (optional): If `true`, detects PII in Base64/URL-encoded/hex strings (default: `false`)
 
 ## Implementation Notes
 
+Under the hood the TypeScript guardrail normalizes text (Unicode NFKC), strips zero-width characters, and runs curated regex patterns for each configured entity. When `detect_encoded_pii` is enabled the check also decodes Base64, URL-encoded, and hexadecimal substrings before rescanning them for matches, remapping any findings back to the original encoded content.
+
 **Stage-specific behavior is critical:**
 
 - **Pre-flight stage**: Use `block=false` (default) for automatic PII masking of user input
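
Note on the Implementation Notes paragraph in this hunk: the "normalize, strip zero-width characters, then run the regex" order can be illustrated with a minimal sketch. Everything below is an assumption for illustration (the names `normalizeForScan` and `findEmails`, the zero-width character set, and the email pattern are hypothetical); it is not the code in `src/checks/pii.ts`.

```typescript
// Hypothetical sketch, not the shipped implementation.
// Zero-width characters that can be used to split a PII token apart.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Illustrative pattern only; the real check uses its own curated patterns.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

/** Fold compatibility forms (e.g. fullwidth U+FF20) to ASCII, then drop zero-width characters. */
function normalizeForScan(text: string): string {
  return text.normalize('NFKC').replace(ZERO_WIDTH, '');
}

/** Return every email-like match found in the normalized text. */
function findEmails(text: string): string[] {
  return normalizeForScan(text).match(EMAIL) ?? [];
}

// A zero-width space splits the local part and a fullwidth "at" sign replaces "@".
const evasive = 'reach me at john\u200B.doe\uFF20example.com';
console.log(findEmails(evasive));
```

NFKC alone does not remove zero-width characters, which is why the sketch strips them in a separate step before matching.
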
@@ -30,7 +41,7 @@ Detects personally identifiable information (PII) such as SSNs, phone numbers, c
 **PII masking mode** (default, `block=false`):
 
 - Automatically replaces detected PII with placeholder tokens like `<EMAIL_ADDRESS>`, `<US_SSN>`
-- Does not trigger tripwire - allows content through with PII removed
+- Does not trigger tripwire - allows content through with PII masked
 
 **Blocking mode** (`block=true`):
 
@@ -41,6 +52,8 @@ Detects personally identifiable information (PII) such as SSNs, phone numbers, c
 
 Returns a `GuardrailResult` with the following `info` dictionary:
 
+### Basic Example (Plain PII)
+
 ```json
 {
   "guardrail_name": "Contains PII",
@@ -49,14 +62,37 @@ Returns a `GuardrailResult` with the following `info` dictionary:
     "US_SSN": ["123-45-6789"]
   },
   "entity_types_checked": ["EMAIL_ADDRESS", "US_SSN", "CREDIT_CARD"],
-  "checked_text": "Contact me at <EMAIL_ADDRESS>, SSN: <US_SSN>",
   "block_mode": false,
   "pii_detected": true
 }
 ```
 
-- **`detected_entities`**: Detected entities and their values
+### With Encoded PII Detection Enabled
+
+When `detect_encoded_pii: true`, the guardrail also detects and masks encoded PII:
+
+```json
+{
+  "guardrail_name": "Contains PII",
+  "detected_entities": {
+    "EMAIL_ADDRESS": [
+
+      "am9obkBleGFtcGxlLmNvbQ==",
+      "%6a%6f%65%40domain.com",
+      "6a6f686e406578616d706c652e636f6d"
+    ]
+  },
+  "entity_types_checked": ["EMAIL_ADDRESS"],
+  "block_mode": false,
+  "pii_detected": true
+}
+```
+
+Note: Encoded PII is masked with `<ENTITY_TYPE_ENCODED>` to distinguish it from plain text PII.
+
+### Field Descriptions
+
+- **`detected_entities`**: Detected entities and their values (includes both plain and encoded forms when `detect_encoded_pii` is enabled)
 - **`entity_types_checked`**: List of entity types that were configured for detection
-- **`checked_text`**: Text with PII masked (if PII was found) or original text (if no PII was found)
 - **`block_mode`**: Whether the check was configured to block or mask
-- **`pii_detected`**: Boolean indicating if any PII was found
+- **`pii_detected`**: Boolean indicating if any PII was found (plain or encoded)
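
Note on the encoded-PII section of this file: the "decode, rescan, remap to the original encoded content" flow can be sketched roughly as follows, using the three encoded strings from the JSON example. This is an assumption-laden illustration, not the library's code: the helper names (`decodeHex`, `decodeUrl`, `decodeBase64`, `findEncodedEmails`, `maskEncodedEmails`), the whitespace tokenization, the email pattern, and the reading of `<ENTITY_TYPE_ENCODED>` as `<EMAIL_ADDRESS_ENCODED>` are all hypothetical.

```typescript
// Hypothetical sketch of "decode, rescan, remap"; not the code in src/checks/pii.ts.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/; // illustrative pattern
const B64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

/** Decode a run of hex digit pairs into a byte string, or null if the token is not hex. */
function decodeHex(token: string): string | null {
  if (!/^(?:[0-9a-fA-F]{2}){4,}$/.test(token)) return null;
  let out = '';
  for (let i = 0; i < token.length; i += 2) {
    out += String.fromCharCode(parseInt(token.slice(i, i + 2), 16));
  }
  return out;
}

/** Percent-decode a token that contains at least one %XX escape. */
function decodeUrl(token: string): string | null {
  if (!/%[0-9a-fA-F]{2}/.test(token)) return null;
  try {
    return decodeURIComponent(token);
  } catch {
    return null; // malformed escape sequence
  }
}

/** Decode padded Base64 by hand so the sketch has no runtime dependencies. */
function decodeBase64(token: string): string | null {
  if (token.length % 4 !== 0 || !/^[A-Za-z0-9+\/]{8,}={0,2}$/.test(token)) return null;
  let bits = 0;
  let bitCount = 0;
  let out = '';
  for (let i = 0; i < token.length && token[i] !== '='; i++) {
    bits = ((bits << 6) | B64_ALPHABET.indexOf(token[i])) & 0xffff;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      out += String.fromCharCode((bits >> bitCount) & 0xff);
    }
  }
  return out;
}

/** Return the encoded tokens whose decoded form contains an email address. */
function findEncodedEmails(text: string): string[] {
  const hits: string[] = [];
  for (const token of text.split(/\s+/)) {
    // The first decoder that recognises the token wins; hex goes first because
    // hex digits are also valid Base64 characters.
    const decoded = decodeHex(token) ?? decodeUrl(token) ?? decodeBase64(token);
    if (decoded !== null && EMAIL_RE.test(decoded)) hits.push(token);
  }
  return hits;
}

/** Replace each encoded hit in the original text with the encoded-entity placeholder. */
function maskEncodedEmails(text: string): string {
  let masked = text;
  for (const token of findEncodedEmails(text)) {
    masked = masked.split(token).join('<EMAIL_ADDRESS_ENCODED>');
  }
  return masked;
}

const sample =
  'b64 am9obkBleGFtcGxlLmNvbQ== url %6a%6f%65%40domain.com hex 6a6f686e406578616d706c652e636f6d';
console.log(findEncodedEmails(sample));
console.log(maskEncodedEmails(sample));
```

Reporting the encoded token rather than the decoded value mirrors the `detected_entities` example above, where the list holds the Base64, URL-encoded, and hex strings as they appeared in the input.
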

examples/basic/pii_mask_example.ts

Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
+#!/usr/bin/env node
+/**
+ * PII Masking Example: Interactive chat with GuardrailsOpenAI.
+ *
+ * Demonstrates how to mask PII in the pre-flight stage (block=false) so that
+ * user inputs are sanitized before reaching the model, while also blocking
+ * PII that appears in the model's output (block=true).
+ *
+ * Highlights:
+ * - Pre-flight PII guardrail automatically replaces detected entities with tokens like <EMAIL_ADDRESS>
+ * - Encoded PII detection (Base64/URL/hex) is enabled via detect_encoded_pii
+ * - Output stage blocks responses when PII is detected in the model reply
+ * - Console output shows what was masked and which entities were found
+ *
+ * Run with: npx tsx pii_mask_example.ts
+ *
+ * Prerequisites:
+ * - Set OPENAI_API_KEY in your environment
+ */
+
+import * as readline from 'readline';
+import {
+  GuardrailResult,
+  GuardrailTripwireTriggered,
+  GuardrailsOpenAI,
+  GuardrailsResponse,
+} from '../../src';
+
+type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };
+
+const PIPELINE_CONFIG = {
+  version: 1,
+  pre_flight: {
+    version: 1,
+    guardrails: [
+      {
+        name: 'Contains PII',
+        config: {
+          entities: ['EMAIL_ADDRESS', 'PHONE_NUMBER', 'US_SSN'],
+          block: false,
+          detect_encoded_pii: true,
+        },
+      },
+    ],
+  },
+  input: {
+    version: 1,
+    guardrails: [
+      {
+        name: 'Moderation',
+        config: {
+          categories: ['hate', 'violence'],
+        },
+      },
+    ],
+  },
+  output: {
+    version: 1,
+    guardrails: [
+      {
+        name: 'Contains PII',
+        config: {
+          entities: ['EMAIL_ADDRESS', 'PHONE_NUMBER', 'US_SSN'],
+          block: true,
+          detect_encoded_pii: true,
+        },
+      },
+    ],
+  },
+};
+
+function createInterface(): readline.Interface {
+  return readline.createInterface({
+    input: process.stdin,
+    output: process.stdout,
+    prompt: '\nEnter a message (or type "exit"): ',
+  });
+}
+
+function formatEntitySummary(entities: Record<string, string[]> | undefined): string {
+  if (!entities) {
+    return 'None';
+  }
+  const parts: string[] = [];
+  for (const [entity, matches] of Object.entries(entities)) {
+    parts.push(`${entity} (${matches.length})`);
+  }
+  return parts.length ? parts.join(', ') : 'None';
+}
+
+function logPiiMasking(result: GuardrailResult, originalInput: string): void {
+  const info = result.info ?? {};
+  const masked = typeof info.checked_text === 'string' ? info.checked_text : originalInput;
+  const detected = info.detected_entities as Record<string, string[]> | undefined;
+  const stage = info.stage_name ?? 'pre_flight';
+
+  console.log(`\n🪪 PII detected and masked (${stage} stage)`);
+  console.log('Original :', originalInput);
+  console.log('Sanitized:', masked);
+  console.log('Entities :', formatEntitySummary(detected));
+}
+
+function logPiiInOutput(result: GuardrailResult): void {
+  const info = result.info ?? {};
+  const detected = info.detected_entities as Record<string, string[]> | undefined;
+  const stage = info.stage_name ?? 'output';
+  console.log(`\n⚠️ PII detected – response blocked (${stage} stage).`);
+  console.log('Entities :', formatEntitySummary(detected));
+}
+
+function inspectGuardrailResults(
+  response: GuardrailsResponse,
+  originalInput: string
+): void {
+  const results = response.guardrail_results;
+
+  if (results.preflight.length > 0) {
+    for (const result of results.preflight) {
+      const info = result.info ?? {};
+      if (info.guardrail_name === 'Contains PII' && info.pii_detected) {
+        logPiiMasking(result, originalInput);
+      }
+    }
+  }
+
+  if (results.output.length > 0) {
+    for (const result of results.output) {
+      const info = result.info ?? {};
+      if (info.guardrail_name === 'Contains PII' && result.tripwireTriggered) {
+        logPiiInOutput(result);
+      }
+    }
+  }
+}
+
+async function processInput(
+  client: GuardrailsOpenAI,
+  userInput: string,
+  conversation: ChatMessage[]
+): Promise<void> {
+  const messages = [...conversation, { role: 'user' as const, content: userInput }];
+
+  const response = await client.chat.completions.create({
+    model: 'gpt-4.1-mini',
+    messages,
+  });
+
+  inspectGuardrailResults(response, userInput);
+
+  const assistantMessage = response.choices[0]?.message?.content ?? '';
+  console.log('\n🤖 Assistant:', assistantMessage.trim());
+
+  conversation.push({ role: 'user', content: userInput });
+  conversation.push({ role: 'assistant', content: assistantMessage });
+}
+
+async function main(): Promise<void> {
+  console.log('🔐 Guardrails PII Masking Example');
+  console.log(' - Pre-flight guardrail masks PII before it hits the model');
+  console.log(' - Output guardrail blocks replies that contain PII');
+
+  const client = await GuardrailsOpenAI.create(PIPELINE_CONFIG);
+  const conversation: ChatMessage[] = [
+    {
+      role: 'system',
+      content: 'You are a helpful assistant. Keep responses concise.',
+    },
+  ];
+
+  const rl = createInterface();
+  rl.prompt();
+
+  rl.on('line', async (line) => {
+    const input = line.trim();
+
+    if (!input) {
+      rl.prompt();
+      return;
+    }
+
+    if (input.toLowerCase() === 'exit') {
+      rl.close();
+      return;
+    }
+
+    try {
+      await processInput(client, input, conversation);
+    } catch (error) {
+      if (error instanceof GuardrailTripwireTriggered) {
+        const info = error.guardrailResult.info ?? {};
+        const stage = info.stage_name ?? 'unknown';
+        console.log(
+          `\n🛑 Guardrail triggered in ${stage} stage: ${info.guardrail_name ?? 'Unknown guardrail'}`
+        );
+        console.log(JSON.stringify(error.guardrailResult, null, 2));
+      } else {
+        console.error('\n❌ Error processing request:', error instanceof Error ? error.message : error);
+      }
+    }
+
+    rl.prompt();
+  });
+
+  rl.on('close', () => {
+    console.log('\n👋 Exiting the program.');
+    process.exit(0);
+  });
+}
+
+main().catch((error) => {
+  console.error('Fatal error:', error);
+  process.exit(1);
+});
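
Note on the example above: the pre-flight masking it relies on (detected spans replaced with tokens such as `<EMAIL_ADDRESS>`, with the matches collected per entity) can be approximated offline without an API key. The sketch below is a standalone illustration under stated assumptions; `maskPii`, `MaskOutcome`, and the two patterns are hypothetical and are not exported by the library.

```typescript
// Hypothetical sketch of masking mode (block=false); not the library's implementation.
// Illustrative patterns keyed by entity name.
const PATTERNS: Record<string, RegExp> = {
  EMAIL_ADDRESS: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  US_SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
};

interface MaskOutcome {
  maskedText: string;
  detected: Record<string, string[]>;
  piiDetected: boolean;
}

/** Replace every match of each requested entity with an <ENTITY> token and record the raw values. */
function maskPii(text: string, entities: string[]): MaskOutcome {
  const detected: Record<string, string[]> = {};
  let maskedText = text;
  for (const entity of entities) {
    const pattern = PATTERNS[entity];
    if (!pattern) continue; // entity not covered by this sketch
    maskedText = maskedText.replace(pattern, (match) => {
      if (!detected[entity]) detected[entity] = [];
      detected[entity].push(match);
      return `<${entity}>`;
    });
  }
  return { maskedText, detected, piiDetected: Object.keys(detected).length > 0 };
}

const outcome = maskPii('Contact me at john@example.com, SSN: 123-45-6789', [
  'EMAIL_ADDRESS',
  'US_SSN',
]);
console.log(outcome.maskedText);
console.log(outcome.detected);
```

In masking mode nothing is blocked; the caller forwards `maskedText` and can log `detected` in the same shape that `formatEntitySummary` expects.
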
