Description
PII Model Produces Insufficient Confidence Scores for Critical PII Types
Summary
The PII classification model (pii_classifier_modernbert-base_presidio_token_model) produces confidence scores below the 0.7 threshold for critical PII types (SSN, Credit Card, Email), preventing policy enforcement from triggering.
Problem: Model Quality Issue
The PII model detects entities but assigns insufficient confidence scores to critical PII types:
| PII Type | Example Input | Model Confidence | Threshold | Result |
|---|---|---|---|---|
| SSN | My SSN is 123-45-6789 | 0.6478 | 0.7 | ❌ Not enforced |
| Email | [email protected] | 0.6368 | 0.7 | ❌ Not enforced |
| Email (dots) | [email protected] | 0.5742 | 0.7 | ❌ Not enforced |
| Credit Card | 4111-1111-1111-1111 | 0.3679 | 0.7 | ❌ Not enforced |
| Credit Card (sentence) | Card number 4111-1111-1111-1111 | 0.4742 | 0.7 | ❌ Not enforced |
Because scores fall below the threshold, entities are filtered out and policy enforcement never triggers.
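
The filtering step described above can be sketched in a few lines. This is a minimal Python illustration, not the router's actual code (which this issue does not show): `THRESHOLD`, `observed`, and `filter_by_threshold` are hypothetical names, a `>=` comparison is assumed, and the scores are copied from the table above.

```python
THRESHOLD = 0.7  # pii_model.threshold from the configuration in this issue

# (PII type, model confidence) pairs reported in the table above.
observed = [
    ("SSN", 0.6478),
    ("Email", 0.6368),
    ("Email (dots)", 0.5742),
    ("Credit Card", 0.3679),
    ("Credit Card (sentence)", 0.4742),
]


def filter_by_threshold(scored, threshold):
    """Drop detections scoring below the threshold (a >= comparison is assumed)."""
    return [(label, conf) for label, conf in scored if conf >= threshold]


kept = filter_by_threshold(observed, THRESHOLD)
print(f"{len(kept)} of {len(observed)} detections pass the threshold")
# Prints "0 of 5 detections pass the threshold": nothing reaches the policy check.
```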
What Works (Evidence That Model Quality Is the Issue)
When the model's confidence meets the threshold, enforcement works correctly:
| PII Type | Example Input | Model Confidence | Result |
|---|---|---|---|
| Person Names | John Smith | 0.9988 | ✅ Policy enforced |
| Person (sentence) | My name is John Smith | 0.9995 | ✅ Policy enforced |
| Email (prefixed) | Email: [email protected] | 0.8262 | ✅ Policy enforced |
| Phone (sentence) | Call me at 555-123-4567 | 0.9435 | ✅ Policy enforced |
| Credit Card (no dashes) | 4111111111111111 | 0.7549 | ✅ Policy enforced |
This shows that the enforcement logic works correctly; the issue is the model's inability to detect critical PII patterns with sufficient confidence.
Additional Model Quality Issues
1. Misclassification
The model incorrectly classifies some PII types:
- 4111-1111-1111-1111 → Detected as `PHONE_NUMBER` instead of `CREDIT_CARD` (conf: 0.37)
- [email protected] → Sometimes detected as `PERSON` instead of `EMAIL_ADDRESS`
- 4111 1111 1111 1111 → Detected as `PHONE_NUMBER` instead of `CREDIT_CARD` (conf: 0.73)
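
To track these misclassifications as regressions, the observations above could be encoded as test cases. This is a hypothetical Python harness (the `Case` type and `mismatches` helper are illustrative, not part of the router); only the two deterministic credit-card cases are encoded, since the email case is intermittent, and labels are written without `B-`/`I-` prefixes.

```python
from typing import NamedTuple


class Case(NamedTuple):
    text: str
    expected: str  # entity type the input should map to
    observed: str  # entity type the model reported
    confidence: float


# Observations reported in this issue (hypothetical regression table).
CASES = [
    Case("4111-1111-1111-1111", "CREDIT_CARD", "PHONE_NUMBER", 0.37),
    Case("4111 1111 1111 1111", "CREDIT_CARD", "PHONE_NUMBER", 0.73),
]


def mismatches(cases):
    """Return the cases whose observed entity type differs from the expected one."""
    return [c for c in cases if c.observed != c.expected]


for c in mismatches(CASES):
    print(f"{c.text!r}: expected {c.expected}, got {c.observed} ({c.confidence:.2f})")
```

The second case shows that clearing the threshold is not sufficient on its own: 0.73 passes 0.7, but the entity is reported as `PHONE_NUMBER`, so a policy that denies only `CREDIT_CARD` would presumably still not trigger.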
2. Format Sensitivity
Confidence varies dramatically based on formatting:
Credit Card Numbers:
- 4111111111111111 (no separators): 0.75 ✅
- 4111-1111-1111-1111 (dashes): 0.37 ❌
- 4111 1111 1111 1111 (spaces): 0.73 ✅
This inconsistency means detection depends heavily on input format.
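
Reproducing this format sensitivity requires the same number in each layout. A small hypothetical Python helper (`card_variants` is illustrative, not part of the router) that produces the three variants listed above:

```python
def card_variants(digits: str) -> dict[str, str]:
    """Return the separator variants of a 16-digit card number used in this issue."""
    if len(digits) != 16 or not digits.isdigit():
        raise ValueError("expected exactly 16 digits")
    # Split into four groups of four digits.
    groups = [digits[i:i + 4] for i in range(0, 16, 4)]
    return {
        "no separators": digits,
        "dashes": "-".join(groups),
        "spaces": " ".join(groups),
    }


for name, text in card_variants("4111111111111111").items():
    print(f"{name:>13}: {text}")
```

Each variant can then be sent through the router so that the three confidence scores are compared on identical digits.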
Reproduction
- Send a request containing an SSN through the semantic router:
  `curl -X POST "http://localhost:8801/v1/chat/completions" -H 'Content-Type: application/json' -d '{"model": "auto", "messages": [{"role": "user", "content": "My SSN is 123-45-6789"}]}'`
- Expected: Request blocked or routed to a PII-compliant model
- Actual: Request processed normally (no policy enforcement)
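
To repeat the check over several inputs, the curl request can be scripted. This is a minimal standard-library Python sketch; it assumes the router listens on `localhost:8801` as in the curl command and that a successful response follows the OpenAI chat-completions shape with a top-level `model` field. How a blocked request is signalled is not specified in this issue, so treating a non-2xx status as "blocked" is an assumption. `build_request` and `send` are hypothetical names.

```python
import json
import urllib.error
import urllib.request

ROUTER_URL = "http://localhost:8801/v1/chat/completions"  # from the curl example


def build_request(content: str) -> urllib.request.Request:
    """Build the same POST request as the curl command for one user message."""
    payload = {"model": "auto", "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(content: str) -> None:
    """Send one request and print the status and the model that answered (assumed field)."""
    try:
        with urllib.request.urlopen(build_request(content), timeout=30) as resp:
            body = json.loads(resp.read())
            print(resp.status, body.get("model"), repr(content))
    except urllib.error.HTTPError as err:
        # A non-2xx status may mean the request was blocked (assumption).
        print(err.code, "-", repr(content))
    except urllib.error.URLError as err:
        # Router not reachable at ROUTER_URL.
        print("request failed:", err.reason)


if __name__ == "__main__":
    for text in [
        "My SSN is 123-45-6789",
        "Card number 4111-1111-1111-1111",
        "My name is John Smith",
    ]:
        send(text)
```

If enforcement works as for person names, the SSN and credit-card requests should show a different model (or an error status) compared with an input containing no PII.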
Current Configuration

    classifier:
      pii_model:
        model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
        use_modernbert: true
        threshold: 0.7
        use_cpu: true

Logs Evidence
Low confidence (no enforcement):

    {"msg":"PII token classification found 1 entities"}
    [Entity filtered out - no further logs, no enforcement]

High confidence (enforcement works):

    {"msg":"PII token classification found 1 entities"}
    {"msg":"Detected PII entity: B-PERSON ('John Smith') with confidence 0.999"}
    {"msg":"Detected PII types: [B-PERSON]"}
    {"msg":"PII policy violation for model Model-A: denied PII types [B-PERSON]"}
    {"msg":"Selected alternative model Model-B that passes PII policy"}
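
When testing many inputs, these log lines can indicate which requests reached enforcement. The following is a hypothetical Python helper (`enforcement_summary` is illustrative) that scans JSON log lines and reports whether a policy violation was logged. It matches only the `msg` texts quoted above; the router's real log output may carry other fields or change between versions.

```python
import json


def enforcement_summary(log_lines):
    """Summarize PII handling from JSON log lines like those quoted in this issue."""
    summary = {"entities_found": False, "detected": [], "violation": False}
    for line in log_lines:
        try:
            msg = json.loads(line).get("msg", "")
        except (json.JSONDecodeError, AttributeError):
            continue  # skip non-JSON lines and non-object JSON values
        if msg.startswith("PII token classification found"):
            summary["entities_found"] = True
        elif msg.startswith("Detected PII entity:"):
            summary["detected"].append(msg)
        elif msg.startswith("PII policy violation"):
            summary["violation"] = True
    return summary


low_confidence_logs = [
    '{"msg":"PII token classification found 1 entities"}',
    "[Entity filtered out - no further logs, no enforcement]",
]
# entities_found is True but violation is False: the entity never reached the policy check.
print(enforcement_summary(low_confidence_logs))
```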
Related Issues
Closes #336, which incorrectly described this as a bug in the enforcement code. Investigation showed that the enforcement logic works correctly; the issue is model quality.