# PII Model Produces Insufficient Confidence Scores for Critical PII Types

## Summary

The PII classification model (`pii_classifier_modernbert-base_presidio_token_model`) produces confidence scores below the 0.7 threshold for critical PII types (SSN, Credit Card, Email), preventing policy enforcement from triggering.

## Problem: Model Quality Issue

The PII model detects entities but assigns **insufficient confidence scores** to critical PII types:

| PII Type | Example Input | Model Confidence | Threshold | Result |
|----------|---------------|------------------|-----------|---------|
| **SSN** | `My SSN is 123-45-6789` | **0.6478** | 0.7 | ❌ Not enforced |
| **Email** | `john@example.com` | **0.6368** | 0.7 | ❌ Not enforced |
| **Email (dots)** | `john.doe@example.com` | **0.5742** | 0.7 | ❌ Not enforced |
| **Credit Card** | `4111-1111-1111-1111` | **0.3679** | 0.7 | ❌ Not enforced |
| **Credit Card (sentence)** | `Card number 4111-1111-1111-1111` | **0.4742** | 0.7 | ❌ Not enforced |

Because scores fall below the threshold, entities are filtered out and policy enforcement never triggers.
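
The filtering step can be sketched as follows. This is a minimal illustration, not the router's actual code: `Detection` and `filter_detections` are hypothetical names, and treating a score equal to the threshold as passing (`>=`) is an assumption. Only the scores and the 0.7 threshold come from this issue.

```python
from dataclasses import dataclass

THRESHOLD = 0.7  # classifier.pii_model.threshold in the reported configuration


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one detected entity."""
    text: str
    confidence: float


def filter_detections(detections, threshold=THRESHOLD):
    """Keep only detections whose confidence reaches the threshold (assumed >=)."""
    return [d for d in detections if d.confidence >= threshold]


# Scores copied from the table above.
reported = [
    Detection("My SSN is 123-45-6789", 0.6478),
    Detection("john@example.com", 0.6368),
    Detection("john.doe@example.com", 0.5742),
    Detection("4111-1111-1111-1111", 0.3679),
    Detection("Card number 4111-1111-1111-1111", 0.4742),
]

kept = filter_detections(reported)
print(len(kept))  # 0: nothing survives, so there is nothing left to check against policy
```

With every reported score below 0.7, the filtered list is empty, which matches the observed behavior: the model finds an entity, but the policy check never sees it.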

## What Works (Proof Model Quality is the Issue)

When the model produces high confidence, enforcement **does** work correctly:

| PII Type | Example Input | Model Confidence | Result |
|----------|---------------|------------------|---------|
| **Person Names** | `John Smith` | **0.9988** | ✅ Policy enforced |
| **Person (sentence)** | `My name is John Smith` | **0.9995** | ✅ Policy enforced |
| **Email (prefixed)** | `Email: john@example.com` | **0.8262** | ✅ Policy enforced |
| **Phone (sentence)** | `Call me at 555-123-4567` | **0.9435** | ✅ Policy enforced |
| **Credit Card (no dashes)** | `4111111111111111` | **0.7549** | ✅ Policy enforced |

**This shows the enforcement logic is working correctly.** The issue is the model's inability to assign sufficient confidence to critical PII patterns.

## Additional Model Quality Issues

### 1. Misclassification

The model incorrectly classifies some PII types:

- `4111-1111-1111-1111` → Detected as `PHONE_NUMBER` instead of `CREDIT_CARD` (conf: 0.37)
- `john@example.com` → Sometimes detected as `PERSON` instead of `EMAIL_ADDRESS`
- `4111 1111 1111 1111` → Detected as `PHONE_NUMBER` instead of `CREDIT_CARD` (conf: 0.73)
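
A wrong label matters even when the score clears the threshold. The sketch below separates the two failure modes for the card-number cases listed above. It assumes policies are keyed on the predicted label, as the `denied PII types [...]` log message suggests; `outcome` is a hypothetical helper, not part of the router.

```python
THRESHOLD = 0.7

# (input, expected label, predicted label, confidence) from the list above.
cases = [
    ("4111-1111-1111-1111", "CREDIT_CARD", "PHONE_NUMBER", 0.37),
    ("4111 1111 1111 1111", "CREDIT_CARD", "PHONE_NUMBER", 0.73),
]


def outcome(expected, predicted, confidence, threshold=THRESHOLD):
    """Classify a detection as filtered, wrongly labeled, or correct."""
    if confidence < threshold:
        return "filtered"  # dropped before any policy check
    return "correct" if predicted == expected else "wrong_label"


for text, expected, predicted, conf in cases:
    print(text, outcome(expected, predicted, conf))
# 4111-1111-1111-1111 filtered
# 4111 1111 1111 1111 wrong_label
```

Under that assumption, a policy that denies `CREDIT_CARD` but allows `PHONE_NUMBER` would not fire for the space-separated number, even though its score of 0.73 passes the threshold.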

### 2. Format Sensitivity

Confidence varies dramatically based on formatting:

**Credit Card Numbers:**
- `4111111111111111` (no separators): **0.75** ✅
- `4111-1111-1111-1111` (dashes): **0.37** ❌
- `4111 1111 1111 1111` (spaces): **0.73** ✅ (clears the threshold, but is labeled `PHONE_NUMBER`; see Misclassification)

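The same card number scores anywhere from 0.37 to 0.75 depending only on its separators, so whether enforcement triggers depends on how the input happens to be formatted.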

## Reproduction

1. Send a request containing an SSN through the semantic router:
   ```bash
   curl -X POST "http://localhost:8801/v1/chat/completions" \
     -H 'Content-Type: application/json' \
     -d '{"model": "auto", "messages": [{"role": "user", "content": "My SSN is 123-45-6789"}]}'
   ```

2. **Expected**: Request blocked or routed to PII-compliant model
3. **Actual**: Request processed normally (no policy enforcement)
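
To repeat step 1 for several of the inputs reported in this issue, the request can be built in a loop. This is a sketch using only the Python standard library. The URL and body mirror the curl command above, while `SAMPLES` and `build_request` are hypothetical names introduced for illustration.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:8801/v1/chat/completions"  # endpoint from the curl command

# Inputs taken from the tables in this issue.
SAMPLES = [
    "My SSN is 123-45-6789",
    "john@example.com",
    "4111-1111-1111-1111",
    "4111 1111 1111 1111",
    "4111111111111111",
]


def build_request(content, url=ROUTER_URL):
    """Build the same POST request as the curl command for one input."""
    body = {"model": "auto", "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Nothing is sent here; the requests are only constructed.
requests_to_send = [build_request(s) for s in SAMPLES]
```

Each request can then be sent with `urllib.request.urlopen(req)` against a running router, and the router logs compared across inputs.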

## Current Configuration

```yaml
classifier:
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    use_modernbert: true
    threshold: 0.7
    use_cpu: true
```

## Logs Evidence

**Low confidence - no enforcement:**
```
{"msg":"PII token classification found 1 entities"}
[Entity filtered out - no further logs, no enforcement]
```

**High confidence - enforcement works:**
```
{"msg":"PII token classification found 1 entities"}
{"msg":"Detected PII entity: B-PERSON ('John Smith') with confidence 0.999"}
{"msg":"Detected PII types: [B-PERSON]"}
{"msg":"PII policy violation for model Model-A: denied PII types [B-PERSON]"}
{"msg":"Selected alternative model Model-B that passes PII policy"}
```
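
When comparing many inputs, the `Detected PII entity` lines can be pulled out of the logs programmatically. The sketch below assumes each log line is a JSON object with a `msg` field shaped like the excerpts above. The regular expression is derived from that one sample message and may need adjusting for other log formats. `extract_entities` is a hypothetical helper.

```python
import json
import re

# Pattern derived from the sample "Detected PII entity" message above.
ENTITY_RE = re.compile(
    r"Detected PII entity: (?P<label>\S+) \('(?P<text>.*)'\) "
    r"with confidence (?P<conf>[0-9.]+)"
)


def extract_entities(log_lines):
    """Return (label, text, confidence) for each entity that survived filtering."""
    found = []
    for line in log_lines:
        try:
            msg = json.loads(line).get("msg", "")
        except (json.JSONDecodeError, AttributeError):
            continue  # skip lines that are not JSON objects
        match = ENTITY_RE.search(msg)
        if match:
            found.append((match["label"], match["text"], float(match["conf"])))
    return found


sample_log = """\
{"msg":"PII token classification found 1 entities"}
{"msg":"Detected PII entity: B-PERSON ('John Smith') with confidence 0.999"}
{"msg":"Detected PII types: [B-PERSON]"}
""".splitlines()

print(extract_entities(sample_log))  # [('B-PERSON', 'John Smith', 0.999)]
```

A request whose logs contain `found 1 entities` but yield an empty result from this helper corresponds to the low-confidence case: the entity was detected and then filtered out.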

## Related Issues

Closes #336, which incorrectly described this as a bug in the enforcement code. Investigation showed that the enforcement logic works correctly; the issue is model quality.


PII Model Produces Insufficient Confidence Scores for Critical PII Types #647
