PII Model Produces Insufficient Confidence Scores for Critical PII Types #647

@yossiovadia

Summary

The PII classification model (pii_classifier_modernbert-base_presidio_token_model) produces confidence scores below the 0.7 threshold for critical PII types (SSN, Credit Card, Email), preventing policy enforcement from triggering.

Problem: Model Quality Issue

The PII model detects entities but assigns insufficient confidence scores to critical PII types:

| PII Type | Example Input | Model Confidence | Threshold | Result |
|---|---|---|---|---|
| SSN | My SSN is 123-45-6789 | 0.6478 | 0.7 | ❌ Not enforced |
| Email | [email protected] | 0.6368 | 0.7 | ❌ Not enforced |
| Email (dots) | [email protected] | 0.5742 | 0.7 | ❌ Not enforced |
| Credit Card | 4111-1111-1111-1111 | 0.3679 | 0.7 | ❌ Not enforced |
| Credit Card (sentence) | Card number 4111-1111-1111-1111 | 0.4742 | 0.7 | ❌ Not enforced |

Because scores fall below the threshold, entities are filtered out and policy enforcement never triggers.
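
The filtering step described above can be sketched as follows. This is a minimal illustration, not the router's actual code: the `Entity` class, the `filter_entities` function, and the use of `>=` for the comparison are assumptions. The confidence values and the 0.7 threshold are taken from this issue.

```python
from dataclasses import dataclass

THRESHOLD = 0.7  # pii_model.threshold from the configuration in this issue


@dataclass
class Entity:
    """Hypothetical container for one detected entity."""
    pii_type: str      # row name from the tables in this issue
    text: str
    confidence: float


def filter_entities(entities, threshold=THRESHOLD):
    """Keep only entities whose confidence meets the threshold.

    Assumption: the comparison is inclusive (>=); the real router may differ.
    """
    return [e for e in entities if e.confidence >= threshold]


# Values reported in this issue.
REPORTED = [
    Entity("SSN", "My SSN is 123-45-6789", 0.6478),
    Entity("Credit Card", "4111-1111-1111-1111", 0.3679),
    Entity("Person Names", "John Smith", 0.9988),
    Entity("Credit Card (no dashes)", "4111111111111111", 0.7549),
]

# Only entities that survive the filter can reach policy enforcement.
SURVIVORS = filter_entities(REPORTED)
```

With the reported scores, only the person name and the undashed card number survive, so the SSN and the dashed card number never reach the policy check.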

What Works (Proof Model Quality is the Issue)

When the model produces high confidence, enforcement does work correctly:

| PII Type | Example Input | Model Confidence | Result |
|---|---|---|---|
| Person Names | John Smith | 0.9988 | ✅ Policy enforced |
| Person (sentence) | My name is John Smith | 0.9995 | ✅ Policy enforced |
| Email (prefixed) | Email: [email protected] | 0.8262 | ✅ Policy enforced |
| Phone (sentence) | Call me at 555-123-4567 | 0.9435 | ✅ Policy enforced |
| Credit Card (no dashes) | 4111111111111111 | 0.7549 | ✅ Policy enforced |

This shows that the enforcement logic works whenever confidence clears the threshold. The failures come from the model's inability to confidently detect critical PII patterns.

Additional Model Quality Issues

1. Misclassification

The model incorrectly classifies some PII types:

  • 4111-1111-1111-1111 → Detected as PHONE_NUMBER instead of CREDIT_CARD (conf: 0.37)
  • [email protected] → Sometimes detected as PERSON instead of EMAIL_ADDRESS
  • 4111 1111 1111 1111 → Detected as PHONE_NUMBER instead of CREDIT_CARD (conf: 0.73)

2. Format Sensitivity

Confidence varies dramatically based on formatting:

Credit Card Numbers:

  • 4111111111111111 (no separators): 0.75
  • 4111-1111-1111-1111 (dashes): 0.37
  • 4111 1111 1111 1111 (spaces): 0.73

The same 16 digits score above or below the 0.7 threshold depending only on the separator, so detection depends heavily on input format.
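
To make the format sensitivity concrete, the sketch below checks that the three inputs contain identical digits and computes the spread in reported confidence. The helper `digits_only` and the variable names are illustrative assumptions; the scores are the rounded values listed above.

```python
THRESHOLD = 0.7  # pii_model.threshold from this issue

# Rounded confidence values reported above for the same card number.
CARD_CONFIDENCE = {
    "4111111111111111": 0.75,     # no separators
    "4111-1111-1111-1111": 0.37,  # dashes
    "4111 1111 1111 1111": 0.73,  # spaces
}


def digits_only(value):
    """Strip every non-digit character (hypothetical helper)."""
    return "".join(ch for ch in value if ch.isdigit())


# All three inputs reduce to the same digit string.
SAME_DIGITS = len({digits_only(card) for card in CARD_CONFIDENCE}) == 1

# Gap between the best and worst score for that one number.
SPREAD = max(CARD_CONFIDENCE.values()) - min(CARD_CONFIDENCE.values())

# Formats that would be filtered out at the configured threshold.
BELOW_THRESHOLD = [c for c, conf in CARD_CONFIDENCE.items() if conf < THRESHOLD]
```

On the reported numbers the spread is about 0.38, and only the dashed form falls below the threshold. The spaced form clears it, but with the wrong label (PHONE_NUMBER), as noted under Misclassification.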

Reproduction

  1. Send a request containing an SSN through the semantic router:

    curl -X POST "http://localhost:8801/v1/chat/completions" \
      -H 'Content-Type: application/json' \
      -d '{"model": "auto", "messages": [{"role": "user", "content": "My SSN is 123-45-6789"}]}'
  2. Expected: Request blocked or routed to PII-compliant model

  3. Actual: Request processed normally (no policy enforcement)
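
To repeat the reproduction across several of the inputs from the tables, a small script along these lines may help. It is a sketch under assumptions: the function names are hypothetical, the URL and payload are copied from the curl command above, and reading a `model` field from the reply assumes an OpenAI-style chat completion response, which this issue does not confirm.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:8801/v1/chat/completions"  # from the curl command

# Example inputs taken from the tables in this issue.
INPUTS = [
    "My SSN is 123-45-6789",
    "Card number 4111-1111-1111-1111",
    "4111111111111111",
    "My name is John Smith",
    "Call me at 555-123-4567",
]


def build_request(content, url=ROUTER_URL):
    """Build the same POST request as the curl command, for one message."""
    payload = {"model": "auto", "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(content, url=ROUTER_URL, timeout=30):
    """Send one message and return the parsed JSON reply.

    A rejected request may surface as urllib.error.HTTPError instead.
    """
    with urllib.request.urlopen(build_request(content, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_all(inputs=INPUTS):
    """Print which model answered each input (assumes a 'model' reply field)."""
    for content in inputs:
        reply = send(content)
        print(f"{content!r} -> model={reply.get('model')!r}")
```

Calling `run_all()` against a running router and comparing the output with the router logs should show which inputs trigger rerouting and which pass through unchanged.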

Current Configuration

classifier:
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    use_modernbert: true
    threshold: 0.7
    use_cpu: true

Logs Evidence

Low confidence - no enforcement:

{"msg":"PII token classification found 1 entities"}
[Entity filtered out - no further logs, no enforcement]

High confidence - enforcement works:

{"msg":"PII token classification found 1 entities"}
{"msg":"Detected PII entity: B-PERSON ('John Smith') with confidence 0.999"}
{"msg":"Detected PII types: [B-PERSON]"}
{"msg":"PII policy violation for model Model-A: denied PII types [B-PERSON]"}
{"msg":"Selected alternative model Model-B that passes PII policy"}
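
When comparing many runs, the detected entities can be pulled out of the log lines shown above. The sketch below assumes one JSON object per line with a `msg` field in exactly the format quoted in this issue; the regular expression and the `parse_entities` function are illustrative, not part of the router.

```python
import json
import re

# Matches the "Detected PII entity" message format quoted in this issue.
ENTITY_RE = re.compile(
    r"Detected PII entity: (?P<label>\S+) \('(?P<text>.*)'\) "
    r"with confidence (?P<conf>[0-9]*\.?[0-9]+)"
)


def parse_entities(log_lines):
    """Return (label, text, confidence) for each entity line found."""
    found = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(record, dict):
            continue
        match = ENTITY_RE.search(str(record.get("msg", "")))
        if match:
            found.append(
                (match.group("label"), match.group("text"), float(match.group("conf")))
            )
    return found


# Log lines copied from the high-confidence example above.
SAMPLE = [
    '{"msg":"PII token classification found 1 entities"}',
    '{"msg":"Detected PII entity: B-PERSON (\'John Smith\') with confidence 0.999"}',
    '{"msg":"Detected PII types: [B-PERSON]"}',
]

PARSED = parse_entities(SAMPLE)
```

In the low-confidence case only the "found 1 entities" line appears, so `parse_entities` returns an empty list. That mismatch between the entity count and the logged detections is the signature of an entity dropped by the threshold.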

Related Issues

Closes #336, which incorrectly described this as an enforcement code bug. Investigation showed that the enforcement logic works correctly; the issue is model quality.
