
Commit 10f2a66

yossiovadia and claude committed
test(pii): add comprehensive PII detection test suite and update e2e config
Add two comprehensive PII testing tools and update e2e configuration to use LoRA PII model instead of broken ModernBERT model.

Changes:

1. Add 06-a-test-pii-direct.py
   - 37 comprehensive PII test cases
   - Tests email, SSN, credit card, phone, person names, addresses, etc.
   - Validates confidence scores and entity type accuracy
   - Compares ModernBERT vs LoRA performance

2. Add pii-confidence-benchmark.py
   - 84-prompt benchmark tool
   - Tests diverse PII patterns and formats
   - Outputs detailed statistics (precision, recall, F1 score)
   - Generates JSON results for analysis
   - Measures processing time and confidence distribution

3. Update config/testing/config.e2e.yaml
   - Change model_id to lora_pii_detector_bert-base-uncased_model
   - Update pii_mapping_path to match LoRA model structure
   - Required because ModernBERT model is incompatible with auto-detection code

Note: The old ModernBERT PII model lacks the hidden_act field required by Traditional BERT classifier, causing fatal initialization errors.

Test Results with LoRA model:
- Overall: 88% accuracy (74/84 prompts)
- Precision: 95.5% (when detected, almost always correct)
- Recall: 90.0% (detects 90% of actual PII)
- F1 Score: 0.926
- All confidence scores: 0.9 (uniform, see caveat in vllm-project#647)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
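As a quick sanity check on the reported numbers, the sketch below recomputes accuracy from the 74/84 prompt count and F1 from the rounded precision and recall. The helper name `f1_score` is hypothetical and is not taken from the benchmark script. Because the inputs are the rounded percentages from the commit message, the result may differ from the reported 0.926 in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures copied from the commit message (rounded values, not raw counts).
correct_prompts, total_prompts = 74, 84
precision, recall = 0.955, 0.900

accuracy = correct_prompts / total_prompts
f1 = f1_score(precision, recall)

print(f"accuracy={accuracy:.3f} f1={f1:.3f}")
```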
1 parent efe47a6 commit 10f2a66

3 files changed: +828 -3 lines changed

config/testing/config.e2e.yaml

Lines changed: 3 additions & 3 deletions
@@ -76,11 +76,11 @@ classifier:
     use_cpu: true
     category_mapping_path: "models/lora_intent_classifier_bert-base-uncased_model/category_mapping.json"
   pii_model:
-    model_id: "models/pii_classifier_modernbert-base_presidio_token_model" # TODO: Use local model for now before the code can download the entire model from huggingface
-    use_modernbert: true
+    model_id: "models/lora_pii_detector_bert-base-uncased_model"
+    use_modernbert: false # BERT-based LoRA model (this field is ignored - always auto-detects)
     threshold: 0.7
     use_cpu: true
-    pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json"
+    pii_mapping_path: "models/lora_pii_detector_bert-base-uncased_model/pii_type_mapping.json"
 categories:
   - name: business
     model_scores:
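The diff changes model_id and pii_mapping_path together so that the mapping file stays inside the model directory. A minimal sketch of that consistency check follows, assuming the pii_model section has already been parsed into a plain dict. The helper `mapping_under_model` is hypothetical and is not part of the repository.

```python
from pathlib import PurePosixPath

# Values copied from the updated pii_model section of config.e2e.yaml.
pii_model = {
    "model_id": "models/lora_pii_detector_bert-base-uncased_model",
    "use_modernbert": False,
    "threshold": 0.7,
    "use_cpu": True,
    "pii_mapping_path": (
        "models/lora_pii_detector_bert-base-uncased_model/pii_type_mapping.json"
    ),
}


def mapping_under_model(cfg: dict) -> bool:
    """Return True when the mapping file sits directly in the model directory."""
    model_dir = PurePosixPath(cfg["model_id"])
    mapping = PurePosixPath(cfg["pii_mapping_path"])
    return mapping.parent == model_dir


print(mapping_under_model(pii_model))
```

A config that switches model_id but keeps the old ModernBERT mapping path would fail this check, which is the mismatch the third change in the commit avoids.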
