Skip to content

Add pii_detection check #2374

@linear

Description

@linear

Summary

Add a built-in check that detects personally identifiable information (PII) in model outputs. Could be a hybrid approach combining regex patterns (for structured PII like emails, phone numbers, SSNs) with LLM judgment (for contextual PII like names, addresses).

Motivation

PII leakage is a critical safety concern. promptfoo ships pii detection and Patronus AI has PHI (Protected Health Information) detection. Important for compliance (GDPR, HIPAA, CCPA).

Implementation Guide

Approach: Hybrid (pattern + LLM)

  1. Pattern-based layer (fast, deterministic):
    • Email addresses, phone numbers, SSNs, credit card numbers, IP addresses
    • Configurable pattern set
  2. LLM-based layer (contextual):
    • Detect names, addresses, medical information, financial details in context

Steps

  1. Create check: src/giskard/checks/judges/pii_detection.py
    • Register as "pii_detection"
    • Support:
      • key: JSONPathStr — output to analyze
      • categories: list[str] | None = None — PII categories to detect (email, phone, ssn, name, address, etc.)
      • mode: Literal["pattern", "llm", "hybrid"] = "hybrid"
  2. Create template for LLM layer: src/giskard/checks/prompts/judges/pii_detection.j2
  3. Add tests

Example usage

from giskard.checks import PIIDetection, Scenario

scenario = (
    Scenario(name="no_pii_leakage")
    .interact(inputs="Tell me about the user", outputs="The user has been a member since 2020.")
    .check(PIIDetection())
)

Acceptance Criteria

  • Detects structured PII via regex (emails, phone numbers, SSNs)
  • Detects contextual PII via LLM (names, addresses)
  • Configurable category filtering
  • Hybrid mode combines both approaches
  • Tests cover: clean output passes, PII present fails, category filtering

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions