
Performance Comparison - Presidio (direct) vs Detect PII (Guardrails) #12

@RohitPShetty

Hi,

I was comparing the performance of using Presidio directly versus using the Detect PII validator via Guardrails. In most cases, I found a difference of about 1/10th of a second, with Presidio directly being faster than Detect PII. Both setups used the default model (en_core_web_lg) and the same dataset. I wanted to understand whether this is due to the additional Guardrails wrappers or whether I am missing something.
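To check that a gap of this size reflects the Guardrails layer rather than measurement noise or one-off setup cost, both paths can be timed with the same harness. Below is a minimal, hypothetical sketch (`time_per_call` and `compare` are illustrative names, not part of Presidio or Guardrails). It runs a few warm-up calls first so that model loading is excluded, then records per-call wall-clock latency on identical inputs and reports the median and mean.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def time_per_call(fn: Callable[[str], object], texts: Iterable[str], warmup: int = 3) -> List[float]:
    """Return per-call wall-clock latencies (in seconds) of fn over texts.

    The first `warmup` texts are run once beforehand and not recorded, so
    one-off costs such as loading the spaCy model are kept out of the numbers.
    """
    texts = list(texts)
    for text in texts[:warmup]:
        fn(text)
    latencies = []
    for text in texts:
        start = time.perf_counter()
        fn(text)
        latencies.append(time.perf_counter() - start)
    return latencies


def compare(name_a: str, fn_a: Callable[[str], object],
            name_b: str, fn_b: Callable[[str], object],
            texts: Iterable[str]) -> Dict[str, List[float]]:
    """Time two callables on the same texts and print a per-call summary."""
    texts = list(texts)
    results = {}
    for name, fn in ((name_a, fn_a), (name_b, fn_b)):
        lat = time_per_call(fn, texts)
        results[name] = lat
        print(f"{name}: median {statistics.median(lat) * 1000:.1f} ms, "
              f"mean {statistics.mean(lat) * 1000:.1f} ms over {len(lat)} calls")
    return results
```

To use it, construct the engines once, outside the timed callables, and pass each path in as a one-argument function, e.g. `lambda t: analyzer.analyze(text=t, entities=entities, language="en")` for a Presidio `AnalyzerEngine` (with `entities` set to the PII entity list in this issue), and an equivalent wrapper around the existing Guardrails call. Comparing medians rather than totals keeps a few slow outliers from dominating the result.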

Example dataset: https://github.com/microsoft/presidio-research/blob/master/data/synth_dataset_v2.json

PII entities:

    "pii": [
        "EMAIL_ADDRESS",
        "PHONE_NUMBER",
        "IP_ADDRESS",
        "DATE_TIME",
        "LOCATION",
        "PERSON",
        "URL",
        "NRP",
        "CREDIT_CARD",
        "US_BANK_NUMBER",
        "US_DRIVER_LICENSE"
    ]
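For a fair comparison, both paths should receive exactly the same texts and the same entity list. Below is a minimal sketch for loading the texts. It assumes that the dataset linked above is a JSON array of records whose sentence is stored under a `full_text` key. That field name is an assumption, so check the file and change `key` if needed. `load_texts` is a hypothetical helper, not a Presidio API.

```python
import json
from pathlib import Path
from typing import List, Optional

# Entity types from the list above, passed identically to both code paths.
PII_ENTITIES = [
    "EMAIL_ADDRESS", "PHONE_NUMBER", "IP_ADDRESS", "DATE_TIME", "LOCATION",
    "PERSON", "URL", "NRP", "CREDIT_CARD", "US_BANK_NUMBER", "US_DRIVER_LICENSE",
]


def load_texts(path: str, key: str = "full_text", limit: Optional[int] = None) -> List[str]:
    """Read a JSON array of records and return the text field of each one.

    ASSUMPTION: each record stores its sentence under `key` ("full_text" by
    default). Records without that key are skipped.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    texts = [r[key] for r in records if key in r]
    return texts[:limit] if limit is not None else texts
```

Using `limit` to take the same first N records for both runs keeps the two measurements directly comparable.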
