The Azure Health Data Services (AHDS) De-Identification Service is a cloud-based service that applies natural language processing to raw text. One of its main functions is Named Entity Recognition (NER), which identifies entities in text and assigns them to predefined classes or types. This document demonstrates how to integrate Presidio with the AHDS De-Identification Service.
The AHDS De-Identification Service runs a predictive model to identify and categorize named entities in an input document. Its latest version detects both personal (PII) and health (PHI) information across multiple entity categories. A list of all supported entities can be found in the official documentation.
To use the AHDS De-Identification Service with Presidio, first create an Azure De-Identification Service resource under an Azure subscription. Follow the official documentation for instructions. The endpoint generated when the resource is created is the one the Presidio remote recognizer uses to call the service.
The integration uses a secure-by-default authentication approach:
- Production Mode (Default): Uses a restricted credential chain (EnvironmentCredential, WorkloadIdentityCredential, ManagedIdentityCredential).
- Development Mode: Set ENV=development to use DefaultAzureCredential for local development with the Azure CLI:

      export ENV=development
      az login

For more details, see the AHDS Integration Authentication documentation.
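The mode selection described above can be sketched as a small function. This is only an illustration of the decision, not the integration's actual code: `select_credential_names` is a hypothetical helper, the credential classes are represented by their names as strings so the sketch runs without the azure-identity package, and the exact-match comparison against `development` is an assumption.

```python
import os

# Names of the azure-identity credential classes listed for production mode.
PRODUCTION_CHAIN = (
    "EnvironmentCredential",
    "WorkloadIdentityCredential",
    "ManagedIdentityCredential",
)


def select_credential_names(env=None):
    """Return the credential class names to try, in order (hypothetical helper).

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    # Assumption: only the exact value "development" enables development mode.
    if env.get("ENV") == "development":
        return ("DefaultAzureCredential",)
    # Any other value, or no ENV at all, falls back to the restricted chain.
    return PRODUCTION_CHAIN


if __name__ == "__main__":
    print(select_credential_names())
```

Defaulting to the restricted chain means a missing or mistyped `ENV` value never widens the set of credentials that are tried, which is what "secure by default" implies here.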
The implementation of an AzureHealthDeid recognizer can be found here.
- Install the package with the ahds extra:

      pip install "presidio-analyzer[ahds]"

- Define environment variables:
  - AHDS_ENDPOINT
- Add the AzureHealthDeidRecognizer to the recognizer registry:

      from presidio_analyzer import AnalyzerEngine
      from presidio_analyzer.predefined_recognizers import AzureHealthDeidRecognizer

      # Create the remote recognizer; AHDS_ENDPOINT must be set (see the previous step)
      ahds = AzureHealthDeidRecognizer()

      # Register it alongside the analyzer's default recognizers
      analyzer = AnalyzerEngine()
      analyzer.registry.add_recognizer(ahds)

      # Analyze a sample text and print the detected entities
      results = analyzer.analyze(text="My email is email@email.com", language="en")
      print(results)

See also: example_ahds_surrogate.py for a full surrogate integration example.
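Since the setup steps depend on the AHDS_ENDPOINT environment variable, a pre-flight check can surface a missing or malformed value before the recognizer is constructed. The sketch below is not part of Presidio: `require_ahds_endpoint` is a hypothetical helper, and the requirement that the endpoint be an `https` URL is an assumption about how Azure endpoints are typically shaped.

```python
import os
from urllib.parse import urlparse


def require_ahds_endpoint(env=None):
    """Return the AHDS_ENDPOINT value, or raise RuntimeError (hypothetical helper).

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    endpoint = env.get("AHDS_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError("AHDS_ENDPOINT is not set")
    parsed = urlparse(endpoint)
    # Assumption: a valid endpoint is an https URL with a host name.
    if parsed.scheme != "https" or not parsed.netloc:
        raise RuntimeError(
            f"AHDS_ENDPOINT does not look like an https URL: {endpoint!r}"
        )
    return endpoint
```

Calling `require_ahds_endpoint()` at application startup, before creating the recognizer, turns a configuration mistake into an immediate and readable error rather than a failure on the first remote call.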