Commit 5bca8b6

Merge pull request #300303 from LeaKass/patch-1
Update overview.md
2 parents 30ddd10 + ac9e0a4 commit 5bca8b6

4 files changed: +56 -14 lines changed

articles/healthcare-apis/deidentification/overview.md

Lines changed: 56 additions & 14 deletions
@@ -1,7 +1,7 @@
 ---
 title: Overview of the De-identification service in Azure Health Data Services
-description: Learn how the De-identification service in Azure Health Data Services de-identifies clinical data for HIPAA compliance while retaining data relevance for research and analytics.
-author: kimiamavon
+description: Learn how the De-identification service in Azure Health Data Services de-identifies clinical data, adhering to HIPAA compliance while retaining data relevance for research and analytics.
+author: kimiamavon-msft
 ms.service: azure-health-data-services
 ms.subservice: deidentification-service
 ms.topic: overview
@@ -11,28 +11,66 @@ ms.author: kimiamavon

 # What is the de-identification service?

-The de-identification service in Azure Health Data Services enables healthcare organizations de-identify clinical data so that the resulting data retains its clinical relevance and distribution while also adhering to the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule. The service uses state-of-the-art machine learning models to automatically extract, redact, or surrogate 27 entities - including the HIPAA 18 Protected Health Information (PHI) identifiers – from unstructured text such as clinical notes, transcripts, messages, or clinical trial studies.
+![Tag Redact and Surrogation operations.](tag-redact-surrogate-operations.png)

-## Use de-identified data in research, analytics, and machine learning

-The de-identification service unlocks data that was previously difficult to de-identify so organizations can conduct research and derive insights from analytics. The de-identification service supports three operations: **tag**, **redact**, or **surrogate PHI**. The de-identification service offers many benefits, including:
+The de-identification service in Azure Health Data Services enables healthcare organizations to de-identify clinical data so that the resulting data retains its clinical relevance and distribution while also adhering to the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule. The service uses state-of-the-art machine learning models to automatically extract, redact, or surrogate 27 entities, including the 18 HIPAA Protected Health Information (PHI) identifiers, from unstructured text such as clinical notes, transcripts, messages, or clinical trial studies.

-- **Surrogation**: Surrogation, or replacement, is a best practice for PHI protection. The service can replace PHI elements with plausible replacement values, resulting in data that is most representative of the source data. Surrogation strengthens privacy protections as any false-negative PHI values are hidden within a document.
+## How do you benefit from de-identifying your data?

-- **Consistent replacement**: Consistent surrogation results enable organizations to retain relationships occurring in the underlying dataset, which is critical for research, analytics, and machine learning. By submitting data in the same batch, our service allows for consistent replacement across entities and preserves the relative temporal relationships between events.
+| As a | AHDS De-identification enables you to |
+|-------------------------|----------------------------------------------------------------------------------------------------------|
+| Data Scientist | Use de-identified data to train robust machine learning models, build conversational agents, and conduct longitudinal studies. |
+| Data Analyst | Monitor trends, build dashboards, and analyze outcomes without compromising privacy. |
+| Data Engineer | Build and test dev environments using realistic, non-identifiable data for safer deployment. |
+| Customer Service Agent | Summarize support conversations and extract insights while maintaining patient confidentiality. |
+| Executive Leader (C-Suite) | Reduce risks of data exposure, enable secure data sharing, drive AI adoption responsibly, and ensure regulatory compliance. |

-- **Expanded PHI coverage**: The service expands beyond the 18 HIPAA Identifiers to provide stronger privacy protections and more fine-grained distinctions between entity types, such as distinguishing between Doctor and Patient.
+## Why is this service the right fit for your use case?
+
+The de-identification service unlocks the power of your data by automating three operations:
+
+- **TAG** identifies and tags PHI in your clinical text, specifying the entity types (for example, Patient Name, Doctor Name, and Age).
+- **REDACT** replaces the identified PHI in your clinical text with the corresponding entity types.
+- **SURROGATE** replaces the identified PHI in your clinical text with realistic pseudonyms (names of people, organizations, hospitals) and randomizes number-based PHI (dates and alphanumeric entities such as ID numbers).
+
+> [!TIP]
+> **Surrogation**, or synthetic replacement, is a best practice for PHI protection. The service can replace PHI elements with plausible replacement values, resulting in data that is most representative of the source data. Surrogation strengthens privacy protections because any PHI the model misses (false negatives) is hidden among realistic surrogate values.
+
+### **Consistent replacement to preserve patient timelines**
+Consistent surrogation results enable organizations to retain relationships occurring in the underlying dataset, which is critical for research, analytics, and machine learning. By submitting data in the same batch, our service allows for consistent replacement across entities and preserves the relative temporal relationships between events.
+
+![Screenshot of consistent surrogation.](consistent-surrogation.png)

 ## De-identify clinical data securely and efficiently

 The de-identification service offers many benefits, including:

-- **PHI compliance**: The de-identification service is designed for protected health information (PHI). The service uses machine learning to identify PHI entities, including HIPAA’s 18 identifiers, using the “TAG” operation. The redaction and surrogation operations replace these identified PHI values with a tag of the entity type or a surrogate, or pseudonym. The service also meets all regional compliance requirements including HIPAA, GDPR, and the California Consumer Privacy Act (CCPA).
+- **Expanded PHI coverage:**
+The service expands beyond the 18 HIPAA Identifiers to provide stronger privacy protections and more fine-grained distinctions between entity types. It distinguishes between Doctor and Patient and covers the [27 PHI entities the service de-identifies](https://learn.microsoft.com/rest/api/health-dataplane/deidentify-text/deidentify-text?view=rest-health-dataplane-2024-11-15&tabs=HTTP&source=docs#phicategory).
+
+- **PHI compliance**: The de-identification service is designed for protected health information (PHI). The service uses machine learning to identify PHI entities, including HIPAA’s 18 identifiers, using the “TAG” operation. The redaction and surrogation operations replace these identified PHI values with a tag of the entity type or a surrogate, or pseudonym. The service adheres to compliance requirements such as HIPAA.

 - **Security**: The de-identification service is a stateless service. Customer data stays within the customer’s tenant.

 - **Role-based Access Control (RBAC)**: Azure role-based access control (RBAC) enables you to manage how your organization's data is processed, stored, and accessed. You determine who has access to de-identify datasets based on roles you define for your environment.

+## Easy API Integration Into Your Workflow
+
+![API Integration Workflow](workflow.png)
+
+Integrating Azure’s de-identification service into your environment is fast, flexible, and secure. The service is built from the ground up to support health and life sciences workflows with minimal effort.
+
+- **API-First Design:** Whether you need real-time de-identification or asynchronous batch processing from Azure Blob Storage, our REST API and SDKs provide easy integration points to fit your system.
+
+- **Quick Setup:** Deploy the service in minutes using the Azure portal, ARM templates, Bicep, or the Azure CLI. You can be up and running quickly without complex configuration.
+
+- **Secure Access:** Enable private endpoints using Azure Private Link to keep data traffic off the public internet.
+
+- **Fully Managed Identity Support:** Use managed identities for secure, credential-free access to Azure Blob Storage.
+
+- **Compliance-Ready:** The service operates within your Azure tenant and adheres to HIPAA.
+
 ## Synchronous or asynchronous endpoints

 The de-identification service offers two ways to interact with the REST API or Client library (Azure SDK).
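The three operations described in this diff (TAG, REDACT, SURROGATE) and the consistent-replacement behavior can be illustrated with a small local sketch. It is not the service's implementation: the service detects PHI with machine learning models, whereas here the PHI spans are supplied by hand. The `Entity` shape, the `[Category]` redaction label, the pseudonym pool, the ISO date format, and every function name below are hypothetical assumptions chosen for illustration. The sketch keeps one surrogation state per batch, so a repeated name always receives the same pseudonym and every date moves by the same offset, which preserves the interval between events.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Entity:
    """A PHI span (hypothetical shape): entity type plus character offset and length."""
    category: str  # assumed labels such as "Patient" or "Date"
    offset: int
    length: int


def span(text: str, category: str, value: str) -> Entity:
    """Example helper: build an Entity from the first occurrence of value in text."""
    return Entity(category, text.index(value), len(value))


def tag(text: str, entities: list[Entity]) -> list[tuple[str, str]]:
    """TAG: report each PHI span with its entity type; the text itself is unchanged."""
    return [(e.category, text[e.offset:e.offset + e.length]) for e in entities]


def replace_spans(text: str, entities: list[Entity], make_replacement) -> str:
    """Replace spans right to left so earlier offsets stay valid after each edit."""
    out = text
    for e in sorted(entities, key=lambda ent: ent.offset, reverse=True):
        original = text[e.offset:e.offset + e.length]
        out = out[:e.offset] + make_replacement(e.category, original) + out[e.offset + e.length:]
    return out


def redact(text: str, entities: list[Entity]) -> str:
    """REDACT: replace each span with its entity type (the bracket format is assumed)."""
    return replace_spans(text, entities, lambda category, _original: f"[{category}]")


class ConsistentSurrogator:
    """SURROGATE state shared by one batch (hypothetical): the same original value always
    maps to the same pseudonym, and all dates shift by one shared offset."""

    def __init__(self, seed: int):
        rng = random.Random(seed)
        # Hypothetical pseudonym pool; a real system would draw from much larger lists.
        self._pool = ["Alex Morgan", "Sam Rivera", "Jordan Lee", "Casey Kim"]
        rng.shuffle(self._pool)
        self._names = {}
        # One shift for the whole batch keeps relative timelines intact.
        self._shift = timedelta(days=rng.randint(30, 365))

    def __call__(self, category: str, original: str) -> str:
        if category == "Date":
            # Assumes ISO dates (YYYY-MM-DD) to keep the sketch short.
            return (date.fromisoformat(original) + self._shift).isoformat()
        if original not in self._names:
            if len(self._names) == len(self._pool):
                raise ValueError("pseudonym pool exhausted")
            self._names[original] = self._pool[len(self._names)]
        return self._names[original]


def surrogate(text: str, entities: list[Entity], surrogator: ConsistentSurrogator) -> str:
    """SURROGATE: replace each span with a stand-in from the batch's shared surrogator."""
    return replace_spans(text, entities, surrogator)


if __name__ == "__main__":
    notes = ["Jane Doe was admitted on 2024-03-01.", "Jane Doe was discharged on 2024-03-08."]
    dates = ["2024-03-01", "2024-03-08"]
    surrogator = ConsistentSurrogator(seed=7)  # one surrogator for the whole batch
    for note, when in zip(notes, dates):
        entities = [span(note, "Patient", "Jane Doe"), span(note, "Date", when)]
        print(tag(note, entities))
        print(redact(note, entities))
        print(surrogate(note, entities, surrogator))
```

Sharing a single `ConsistentSurrogator` across both notes mirrors the statement that data submitted in the same batch is replaced consistently: both notes get the same pseudonym, and the seven-day gap between admission and discharge survives. Creating a new surrogator per note would break that link.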
@@ -50,13 +88,17 @@ The following service limits are applicable:
 - Each document processed by a job can't exceed 2 MB.

 ## Pricing
-As with other Azure Health Data Services, you pay only for what you use. You have a monthly allotment that enables you to try the product for free.

-| Transformation Operation (per MB) | Up to 50 MB | Over 50 MB |
-| ---------------- | ------ | ---- |
-| Unstructured text de-identification | $0 | $0.05 |
+The de-identification service pricing depends on the amount of data de-identified by our service.
+You are charged per MB for any of the three operations we offer, whether you use the asynchronous or synchronous endpoint.
+
+The cost per MB de-identified is displayed in the row "Unstructured De-identification" in the table "Transformation Operations" on the [Azure Pricing Page](https://azure.microsoft.com/pricing/details/health-data-services/?msockid=2982a916bc2461731022bd6cbdbd6053#pricing).
+
+You also have a monthly allotment of 50 MB that enables you to try the product for free.
+
+The [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) helps you estimate the cost based on your use case.

-When you choose to store documents in Azure Blob Storage, you are charged based on Azure Storage pricing.
+When you choose to store documents in Azure Blob Storage, you are charged based on Azure Storage pricing.

 ## Responsible use of AI

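The pricing text in this diff comes down to simple arithmetic: a per-MB charge for any of the three operations, a 50 MB monthly free allotment, and, from the service limits, a 2 MB cap on each document in a job. The back-of-the-envelope estimator below rests on stated assumptions. The rate is a parameter you look up on the Azure Pricing Page; the $0.05 used in the example comes from the table this commit removes and may be out of date. Billing is assumed to be linear in MB with no rounding, sizes are given in MB as the limits are stated, and Azure Blob Storage charges, which are billed separately, are ignored. The function names are hypothetical.

```python
from decimal import Decimal

# Values stated in the diff: 50 MB monthly free allotment, 2 MB per document in a job.
FREE_MONTHLY_MB = Decimal("50")
MAX_JOB_DOCUMENT_MB = Decimal("2")


def oversized_documents(sizes_mb: list[Decimal]) -> list[int]:
    """Return the indexes of documents that exceed the per-document job limit."""
    return [i for i, size in enumerate(sizes_mb) if size > MAX_JOB_DOCUMENT_MB]


def estimate_monthly_cost(total_mb: Decimal, rate_per_mb: Decimal,
                          free_mb: Decimal = FREE_MONTHLY_MB) -> Decimal:
    """Estimate one month's charge for de-identifying total_mb.

    Per the pricing text, one per-MB rate applies regardless of operation or endpoint.
    Assumes linear billing with no rounding; check the pricing page for the actual
    rate and billing granularity.
    """
    if total_mb < 0 or rate_per_mb < 0:
        raise ValueError("total_mb and rate_per_mb must be non-negative")
    billable_mb = max(Decimal("0"), total_mb - free_mb)
    return billable_mb * rate_per_mb


if __name__ == "__main__":
    sizes = [Decimal("0.4"), Decimal("2.5"), Decimal("1.2")]
    print(oversized_documents(sizes))  # [1]: the 2.5 MB document exceeds the job limit
    # Example rate only: the figure from the pricing table removed in this commit.
    print(estimate_monthly_cost(Decimal("120"), Decimal("0.05")))  # 3.50
```

`Decimal` is used instead of `float` so that money amounts such as 70 MB × $0.05 come out exactly as `3.50` rather than carrying binary rounding noise.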