
Commit d95017c

Jill Grant authored: Merge pull request #285779 from kimiamavon-msft/patch-15
Patch 15 - JillGrant615 edits included
2 parents d718c5f + bc71ba3 commit d95017c

File tree

2 files changed: +139 −3 lines changed


articles/healthcare-apis/deidentification/toc.yml

Lines changed: 3 additions & 3 deletions
@@ -6,6 +6,8 @@ items:
   items:
   - name: What is the de-identification service?
     href: overview.md
+  - name: Transparency Note
+    href: transparency-note.md
   - name: Quickstarts
     expanded: true
     items:
@@ -22,13 +24,11 @@ items:
     href: manage-access-rbac.md
   - name: Reference
     items:
-  - name: REST API
-    href: /rest/api/healthdata
   - name: .NET SDK
     href: /dotnet/api/overview/azure/healthdeidentification
   - name: Python SDK
     href: /python/api/overview/azure/health-deidentification
   - name: Java SDK
     href: /java/api/overview/azure/health-deidentification
   - name: JavaScript SDK
-    href: /javascript/api/overview/azure/health-deidentification
+    href: /javascript/api/overview/azure/health-deidentification

articles/healthcare-apis/deidentification/transparency-note.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
---
title: The Azure Health Data Services de-identification service (preview) transparency note
description: The basics of Azure Health Data Services’ de-identification service and Responsible AI
author: kimiamavon
ms.service: azure-health-data-services
ms.subservice: deidentification-service
ms.topic: legal
ms.date: 8/16/2024
ms.author: kimiamavon
---

# The basics of Azure Health Data Services’ de-identification service

Azure Health Data Services’ de-identification service is an API that uses natural language processing techniques to find and label, redact, or surrogate Protected Health Information (PHI) in unstructured text. The service can be used with diverse types of unstructured health documents, including discharge summaries, clinical notes, clinical trials, messages, and more. The service uses machine learning to identify PHI, including HIPAA’s 18 identifiers, through the "Tag" operation. The redaction and surrogation operations replace these PHI values with either a tag of the entity type or a surrogate (a pseudonym).

## Key terms

| Term | Definition |
| :--- | :------ |
| Surrogation | The replacement of data using a pseudonym or alternative token. |
| Tag | The action or process of detecting words and phrases mentioned in unstructured text using named entity recognition. |
| Consistent Surrogation | The process of replacing PHI values with alternative non-PHI data, such that the same PHI values are repeatedly replaced with consistent values. This may be within the same document or across documents for a given organization. |

## Capabilities

### System behavior

To use the de-identification service, you can send raw, unstructured text either synchronously, one document at a time, or asynchronously as a batch. For synchronous calls, your application handles the API output. For the batch use case, the API call requires a source and target file location in Azure Blob Storage. Three operations are available through the API: "Tag," "Redact," and "Surrogate." Tag returns the PHI values detected with named entity recognition. Redact returns the input text with each PHI value replaced by its entity type. Surrogate returns the input text with each PHI value replaced by a randomly selected identifier of the same entity type. Consistent surrogation across documents is available using the batch API.

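The following Python sketch is purely illustrative and is not the de-identification API or any of its SDKs. It assumes a hypothetical `detect_phi` helper standing in for the service's named entity recognition, and it shows how the Tag, Redact, and Surrogate operations differ in the output they produce for the same input.

```python
# Conceptual sketch only: NOT the de-identification API. detect_phi() and
# SAMPLE_SURROGATES are hypothetical stand-ins used to contrast the three
# operations described above.
import random

SAMPLE_SURROGATES = {"PATIENT": ["Alex Morgan", "Sam Lee"], "DOCTOR": ["Dr. Casey Reed"]}

def detect_phi(text):
    """Hypothetical stand-in for PHI detection (the Tag operation)."""
    # Pretend the model found one PATIENT entity: "Jane" at offset 0, length 4.
    return [{"category": "PATIENT", "text": "Jane", "offset": 0, "length": 4}]

def tag(text):
    # Tag: return the detected PHI entities with their type and location.
    return detect_phi(text)

def _replace(text, entities, make_replacement):
    # Apply replacements right to left so earlier offsets stay valid.
    for e in sorted(entities, key=lambda e: e["offset"], reverse=True):
        start, end = e["offset"], e["offset"] + e["length"]
        text = text[:start] + make_replacement(e) + text[end:]
    return text

def redact(text):
    # Redact: replace each PHI value with its entity type.
    return _replace(text, detect_phi(text), lambda e: e["category"])

def surrogate(text):
    # Surrogate: replace each PHI value with a plausible value of the same type.
    return _replace(text, detect_phi(text),
                    lambda e: random.choice(SAMPLE_SURROGATES[e["category"]]))

note = "Jane reports allergy to cat hair."
print(tag(note))        # [{'category': 'PATIENT', 'text': 'Jane', ...}]
print(redact(note))     # "PATIENT reports allergy to cat hair."
print(surrogate(note))  # e.g. "Alex Morgan reports allergy to cat hair."
```
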
## Use cases

### Intended uses

The de-identification service was built specifically for health and life sciences organizations within the United States that are subject to HIPAA. We do not recommend this service for nonmedical applications or for text in languages other than English. Some common customer motivations for using the de-identification service include:

- Developing de-identified data for a test or research environment
- Developing de-identified datasets for data analytics without revealing confidential information
- Training machine learning models on private data, which is especially important for generative AI
- Sharing data across collaborating institutions

## Considerations when choosing other use cases

We encourage customers to use the de-identification service in their innovative solutions or applications. However, de-identified data, alone or in combination with other information, may reveal patients' identities. As such, customers creating, using, and sharing de-identified data should do so responsibly.

## Disclaimer

Results derived from the de-identification service vary based on factors such as the data input and the functions selected. Microsoft is unable to evaluate the output of the de-identification service to determine the acceptability of any use cases or compliance needs. Outputs from the de-identification service are not guaranteed to meet any specific legal, regulatory, or compliance requirements. Please review the limitations described in this article before using the de-identification service.

## Suggested use

The de-identification service offers three operations: Tag, Redact, and Surrogate. When appropriate, we recommend deploying surrogation rather than redaction. Surrogation is useful because, if the system fails to identify true PHI, the real value remains hidden among the surrogates, or stand-in data; the data is "hiding in plain sight," unlike with redaction. The service also offers consistent surrogation, a consistent mapping of surrogate replacements across documents. Consistent surrogation is available by submitting files in batches to the API using the asynchronous endpoint. We recommend limiting the batch size, because consistent surrogation over a large number of records degrades the privacy of the documents.

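As a rough illustration of the idea (not the service's implementation), consistent surrogation can be thought of as a shared lookup keyed by entity type and original value, so the same PHI value always maps to the same surrogate within one batch:

```python
# Illustrative sketch of consistent surrogation within a batch (an assumption,
# not the service's implementation). A mapping shared across all documents in
# the batch ensures the same PHI value always receives the same surrogate;
# a different batch starts from an empty mapping and produces different surrogates.
import random

SAMPLE_NAMES = ["Alex Morgan", "Sam Lee", "Jordan Blake"]  # toy surrogate pool

def consistent_surrogate(category, value, mapping):
    key = (category, value)
    if key not in mapping:
        mapping[key] = random.choice(SAMPLE_NAMES)
    return mapping[key]

batch_mapping = {}  # shared across every document submitted in the same batch
print(consistent_surrogate("PATIENT", "Jane", batch_mapping))  # e.g. "Sam Lee"
print(consistent_surrogate("PATIENT", "Jane", batch_mapping))  # same surrogate again
```

Because every document in a batch shares one mapping, a very large batch makes it easier to link records together, which is why limiting the batch size is recommended above.
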
## Technical limitations, operational factors, and ranges

Several factors can affect the de-identification service's performance:

- Coverage: Unstructured text may contain information that reveals identifying characteristics about an individual and that, alone or in combination with external information, reveals the identity of the individual. For example, a clinical record could state that a patient is the only known living person diagnosed with a particular rare disease. The unstructured text, alone or in combination with external information, may therefore allow that patient's clinical records to be re-identified.
- Languages: Currently, the de-identification service is enabled for English text only.
- Spelling: Incorrect spelling might affect the output. If a word or the surrounding words are misspelled, the system might not have enough information to recognize that the text is PHI.
- Data Format: The service performs best on unstructured text, such as clinical notes, transcripts, or messages. Structured text, which lacks the context of surrounding words, might not provide enough information to recognize that the text is PHI.
- Performance: Potential error types are outlined in the System performance section.
- Surrogation: As stated above, the service offers consistent surrogation, a consistent mapping of surrogate replacements across documents. Consistent surrogation is available by submitting files in batches to the API using the asynchronous endpoint. Submitting the same files in different batches, or through the real-time endpoint, results in different surrogates being used in place of the PHI values.
- Compliance: The de-identification service's performance depends on your data. The service does not guarantee compliance with HIPAA's Safe Harbor method or any other privacy method. We encourage you to obtain appropriate legal review of your solution, particularly for sensitive or high-risk applications.

## System performance

The de-identification service might produce both false positive errors and false negative errors. An example of a false positive is the tagging, redaction, or surrogation of a word or token that is not PHI. An example of a false negative is the service's failure to tag, redact, or surrogate a word or token that is truly PHI.

| Classification | Example | Tag Example | Explanation |
| :---------------- | :------ | :---- | :---- |
| False Positive | Patient reports allergy to cat hair. | Patient reports allergy to DOCTOR hair. | This is an example of a false positive, because "cat" in this context isn't PHI. "Cat" refers to an animal, not a name. |
| False Negative | Jane reports allergy to cat hair. | Jane reports allergy to cat hair. | The system failed to identify Jane as a name. |
| True Positive | Jane reports allergy to cat hair. | PATIENT reports allergy to cat hair. | The system correctly identified Jane as a name. |
| True Negative | Patient reports allergy to cat hair. | Patient reports allergy to cat hair. | The system correctly identified that "cat" is not PHI. |

When evaluating candidate models for our service, we strive to reduce false negatives, the most important metric from a privacy perspective.

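For reference, here is a minimal illustration (not Microsoft's evaluation code) of how the error categories in the table above translate into precision and recall; false negatives lower recall, which is why recall is the primary privacy metric.

```python
# Illustrative only: precision and recall from counts of true positives (TP),
# false positives (FP), and false negatives (FN), as defined in the table above.
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts from an evaluation run.
tp, fp, fn = 980, 40, 20
print(f"precision = {precision(tp, fp):.3f}")  # 0.961
print(f"recall    = {recall(tp, fn):.3f}")     # 0.980
```
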
The de-identification model is trained and evaluated on diverse types of unstructured medical documents, including clinical notes and transcripts. Our training data includes synthetically generated data, open datasets, and commercially obtained datasets with patient consent. We do not retain or use customer data to improve the service. Even though internal tests demonstrate the model's potential to generalize to different populations and locales, you should carefully evaluate the service in the context of your intended use.

## Best practices for improving system performance

There are several best practices for improving the de-identification service's performance:

- Surrogation: When appropriate, we recommend deploying surrogation rather than redaction. If the system fails to identify true PHI, the real value remains hidden among the surrogates, or stand-in data; the data is "hiding in plain sight."
- Languages: Currently, the de-identification service is enabled for English text only. Code-switching or using other languages results in worse performance.
- Spelling: Correct spelling improves performance. If a word or the surrounding words are misspelled, the system might not have enough information to recognize that the text is PHI.
- Data Format: The service performs best on unstructured text, such as clinical notes, transcripts, or messages. Structured text, which lacks the context of surrounding words, might not provide enough information to recognize that the text is PHI.

## Evaluation of the de-identification service

### Evaluation methods

We evaluate our de-identification system primarily on its ability to detect PHI in incoming text, and secondarily on its ability to replace that PHI with synthetic data that preserves the semantics of the incoming text.

### PHI detection

Our evaluation focuses on the system's ability to successfully identify and remove all PHI in incoming text (recall). Secondary metrics include precision, which measures how often text flagged as PHI is truly PHI, as well as how often we identify both the type and the location of PHI in text. Because the service is typically used to mitigate risk associated with PHI, the primary release criterion we use is recall. Recall is measured on a number of academic and internal datasets written in English, typically covering medical notes and conversations across various medical specialties. Our internal metrics do not include non-PHI text and are measured at an entity level with fuzzy matching, so that the true text span need not exactly match the detected one.

Our service goal is to maintain recall greater than 95%.

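To make the entity-level, fuzzy-matching idea concrete, here is a small sketch. The overlap rule below (any character overlap between a true span and a detected span of the same category counts as a match) is an assumption chosen for illustration, not the service's actual matching criterion.

```python
# Illustration of entity-level recall with fuzzy span matching. The overlap
# rule is an assumption for this sketch, not the service's matching criterion.
def spans_overlap(a_start, a_end, b_start, b_end):
    return a_start < b_end and b_start < a_end

def entity_recall(true_entities, detected_entities):
    """Both arguments are lists of (category, start, end) tuples."""
    matched = 0
    for cat, start, end in true_entities:
        if any(cat == d_cat and spans_overlap(start, end, d_start, d_end)
               for d_cat, d_start, d_end in detected_entities):
            matched += 1
    return matched / len(true_entities) if true_entities else 1.0

truth = [("PATIENT", 0, 4), ("DATE", 30, 40)]
detected = [("PATIENT", 0, 10)]          # span differs but overlaps: still a match
print(entity_recall(truth, detected))    # 0.5
```
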
### PHI replacement

An important consideration for a system such as ours is that we produce synthetic data that looks like the original data source in terms of plausibility and readability. To this end, we evaluate how often our system produces replacements that can be interpreted as the same type as the original. This is an important intermediate metric and a predictor of how well downstream applications can make sense of the de-identified data.

Secondarily, we internally study the performance of machine learning models trained on original versus de-identified data. We do not publish the results of these studies; however, we have found that using surrogation for machine learning applications can greatly improve the performance of downstream ML models. Because every machine learning application is different, and these results may not translate across applications depending on their sensitivity to PHI, we encourage customers who are using machine learning to study the applicability of de-identified data for their machine learning purposes.

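One rough way to picture the "same type as the original" check described above (an illustrative assumption, not Microsoft's evaluation pipeline) is to re-run PHI detection on each surrogate and verify that it is recognized as the original category:

```python
# Rough sketch (an assumption, not Microsoft's evaluation code) of a
# "type-preservation" check: re-run PHI detection on each surrogate and count
# how often it is recognized as the same category as the original value.
def type_preservation_rate(replacements, detect):
    """replacements: list of (original_category, surrogate_text).
    detect: any callable returning the set of categories found in a text."""
    preserved = sum(
        1 for category, surrogate in replacements
        if category in detect(surrogate)
    )
    return preserved / len(replacements) if replacements else 1.0

# Toy detector that only recognizes title-cased strings as names.
def toy_detect(text):
    return {"PATIENT"} if text.istitle() else set()

replacements = [("PATIENT", "Alex Morgan"), ("DATE", "Alex")]
print(type_preservation_rate(replacements, toy_detect))  # 0.5
```
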
### Evaluation results

Our system currently meets our benchmarks for recall and precision on our academic evaluation sets.

### Limitations

The data and measurements that we use represent most healthcare applications involving text in English. As a result, our system is optimized to perform well on medical data that we believe represents typical usage in terms of length, encoding, formatting, markup, style, and content. Our system performs well for many types of text, but it may underperform if the incoming data differs with respect to any of these characteristics. The system analyzes text in large chunks, so that the context around a phrase is used to infer whether it is PHI. We do not recommend using this system in a real-time or transcription application, where the caller may only have access to the context before a PHI utterance; our system relies on both preceding and following text for context.

Our training algorithm leverages large foundation models that are trained on large amounts of text from many sources, including nonmedical sources. While every reasonable effort is employed to ensure that the results of these models are in line with the domain and intended usage of the application, these systems may not perform well in all circumstances or for all data. We do not recommend this system for nonmedical applications or for text in languages other than English.

### Fairness considerations

The surrogation system replaces names through random selection. This may result in a distribution of names that is more diverse than in the original dataset. The surrogation system also strives not to include offensive content in its results. The surrogation list has been evaluated by a content-scanning tool designed to check for sensitive geopolitical terms, profanity, and trademarked terms in Microsoft products. At this time, we do not support languages other than English, but we plan to support multilingual input in the future. Our model has been augmented to provide better-than-average performance across all cultures. We carefully add data to our training process that represents many ethnicities, in an effort to provide equal performance in PHI removal for all data, regardless of source. The service makes no guarantees, implied or explicit, with respect to its interpretation of data. Users of this service should make no inferences about associations or correlations between tagged data elements such as gender, age, location, language, occupation, illness, income level, marital status, disease or disorder, or any other demographic information.

## Evaluating and integrating the de-identification service for your use

Microsoft wants to help you responsibly deploy the de-identification service. As part of our commitment to developing responsible AI, we urge you to consider the following factors:

- Understand what it can do: Fully assess the de-identification service to understand its capabilities and limitations. Understand how it will perform in your scenario and context, and on your specific data set.
- Test with real, diverse data: Understand how the de-identification service will perform in your scenario by thoroughly testing it under real-life conditions, with data that reflects the diversity of your users, geography, and deployment contexts. Small datasets, synthetic data, and tests that don't reflect your end-to-end scenario are unlikely to sufficiently represent your production performance.
- Respect an individual's right to privacy: Only collect or use data and information from individuals for lawful and justifiable purposes. Use only the data and information that you have consent to use or are legally permitted to use.
- Language: The de-identification service is currently built for English only. Using other languages will affect the performance of the model.
- Legal review: Obtain appropriate legal review of your solution, particularly if you will use it in sensitive or high-risk applications. Understand what restrictions you might need to work within and any risks that need to be mitigated before use. It is your responsibility to mitigate such risks and resolve any issues that might arise.
- System review: If you plan to integrate and responsibly use an AI-powered product or feature in an existing system of software or customer or organizational processes, take time to understand how each part of your system will be affected. Consider how your AI solution aligns with Microsoft's Responsible AI principles.
- Human in the loop: Keep a human in the loop and include human oversight as a consistent pattern. This means ensuring constant human oversight of the AI-powered product or feature and maintaining the role of humans in making any decisions that are based on the model's output. To prevent harm and to manage how the AI model performs, ensure that humans have a way to intervene in the solution in real time.
- Security: Ensure that your solution is secure and that it has adequate controls to preserve the integrity of your content and prevent unauthorized access.
- Customer feedback loop: Provide a feedback channel that users and individuals can use to report issues with the service after it's deployed. After you deploy an AI-powered product or feature, it requires ongoing monitoring and improvement. Have a plan and be ready to implement feedback and suggestions for improvement.

## Learn more about responsible AI

- [Microsoft AI principles](https://www.microsoft.com/ai/responsible-ai)
- [Microsoft responsible AI resources](https://www.microsoft.com/ai/responsible-ai-resources)

## Learn more about the de-identification service

* Explore [Microsoft Cloud for Healthcare](https://www.microsoft.com/industry/health/microsoft-cloud-for-healthcare)
* Explore [Azure Health Data Services](https://azure.microsoft.com/products/health-data-services/)

## About this document

© 2023 Microsoft Corporation. All rights reserved. This document is provided "as-is" and for informational purposes only. Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. Some examples are for illustration only and are fictitious. No real association is intended or inferred.

This document is not intended to be, and should not be construed as providing, legal advice. The jurisdiction in which you're operating may have various regulatory or legal requirements that apply to your AI system. Consult a legal specialist if you are uncertain about laws or regulations that might apply to your system, especially if you think those might impact these recommendations. Be aware that not all of these recommendations and resources will be appropriate for every scenario, and conversely, these recommendations and resources may be insufficient for some scenarios.

Published: September 30, 2023

Last updated: August 16, 2024
