Commit 5c816fc

Merge pull request #300112 from kimiamavon-msft/patch-26
async quickstart
2 parents 9f703fe + 3cc564f commit 5c816fc

2 files changed

+181
-1
lines changed

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
1+
---
2+
title: De-identify multiple documents with the de-identification service in Python
3+
description: "Learn how to bulk de-identify documents with the asynchronous de-identification service in Python."
4+
author: kimiamavon-msft
5+
ms.author: kimiamavon
6+
ms.service: azure-health-data-services
7+
ms.subservice: deidentification-service
8+
ms.topic: tutorial
9+
ms.date: 05/01/2025
10+
11+
#customer intent: As an IT admin, I want to de-identify multiple documents with the de-identification service in Python.
12+
13+
---
14+
15+
# De-identify multiple documents with the asynchronous de-identification service
16+
17+
The Azure Health Data Services de-identification service can de-identify documents in Azure Storage via an asynchronous job. If you have many documents that you would like
18+
to de-identify, using a job is a good option. Jobs also provide consistent surrogation, meaning that surrogate values in the de-identified output will match across
19+
all documents. For more information about de-identification, including consistent surrogation, see [What is the de-identification service?](overview.md)
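
To make the idea of consistent surrogation concrete, the following toy sketch replaces a sensitive value with the same stand-in across two documents. It's only an illustration of the property described above, not the service's algorithm: the names `toy_surrogate`, `toy_deidentify`, `SURROGATE_HOSPITALS`, and the `job_key` parameter are hypothetical, and the hash-based lookup is an assumption chosen for simplicity.

```python
import hashlib

# Hypothetical pool of replacement names. The real service chooses its own surrogates.
SURROGATE_HOSPITALS = [
    "Fabrikam Clinic",
    "Northwind Medical",
    "Tailspin Health",
    "Woodgrove Hospital",
]


def toy_surrogate(original: str, job_key: str) -> str:
    """Return a replacement that is stable for the same original value and job_key."""
    digest = hashlib.sha256(f"{job_key}:{original}".encode("utf-8")).digest()
    return SURROGATE_HOSPITALS[digest[0] % len(SURROGATE_HOSPITALS)]


def toy_deidentify(text: str, sensitive_values: list, job_key: str) -> str:
    """Replace each known sensitive value in the text with its toy surrogate."""
    for value in sensitive_values:
        text = text.replace(value, toy_surrogate(value, job_key))
    return text


documents = [
    "The patient was seen at Contoso Hospital on 10/12/2023.",
    "A follow-up visit at Contoso Hospital was scheduled.",
]

# Both documents share one job_key, so "Contoso Hospital" maps to the same surrogate in each.
deidentified = [toy_deidentify(doc, ["Contoso Hospital"], job_key="job-1") for doc in documents]
for doc in deidentified:
    print(doc)
```

Because both documents are processed with the same key, a reader of the output can still tell that the two visits happened at the same place, even though the original name is gone.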
20+
21+
When you choose to store documents in Azure Blob Storage, you're charged based on Azure Storage pricing. This cost isn't included in the
22+
de-identification service pricing. [Explore Azure Blob Storage pricing](https://azure.microsoft.com/pricing/details/storage/blobs).
23+
24+
In this tutorial, you:
25+
26+
27+
* Create a storage account and container
28+
* Upload a sample document
29+
* Grant the de-identification service access
30+
* Configure network isolation
31+
32+
## Prerequisites
33+
34+
* An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).
35+
* A de-identification service with system-assigned managed identity. [Deploy the de-identification service](quickstart.md).
36+
37+
## Open Azure CLI
38+
39+
Install [Azure CLI](/cli/azure/install-azure-cli) and open your terminal of choice. In this tutorial, we're using PowerShell.
40+
41+
## Create a storage account and container
42+
1. Set your context, substituting the subscription name containing your de-identification service for the `<subscription_name>` placeholder:
43+
   ```powershell
   az account set --subscription "<subscription_name>"
   ```
46+
1. Save a variable for the resource group, substituting the resource group containing your de-identification service for the `<resource_group>` placeholder:
47+
   ```powershell
   $ResourceGroup = "<resource_group>"
   ```
50+
1. Create a storage account, providing a value for the `<storage_account_name>` placeholder:
51+
   ```powershell
   $StorageAccountName = "<storage_account_name>"
   $StorageAccountId = $(az storage account create --name $StorageAccountName --resource-group $ResourceGroup --sku Standard_LRS --kind StorageV2 --min-tls-version TLS1_2 --allow-blob-public-access false --query id --output tsv)
   ```
55+
1. Assign yourself a role to perform data operations on the storage account:
56+
   ```powershell
   $UserId = $(az ad signed-in-user show --query id -o tsv)
   az role assignment create --role "Storage Blob Data Contributor" --assignee $UserId --scope $StorageAccountId
   ```
60+
1. Create a container to hold your sample document:
61+
   ```powershell
   az storage container create --account-name $StorageAccountName --name deidtest --auth-mode login
   ```
64+
## Upload a sample document
65+
Next, you upload a document that contains synthetic PHI:
66+
```powershell
$DocumentContent = "The patient came in for a visit on 10/12/2023 and was seen again November 4th at Contoso Hospital."
az storage blob upload --data $DocumentContent --account-name $StorageAccountName --container-name deidtest --name deidsample.txt --auth-mode login
```
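
The blob name you choose here matters, because a de-identification job selects its input documents by name prefix. The following minimal sketch shows which blob names a given prefix would pick up, assuming the prefix is compared against the start of the blob name, as the SDK sample description suggests. The helper `select_by_prefix` and every blob name other than `deidsample.txt` are hypothetical.

```python
def select_by_prefix(blob_names, prefix):
    """Return the blob names that start with the given prefix."""
    return [name for name in blob_names if name.startswith(prefix)]


# Hypothetical container listing. Only deidsample.txt is created in this tutorial.
blob_names = [
    "deidsample.txt",
    "deidsample-2.txt",
    "notes/visit.txt",
    "_output/deidsample.txt",
]

selected = select_by_prefix(blob_names, "deidsample")
print(selected)  # ['deidsample.txt', 'deidsample-2.txt']
```

In this sketch, a blob stored under an `_output` prefix in the same container isn't selected by the `deidsample` input prefix.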
70+
71+
## Grant the de-identification service access to the storage account
72+
73+
In this step, you grant the de-identification service's system-assigned managed identity role-based access to the container. You grant the **Storage Blob
74+
Data Contributor** role because the de-identification service will both read the original document and write de-identified output documents. Substitute the name of
75+
your de-identification service for the `<deid_service_name>` placeholder:
76+
```powershell
$DeidServicePrincipalId=$(az resource show -n <deid_service_name> -g $ResourceGroup --resource-type microsoft.healthdataaiservices/deidservices --query identity.principalId --output tsv)
az role assignment create --assignee $DeidServicePrincipalId --role "Storage Blob Data Contributor" --scope $StorageAccountId
```
80+
To verify that the de-identification service has access to the storage account, go to <b>Storage accounts</b> in the Azure portal. Under the <b>Storage center</b> and <b>Resources</b> tab, select your storage account name. Select <b>Access control (IAM)</b> and, in the search bar, search for the name of your de-identification service (the value you substituted for `<deid_service_name>`).
81+
82+
## Configure network isolation on the storage account
83+
Next, you update the storage account to disable public network access and only allow access from trusted Azure services such as the de-identification service.
84+
After running this command, you won't be able to view the storage container contents without setting a network exception.
85+
Learn more at [Configure Azure Storage firewalls and virtual networks](/azure/storage/common/storage-network-security).
86+
87+
```powershell
az storage account update --name $StorageAccountName --public-network-access Disabled --bypass AzureServices
```
90+
91+
## Use the Python SDK
92+
The code below contains a sample from the [Azure Health Deidentification SDK for Python](https://learn.microsoft.com/python/api/overview/azure/health-deidentification?view=azure-python).
93+
94+
```python
"""
FILE: deidentify_documents_async.py

DESCRIPTION:
    This sample demonstrates a basic scenario of de-identifying documents in Azure Storage.
    Taking a container URI and an input prefix, the sample creates a job and waits for the job to complete.

USAGE:
    python deidentify_documents_async.py

    Replace the placeholder values in the script with your own values before running the sample:
    1) endpoint - the service URL endpoint for a de-identification service.
    2) storage_location - an Azure Storage container endpoint, like "https://<storageaccount>.blob.core.windows.net/<container>".
    3) input_prefix - the prefix of the input document name(s) in the container.
        For example, providing "folder1" would create a job that would process documents like "https://<storageaccount>.blob.core.windows.net/<container>/folder1/document1.txt".
"""

import asyncio
import uuid

from azure.core.polling import AsyncLROPoller
from azure.health.deidentification.aio import DeidentificationClient
from azure.health.deidentification.models import (
    DeidentificationJob,
    SourceStorageLocation,
    TargetStorageLocation,
)
from azure.identity.aio import DefaultAzureCredential


async def deidentify_documents_async():
    endpoint = "<YOUR SERVICE URL HERE>"  # Replace with your service URL
    # Replace <STORAGE ACCOUNT NAME>; "deidtest" is the container created earlier
    storage_location = "https://<STORAGE ACCOUNT NAME>.blob.core.windows.net/deidtest/"
    input_prefix = "deidsample"
    output_prefix = "_output"

    credential = DefaultAzureCredential()
    client = DeidentificationClient(endpoint, credential)

    # Job names must be unique, so append a short random suffix
    jobname = f"sample-job-{uuid.uuid4().hex[:8]}"

    # Read documents matching input_prefix and write results under output_prefix
    job = DeidentificationJob(
        source_location=SourceStorageLocation(
            location=storage_location,
            prefix=input_prefix,
        ),
        target_location=TargetStorageLocation(location=storage_location, prefix=output_prefix, overwrite=True),
    )

    async with client:
        # Start the long-running job and wait for it to finish
        lro: AsyncLROPoller = await client.begin_deidentify_documents(jobname, job)
        finished_job: DeidentificationJob = await lro.result()

    await credential.close()

    print(f"Job Name:   {finished_job.job_name}")
    print(f"Job Status: {finished_job.status}")  # Succeeded
    print(f"File Count: {finished_job.summary.total_count if finished_job.summary is not None else 0}")


async def main():
    await deidentify_documents_async()


if __name__ == "__main__":
    asyncio.run(main())
```
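
If you'd rather not edit values directly in the script, one option is to read them from environment variables and build the container URL from its parts. The following is a minimal sketch under stated assumptions: the variable names `DEID_SERVICE_ENDPOINT`, `DEID_STORAGE_ACCOUNT`, and `DEID_STORAGE_CONTAINER`, and the helpers `container_url` and `load_settings`, are hypothetical and aren't defined by the SDK.

```python
import os
from urllib.parse import urlparse


def container_url(account_name: str, container_name: str) -> str:
    """Build a container endpoint like https://<storageaccount>.blob.core.windows.net/<container>."""
    return f"https://{account_name}.blob.core.windows.net/{container_name}"


def load_settings(env=None) -> dict:
    """Read the sample's settings from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    endpoint = env.get("DEID_SERVICE_ENDPOINT", "")  # hypothetical variable name
    account = env.get("DEID_STORAGE_ACCOUNT", "")  # hypothetical variable name
    container = env.get("DEID_STORAGE_CONTAINER", "deidtest")  # hypothetical variable name

    required = {"DEID_SERVICE_ENDPOINT": endpoint, "DEID_STORAGE_ACCOUNT": account}
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    if urlparse(endpoint).scheme != "https":
        raise ValueError("DEID_SERVICE_ENDPOINT must be an https URL")

    return {
        "endpoint": endpoint,
        "storage_location": container_url(account, container),
    }


# Example with an explicit mapping, so the sketch runs without real environment variables.
example = load_settings(
    {
        "DEID_SERVICE_ENDPOINT": "https://example.invalid",
        "DEID_STORAGE_ACCOUNT": "mystorageaccount",
    }
)
print(example["storage_location"])  # https://mystorageaccount.blob.core.windows.net/deidtest
```

The returned `endpoint` and `storage_location` values could then be passed to `DeidentificationClient` and the storage location models in place of the hardcoded strings.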
166+
167+
## Clean up resources
168+
Once you're done with the storage account, you can delete the storage account and role assignments:
169+
```powershell
az role assignment delete --assignee $DeidServicePrincipalId --role "Storage Blob Data Contributor" --scope $StorageAccountId
az role assignment delete --assignee $UserId --role "Storage Blob Data Contributor" --scope $StorageAccountId
az storage account delete --ids $StorageAccountId --yes
```
174+
175+
## Next step
176+
177+
> [!div class="nextstepaction"]
178+
> [Quickstart: Azure Health De-identification client library for .NET](quickstart-sdk-net.md)

articles/healthcare-apis/deidentification/toc.yml

Lines changed: 3 additions & 1 deletion
@@ -27,8 +27,10 @@ items:
2727
displayName: Resource Manager
2828
- name: Azure Health De-identification client library for .NET
2929
href: quickstart-sdk-net.md
30-
- name: Python
30+
- name: Python - synchronous
3131
href: quickstart-python.md
32+
- name: Python - asynchronous
33+
href: quickstart-asynchronous-python.md
3234
- name: Tutorials
3335
expanded: true
3436
items:
