Commit 2d1d40c

jovinson-ms, josiahvinson, and Copilot authored
Deid text encoding documentation (Azure#51201)
* Adding string encoding documentation
* Adding text encoding doc
* Update sdk/healthdataaiservices/Azure.Health.Deidentification/README.md

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Josiah Vinson <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent 954b96f commit 2d1d40c

1 file changed, +37 -2 lines changed
sdk/healthdataaiservices/Azure.Health.Deidentification/README.md

Lines changed: 37 additions & 2 deletions
@@ -59,6 +59,12 @@ Given an input text, the de-identification service can perform three main operat

 For more information about customizing the redaction format, see [Tutorial: Use a custom redaction format with the de-identification service][deid_redaction_format].

+### String Encoding
+When using the `Tag` operation, the service will return the locations of PHI entities in the input text. These locations will be represented as offsets and lengths, each of which is a [StringIndex][string_index] containing
+three properties corresponding to three different text encodings. **.NET applications should use the `Utf16` property.**
+
+For more on text encoding, see [Character encoding in .NET][character_encoding].
+
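To see why the String Encoding section above steers .NET callers toward `Utf16`, here is a minimal, self-contained sketch that does not call the SDK. It measures the offset of the same substring three ways, on the assumption that the three encodings in question are UTF-8 bytes, UTF-16 code units, and Unicode code points (the section names only `Utf16`). The class `EncodingOffsets` and the method `OffsetsOf` are hypothetical names used purely for illustration.

```csharp
using System;
using System.Text;

public static class EncodingOffsets
{
    // Hypothetical helper: returns the offset of `target` within `text`
    // measured in UTF-8 bytes, UTF-16 code units, and Unicode code points.
    public static (int Utf8, int Utf16, int CodePoint) OffsetsOf(string text, string target)
    {
        // string.IndexOf counts UTF-16 code units, which is how .NET strings are indexed.
        int utf16 = text.IndexOf(target, StringComparison.Ordinal);
        if (utf16 < 0)
        {
            throw new ArgumentException("target not found in text", nameof(target));
        }

        string prefix = text.Substring(0, utf16);

        // The same prefix can occupy a different number of bytes in UTF-8.
        int utf8 = Encoding.UTF8.GetByteCount(prefix);

        // A surrogate pair is two UTF-16 code units but a single code point.
        int codePoints = 0;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (char.IsHighSurrogate(prefix[i]) && i + 1 < prefix.Length && char.IsLowSurrogate(prefix[i + 1]))
            {
                i++;
            }
            codePoints++;
        }

        return (utf8, utf16, codePoints);
    }

    public static void Main()
    {
        // U+1F600 lies outside the Basic Multilingual Plane, so the three offsets diverge.
        string text = "Hi \U0001F600 Tom";
        var (utf8, utf16, codePoint) = OffsetsOf(text, "Tom");
        Console.WriteLine($"Utf8={utf8}, Utf16={utf16}, CodePoint={codePoint}"); // Utf8=8, Utf16=6, CodePoint=5

        // Only the UTF-16 offset is valid for string.Substring.
        Console.WriteLine(text.Substring(utf16, "Tom".Length)); // Tom
    }
}
```

Because `string.Substring` and the string indexer count UTF-16 code units, an offset expressed in UTF-8 bytes or code points would select the wrong characters as soon as the text contains anything outside ASCII.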
 ### De-identification Methods
 There are two methods of interacting with the de-identification service. You can send text directly, or you can create jobs
 to de-identify documents in Azure Storage.
@@ -72,8 +78,33 @@ string outputString = result.Value.OutputText;
 Console.WriteLine(outputString); // Hello, Tom!
 ```

-To learn about prerequisites and configuration options for de-identifying documents in Azure Storage, see [Tutorial: Configure Azure Storage to de-identify documents][deid_configure_storage].
-Once you have configured your storage account, you can create a job to de-identify documents in a container.
+To de-identify documents in Azure Storage, you'll need a storage account with a container to which the
+de-identification service has been granted an appropriate role. See [Tutorial: Configure Azure Storage to de-identify documents][deid_configure_storage]
+for prerequisites and configuration options. You can upload the files in the [test data folder][test_data] as blobs, like: `https://<storageaccount>.blob.core.windows.net/<container>/example_patient_1/doctor_dictation.txt`.
+
+You can create jobs to de-identify documents in the source Azure Storage account and container with an optional input prefix. If there's no input prefix, all blobs in the container will be de-identified. Azure Storage blobs can use `/` in the blob name to emulate a folder or directory layout. For more on blob naming, see [Naming and Referencing Containers, Blobs, and Metadata][blob_names]. The files you've uploaded can be de-identified by providing `example_patient_1` as the input prefix:
+```
+<container>/
+├── example_patient_1/
+    └──doctor_dictation.txt
+    └──row-2-data.txt
+    └──visit-summary.txt
+```
+
+Your target Azure Storage account and container where documents will be written can be the same as the source, or a different account or container. In the examples below, the source and target account and container are the same. You can specify an output prefix to indicate where the job's output documents should be written (defaulting to `_output`). Each document processed by the job will have the same relative blob name with the input prefix replaced by the output prefix:
+```
+<container>/
+├── example_patient_1/
+    └──doctor_dictation.txt
+    └──row-2-data.txt
+    └──visit-summary.txt
+├── _output/
+    └──doctor_dictation.txt
+    └──row-2-data.txt
+    └──visit-summary.txt
+```
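The prefix replacement described above can be sketched locally as plain string manipulation. This is only an illustration of the naming rule stated in the README text, not the service's implementation; `OutputBlobNames` and `MapToOutput` are hypothetical names, and the `_output` default is taken from the paragraph above.

```csharp
using System;

public static class OutputBlobNames
{
    // Hypothetical helper: swaps the input prefix of a blob name for the output prefix,
    // keeping the rest of the relative blob name unchanged.
    public static string MapToOutput(string blobName, string inputPrefix, string outputPrefix = "_output")
    {
        if (!blobName.StartsWith(inputPrefix, StringComparison.Ordinal))
        {
            throw new ArgumentException("blob name does not start with the input prefix", nameof(blobName));
        }

        return outputPrefix + blobName.Substring(inputPrefix.Length);
    }

    public static void Main()
    {
        string source = "example_patient_1/doctor_dictation.txt";
        Console.WriteLine(MapToOutput(source, "example_patient_1")); // _output/doctor_dictation.txt
    }
}
```

This mirrors the directory listing above, where each file under `example_patient_1/` reappears under `_output/` with the same file name.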
+
+Create a job to de-identify documents:
 ```C# Snippet:AzHealthDeidSample2_CreateJob
 DeidentificationJob job = new()
 {
@@ -137,6 +168,8 @@ additional questions or comments.
 [product_documentation]: https://learn.microsoft.com/azure/healthcare-apis/deidentification/
 [docs]: https://learn.microsoft.com/dotnet/api/azure.health.deidentification
 [deid_nuget]: https://www.nuget.org/packages/Azure.Health.Deidentification
+[string_index]: https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/healthdataaiservices/Azure.Health.Deidentification/src/Generated/StringIndex.cs
+[character_encoding]: https://learn.microsoft.com/dotnet/standard/base-types/character-encoding-introduction
 [deid_redaction_format]: https://learn.microsoft.com/azure/healthcare-apis/deidentification/redaction-format
 [azure_subscription]: https://azure.microsoft.com/free/
 [deid_quickstart]: https://learn.microsoft.com/azure/healthcare-apis/deidentification/quickstart
@@ -148,3 +181,5 @@ additional questions or comments.
 [azure_portal]: https://ms.portal.azure.com
 [github_issue_label]: https://github.com/Azure/azure-sdk-for-net/labels/Health%20Deidentification
 [samples]: https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/healthdataaiservices/Azure.Health.Deidentification/samples/README.md
+[blob_names]: https://learn.microsoft.com/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#blob-names
+[test_data]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/healthdataaiservices/azure-health-deidentification/tests/data
