
Commit da60a42

GrahamMThomas, Graham Thomas, and scbedd authored
Initial Creation of azure-health-deidentification Dataplane SDK (Azure#36041)
* Initial commit of Health.Deidentification dataplane
* Use MI instead of SAS
* Regenerates with Plaintext
* Adds rest of tests
* First attempt patch
* Patch Attempt #2
* Patch Attempt #3
* Creates base recordings
* Fixes sanitizers; Test replay functioning
* Creates all sync samples
* Creates all async samples
* Adds description in readme
* Adds tsplocation
* Checkpoint
* Executes test recording migration
* Adds pipeline yamls
* Updates ci.yml triggers
* Removes ArtifactName from ci.yaml
* Fixes analysis failures
* Fixes analysis failures 2
* Update sdk/healthdataaiservices/ci.yml

Co-authored-by: Scott Beddall <[email protected]>

* Updates test.yml
* Uniquifier default to false for pipelines
* Updates from feedback
* Updates from feedback 2

---------

Co-authored-by: Graham Thomas <[email protected]>
Co-authored-by: Scott Beddall <[email protected]>
1 parent: cf6238a, commit: da60a42


64 files changed (+8935, -0 lines)

.github/CODEOWNERS

Lines changed: 4 additions & 0 deletions
@@ -412,6 +412,10 @@
 # PRLabel: %Cognitive - Text Analytics
 /sdk/textanalytics/ @quentinRobinson @wangyuantao
 
+# ServiceLabel: %Health Deidentification
+# PRLabel: %Health Deidentification
+/sdk/healthdataaiservices/ @GrahamMThomas @danielszaniszlo
+
 # AzureSdkOwners: @YalinLi0312
 # ServiceLabel: %Cognitive - Form Recognizer
 # ServiceOwners: @bojunehsu @vkurpad

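GitHub resolves CODEOWNERS so that the last matching pattern in the file takes precedence. The sketch below illustrates that rule for the two anchored directory patterns in the hunk above. `resolve_owners` and `RULES` are hypothetical names, and the helper deliberately handles only patterns of the form `/dir/`, not GitHub's full gitignore-style syntax.

```python
from typing import List, Optional, Tuple

# Rules taken from the CODEOWNERS hunk above, in file order: (pattern, owners).
RULES: List[Tuple[str, List[str]]] = [
    ("/sdk/textanalytics/", ["@quentinRobinson", "@wangyuantao"]),
    ("/sdk/healthdataaiservices/", ["@GrahamMThomas", "@danielszaniszlo"]),
]


def resolve_owners(path: str, rules: List[Tuple[str, List[str]]]) -> Optional[List[str]]:
    """Return the owners of the last rule whose pattern matches `path`, or None.

    Simplification (assumption): only anchored directory patterns such as
    "/sdk/foo/" are handled, matched as a prefix of a path starting with "/".
    """
    owners: Optional[List[str]] = None
    for pattern, rule_owners in rules:
        is_anchored_dir = pattern.startswith("/") and pattern.endswith("/")
        if is_anchored_dir and path.startswith(pattern):
            owners = rule_owners  # a later match overrides any earlier one
    return owners
```

With these rules, any file under `/sdk/healthdataaiservices/` resolves to `@GrahamMThomas` and `@danielszaniszlo`, while a path outside both directories resolves to `None` (in the real file, other rules would apply).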
.vscode/cspell.json

Lines changed: 14 additions & 0 deletions
@@ -244,6 +244,7 @@
 "guids",
 "hanaonazure",
 "hdinsight",
+"healthdataaiservices",
 "heapq",
 "hexlify",
 "himds",
@@ -402,6 +403,7 @@
 "unpad",
 "unpadder",
 "unpartial",
+"uniquifier",
 "unredacted",
 "unseekable",
 "unsubscriptable",
@@ -440,6 +442,7 @@
 "BUILDID",
 "documentdb",
 "chdir",
+"radiculopathy",
 "reqs",
 "rgpy",
 "swaggertosdk",
@@ -1840,6 +1843,17 @@
 "words": [
 "dcid"
 ]
+},
+{
+"filename": "sdk/healthdataaiservices/azure-health-deidentification/**",
+"words": [
+"deid",
+"deidservices",
+"deidentification",
+"healthdataaiservices",
+"deidentify",
+"deidentified"
+]
 }
 ],
 "allowCompoundWords": true

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# Release History
+
+## 1.0.0b1 (1970-01-01)
+
+- Initial version
+
+### Features Added
+
+- Initial Code

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+Copyright (c) Microsoft Corporation.
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+include *.md
+include LICENSE
+include azure/health/deidentification/py.typed
+recursive-include tests *.py
+recursive-include samples *.py *.md
+include azure/__init__.py
+include azure/health/__init__.py

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+
+
+# Azure Health Deidentification client library for Python
+Azure.Health.Deidentification is a managed service that enables users to tag, redact, or surrogate health data.
+
+## Getting started
+
+### Install the package
+
+```bash
+python -m pip install azure-health-deidentification
+```
+
+#### Prerequisites
+
+- Python 3.8 or later is required to use this package.
+- You need an [Azure subscription][azure_sub] to use this package.
+- An existing Azure Health Deidentification instance.
+#### Create with an Azure Active Directory Credential
+To use an [Azure Active Directory (AAD) token credential][authenticate_with_token],
+provide an instance of the desired credential type obtained from the
+[azure-identity][azure_identity_credentials] library.
+
+To authenticate with AAD, you must first [pip][pip] install [`azure-identity`][azure_identity_pip].
+
+After setup, you can choose which type of [credential][azure_identity_credentials] from `azure.identity` to use.
+As an example, [DefaultAzureCredential][default_azure_credential] can be used to authenticate the client.
+
+Set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables:
+`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET`
+
+Use the returned token credential to authenticate the client:
+
+```python
+>>> from azure.health.deidentification import DeidentificationClient
+>>> from azure.identity import DefaultAzureCredential
+>>> client = DeidentificationClient(endpoint='<endpoint>', credential=DefaultAzureCredential())
+```
+
+## Key concepts
+
+**Operation Modes**
+- Tag: Returns the offset, length, and PHI category of each text span that contains PHI.
+- Redact: Returns the output text with each PHI span replaced by placeholder text, e.g. `[name]`.
+- Surrogate: Returns the output text with PHI replaced by synthetic values, for example:
+  - Input: `My name is John Smith`
+  - Output: `My name is Tom Jones`
+
+**Job Integration with Azure Storage**
+Instead of sending text, you can send an Azure Storage location to the service. We will asynchronously
+process the list of files and write the deidentified files to a location of your choice.
+
+Limitations:
+- Maximum file count per job: 1000 documents
+- Maximum file size per file: 2 MB
+
+## Examples
+
+```python
+>>> from azure.health.deidentification import DeidentificationClient
+>>> from azure.identity import DefaultAzureCredential
+>>> from azure.core.exceptions import HttpResponseError
+
+>>> client = DeidentificationClient(endpoint='<endpoint>', credential=DefaultAzureCredential())
+>>> try:
+...     pass  # write test code here
+... except HttpResponseError as e:
+...     print('service responded with an error: {}'.format(e.response.json()))
+
+```
+
+## Next steps
+
+- Found a bug or have feedback? Raise an issue with the "Health Deidentification" label.
+
+
+## Troubleshooting
+
+- **Unable to access source or target storage**
+  - Ensure you created your deidentification service with a system-assigned managed identity.
+  - Ensure your storage account has granted the required permissions to that managed identity.
+
+## Contributing
+
+This project welcomes contributions and suggestions. Most contributions require
+you to agree to a Contributor License Agreement (CLA) declaring that you have
+the right to, and actually do, grant us the rights to use your contribution.
+For details, visit https://cla.microsoft.com.
+
+When you submit a pull request, a CLA-bot will automatically determine whether
+you need to provide a CLA and decorate the PR appropriately (e.g., label,
+comment). Simply follow the instructions provided by the bot. You will only
+need to do this once across all repos using our CLA.
+
+This project has adopted the
+[Microsoft Open Source Code of Conduct][code_of_conduct]. For more information,
+see the Code of Conduct FAQ or contact [email protected] with any
+additional questions or comments.
+
+<!-- LINKS -->
+[code_of_conduct]: https://opensource.microsoft.com/codeofconduct/
+[authenticate_with_token]: https://docs.microsoft.com/azure/cognitive-services/authentication?tabs=powershell#authenticate-with-an-authentication-token
+[azure_identity_credentials]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity#credentials
+[azure_identity_pip]: https://pypi.org/project/azure-identity/
+[default_azure_credential]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity#defaultazurecredential
+[pip]: https://pypi.org/project/pip/
+[azure_sub]: https://azure.microsoft.com/free/
+

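The README describes Tag output as offset/length spans carrying a PHI category, and Redact output as text with placeholders such as `[name]`. The sketch below shows, purely locally, how a set of tagged spans maps onto redacted text. The `TaggedSpan` type, its field names, and the lowercase `[category]` placeholder format are assumptions for illustration; they are not the SDK's models, and the service's actual redaction output may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaggedSpan:
    """Hypothetical stand-in for one Tag result: PHI category, start offset, length."""

    category: str
    offset: int
    length: int


def redact(text: str, spans: List[TaggedSpan]) -> str:
    """Replace each tagged span with a "[category]" placeholder (assumed format)."""
    pieces: List[str] = []
    cursor = 0
    # Walk spans left to right; slicing the original text keeps every offset valid.
    for span in sorted(spans, key=lambda s: s.offset):
        if span.offset < cursor:
            raise ValueError("overlapping spans are not supported in this sketch")
        pieces.append(text[cursor:span.offset])      # untouched text before the span
        pieces.append(f"[{span.category.lower()}]")  # placeholder for the PHI
        cursor = span.offset + span.length
    pieces.append(text[cursor:])                     # trailing untouched text
    return "".join(pieces)
```

For the README's own example, a `Name` span at offset 11 with length 10 covers `John Smith`, so `redact("My name is John Smith", [TaggedSpan("Name", 11, 10)])` yields `My name is [name]`. Because placeholders are built from the original string rather than edited in place, earlier replacements never shift later offsets.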
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+{
+"AssetsRepo": "Azure/azure-sdk-assets",
+"AssetsRepoPrefixPath": "python",
+"TagPrefix": "python/healthdataaiservices/azure-health-deidentification",
+"Tag": "python/healthdataaiservices/azure-health-deidentification_a8eed6d322"
+}

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+__path__ = __import__("pkgutil").extend_path(__path__, __name__)  # type: ignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+__path__ = __import__("pkgutil").extend_path(__path__, __name__)  # type: ignore
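The two one-line `__init__.py` files use `pkgutil.extend_path`, which makes their packages pkgutil-style namespace packages: subpackages living in different directories on `sys.path` (for example, installed by separate distributions) can all be imported under one top-level name. The sketch below demonstrates the mechanism with two temporary directories. The package names `demo_ns`, `part_one`, and `part_two` are hypothetical and chosen so the demo cannot collide with an installed `azure` package.

```python
import importlib
import os
import sys
import tempfile
from typing import List

# Same line as the namespace __init__.py files in this commit.
INIT_LINE = '__path__ = __import__("pkgutil").extend_path(__path__, __name__)\n'


def write(path: str, content: str) -> None:
    """Create parent directories as needed and write a small text file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(content)


def demo() -> List[str]:
    """Import one subpackage from each of two roots that share the demo_ns package."""
    root_a = tempfile.mkdtemp()  # temp dirs are left in place for simplicity
    root_b = tempfile.mkdtemp()
    # Each root ships its own demo_ns/__init__.py containing the extend_path line.
    write(os.path.join(root_a, "demo_ns", "__init__.py"), INIT_LINE)
    write(os.path.join(root_a, "demo_ns", "part_one", "__init__.py"), "NAME = 'one'\n")
    write(os.path.join(root_b, "demo_ns", "__init__.py"), INIT_LINE)
    write(os.path.join(root_b, "demo_ns", "part_two", "__init__.py"), "NAME = 'two'\n")

    sys.path[:0] = [root_a, root_b]
    importlib.invalidate_caches()
    # demo_ns is loaded from root_a; extend_path then appends root_b/demo_ns to
    # demo_ns.__path__, so part_two is found even though root_a lacks it.
    one = importlib.import_module("demo_ns.part_one")
    two = importlib.import_module("demo_ns.part_two")
    return [one.NAME, two.NAME]
```

Without the `extend_path` line, only the first `demo_ns` directory on `sys.path` would be searched and importing `demo_ns.part_two` would fail.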
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# coding=utf-8
+# --------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# Code generated by Microsoft (R) Python Code Generator.
+# Changes may cause incorrect behavior and will be lost if the code is regenerated.
+# --------------------------------------------------------------------------
+
+from ._client import DeidentificationClient
+from ._version import VERSION
+
+__version__ = VERSION
+
+try:
+    from ._patch import __all__ as _patch_all
+    from ._patch import *  # pylint: disable=unused-wildcard-import
+except ImportError:
+    _patch_all = []
+from ._patch import patch_sdk as _patch_sdk
+
+__all__ = [
+    "DeidentificationClient",
+]
+__all__.extend([p for p in _patch_all if p not in __all__])
+
+_patch_sdk()
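The generated `__init__.py` lets a hand-written `_patch.py` contribute public names without editing generated code: names from `_patch.__all__` are appended to the generated `__all__`, skipping any name already present and keeping their order. The helper below, `merge_public_names`, is a hypothetical standalone restatement of that single `extend` line for reasoning about the result; it is not part of the SDK.

```python
from typing import List


def merge_public_names(generated: List[str], patch: List[str]) -> List[str]:
    """Mirror `__all__.extend([p for p in _patch_all if p not in __all__])`.

    The comprehension is evaluated before `extend` runs, so (as in the
    original line) duplicates *within* `patch` itself are not removed.
    """
    merged = list(generated)  # copy so the caller's list is left untouched
    merged.extend([p for p in patch if p not in merged])
    return merged
```

For example, if `_patch.py` re-exported `DeidentificationClient` and added one helper, the merged `__all__` would contain `DeidentificationClient` once, followed by the helper.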
