Commit d5174be

Merge pull request #198851 from jboback/PII
[Cog Svcs] Pii
2 parents 9343939 + 945f9c0 commit d5174be

File tree

7 files changed

+266
-31
lines changed

articles/cognitive-services/language-service/concepts/model-lifecycle.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,7 @@ Use the table below to find which model versions are supported by each feature:
5555
| Entity Linking | `2021-06-01` | `2021-06-01` | |
5656
| Named Entity Recognition (NER) | `2021-06-01` | `2021-06-01` | |
5757
| Personally Identifiable Information (PII) detection | `2020-07-01`, `2021-01-15` | `2021-01-15` | |
58+
| PII detection for conversations (Preview) | `2022-05-15-preview` | | `2022-05-15-preview` |
5859
| Question answering | `2021-10-01` | `2021-10-01` | |
5960
| Text Analytics for health | `2021-05-15`, `2022-03-01` | `2022-03-01` | |
6061
| Key phrase extraction | `2021-06-01` | `2021-06-01` | |

articles/cognitive-services/language-service/personally-identifiable-information/concepts/conversations-entity-categories.md

Lines changed: 3 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -191,7 +191,7 @@ This category contains the following entity:
191191

192192
Any credit card number, any security code on the back, or the expiration date is considered as PII.
193193

194-
To get this entity category, add `CreditCardNumber` to the `pii-categories` parameter. `CreditCardNumber` will be returned in the API response if detected.
194+
To get this entity category, add `CreditCard` to the `pii-categories` parameter. `CreditCard` will be returned in the API response if detected.
195195

196196
:::column-end:::
197197
:::column span="2":::
@@ -202,27 +202,6 @@ This category contains the following entity:
202202
:::column-end:::
203203
:::row-end:::
204204

205-
## Government and country/region-specific identification
205+
## Next steps
206206

207-
### United States
208-
209-
:::row:::
210-
:::column span="":::
211-
**Entity**
212-
213-
U.S. Social Security Number (SSN)
214-
215-
:::column-end:::
216-
:::column span="2":::
217-
**Details**
218-
219-
To get this entity category, add `USSocialSecurityNumber` to the `pii-categories` parameter. `USSocialSecurityNumber` will be returned in the API response if detected.
220-
221-
:::column-end:::
222-
:::column span="":::
223-
**Supported document languages**
224-
225-
`en`
226-
227-
:::column-end:::
228-
:::row-end:::
207+
[How to detect PII in conversations](../how-to-call-for-conversations.md)

articles/cognitive-services/language-service/personally-identifiable-information/how-to-call-for-conversations.md

Lines changed: 241 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -9,8 +9,8 @@ ms.service: cognitive-services
99
ms.subservice: language-service
1010
ms.topic: how-to
1111
ms.date: 05/10/2022
12-
ms.author: bidishac
13-
ms.custom:
12+
ms.author: aahi
13+
ms.reviewer: bidishac
1414
---
1515

1616

@@ -25,9 +25,13 @@ For transcripts, the API also enables redaction of audio segments, which contain
2525

2626
By default, this feature will use the latest available AI model on your input. You can also configure your API requests to use a specific [model version](../concepts/model-lifecycle.md).
2727

28-
### Input languages
28+
### Language support
2929

30-
Currently the conversational PII preview API only supports English language and is available in the following three regions East US, North Europe and UK south.
30+
Currently, the conversational PII preview API supports only English.
31+
32+
### Region support
33+
34+
Currently, the conversational PII preview API is available in the following regions: East US, North Europe, and UK South.
3135

3236
## Submitting data
3337

@@ -41,10 +45,243 @@ The API will attempt to detect all the [defined entity categories](concepts/conv
4145

4246
For spoken transcripts, the entities detected will be returned on the `redactionSource` parameter value provided. Currently, the supported values for `redactionSource` are `text`, `lexical`, `itn`, and `maskedItn` (which map to the Microsoft Speech to Text API's `display`/`displayText`, `lexical`, `itn`, and `maskedItn` formats, respectively). For spoken transcript input, this API also provides audio timing information to enable audio redaction. To use audio redaction, set the optional `includeAudioRedaction` flag to `true`. Audio redaction is performed based on the lexical input format.
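
The `redactionSource` values and their Speech to Text counterparts can be sketched as a small lookup that validates a value before it goes into a request. This is an illustrative sketch only, not part of the service or any SDK: `REDACTION_SOURCE_TO_SPEECH_FORMAT` and `build_pii_parameters` are hypothetical names, and the parameter names are taken from the request examples in this article.

```python
# Hypothetical helper, not an SDK API.
# Maps each supported redactionSource value to the Speech to Text
# output format(s) it corresponds to, as described in the text above.
REDACTION_SOURCE_TO_SPEECH_FORMAT = {
    "text": ("display", "displayText"),
    "lexical": ("lexical",),
    "itn": ("itn",),
    "maskedItn": ("maskedItn",),
}


def build_pii_parameters(redaction_source="text", include_audio_redaction=False):
    """Return a `parameters` object for a conversational PII task."""
    if redaction_source not in REDACTION_SOURCE_TO_SPEECH_FORMAT:
        supported = ", ".join(REDACTION_SOURCE_TO_SPEECH_FORMAT)
        raise ValueError(
            f"Unsupported redactionSource {redaction_source!r}; expected one of: {supported}"
        )
    return {
        "redactionSource": redaction_source,
        "includeAudioRedaction": include_audio_redaction,
    }
```

Rejecting an unsupported value locally gives a clearer error than waiting for the service to reject the job.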
4347

48+
4449
## Getting PII results
4550

4651
When you get results from PII detection, you can stream the results to an application or save the output to a file on the local system. The API response will include [recognized entities](concepts/conversations-entity-categories.md), including their categories and subcategories, and confidence scores. The text string with the PII entities redacted will also be returned.
4752

53+
## Examples
54+
55+
# [Client libraries (Azure SDK)](#tab/client-libraries)
56+
57+
1. Go to your resource overview page in the [Azure portal](https://portal.azure.com/#home)
58+
59+
2. From the menu on the left side, select **Keys and Endpoint**. You will need one of the keys and the endpoint to authenticate your API requests.
60+
61+
3. Download and install the client library package for your language of choice:
62+
63+
|Language |Package version |
64+
|---------|---------|
65+
|.NET | [5.2.0-beta.2](https://www.nuget.org/packages/Azure.AI.TextAnalytics/5.2.0-beta.2) |
66+
|Python | [5.2.0b2](https://pypi.org/project/azure-ai-textanalytics/5.2.0b2/) |
67+
68+
4. After you've installed the client library, use the following samples on GitHub to start calling the API.
69+
70+
* [C#](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/textanalytics/Azure.AI.TextAnalytics/samples/Sample9_RecognizeCustomEntities.md)
71+
* [Java](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/lro/RecognizeCustomEntities.java)
72+
* [JavaScript](https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/textanalytics/ai-text-analytics/samples/v5/javascript/customText.js)
73+
* [Python](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/textanalytics/azure-ai-textanalytics/samples/sample_recognize_custom_entities.py)
74+
75+
5. See the following reference documentation for more information on the client and return object:
76+
77+
* [C#](/dotnet/api/azure.ai.textanalytics?view=azure-dotnet-preview&preserve-view=true)
78+
* [Java](/java/api/overview/azure/ai-textanalytics-readme?view=azure-java-preview&preserve-view=true)
79+
* [JavaScript](/javascript/api/overview/azure/ai-text-analytics-readme?view=azure-node-preview&preserve-view=true)
80+
* [Python](/python/api/azure-ai-textanalytics/azure.ai.textanalytics?view=azure-python-preview&preserve-view=true)
81+
82+
# [REST API](#tab/rest-api)
83+
84+
## Submit transcripts using speech-to-text
85+
86+
Use the following example if you have conversations transcribed using the Speech service's [speech-to-text](../../Speech-Service/speech-to-text.md) feature:
87+
88+
```bash
89+
curl -i -X POST "https://your-language-endpoint-here/language/analyze-conversations?api-version=2022-05-15-preview" \
90+
-H "Content-Type: application/json" \
91+
-H "Ocp-Apim-Subscription-Key: your-key-here" \
92+
-d \
93+
'
94+
{
95+
"displayName": "Analyze conversations from xxx",
96+
"analysisInput": {
97+
"conversations": [
98+
{
99+
"id": "23611680-c4eb-4705-adef-4aa1c17507b5",
100+
"language": "en",
101+
"modality": "transcript",
102+
"conversationItems": [
103+
{
104+
"participantId": "agent_1",
105+
"id": "8074caf7-97e8-4492-ace3-d284821adacd",
106+
"text": "Good morning.",
107+
"lexical": "good morning",
108+
"itn": "good morning",
109+
"maskedItn": "good morning",
110+
"audioTimings": [
111+
{
112+
"word": "good",
113+
"offset": 11700000,
114+
"duration": 2100000
115+
},
116+
{
117+
"word": "morning",
118+
"offset": 13900000,
119+
"duration": 3100000
120+
}
121+
]
122+
},
123+
{
124+
"participantId": "agent_1",
125+
"id": "0d67d52b-693f-4e34-9881-754a14eec887",
126+
"text": "Can I have your name?",
127+
"lexical": "can i have your name",
128+
"itn": "can i have your name",
129+
"maskedItn": "can i have your name",
130+
"audioTimings": [
131+
{
132+
"word": "can",
133+
"offset": 44200000,
134+
"duration": 2200000
135+
},
136+
{
137+
"word": "i",
138+
"offset": 46500000,
139+
"duration": 800000
140+
},
141+
{
142+
"word": "have",
143+
"offset": 47400000,
144+
"duration": 1500000
145+
},
146+
{
147+
"word": "your",
148+
"offset": 49000000,
149+
"duration": 1500000
150+
},
151+
{
152+
"word": "name",
153+
"offset": 50600000,
154+
"duration": 2100000
155+
}
156+
]
157+
},
158+
{
159+
"participantId": "customer_1",
160+
"id": "08684a7a-5433-4658-a3f1-c6114fcfed51",
161+
"text": "Sure that is John Doe.",
162+
"lexical": "sure that is john doe",
163+
"itn": "sure that is john doe",
164+
"maskedItn": "sure that is john doe",
165+
"audioTimings": [
166+
{
167+
"word": "sure",
168+
"offset": 5400000,
169+
"duration": 6300000
170+
},
171+
{
172+
"word": "that",
173+
"offset": 13600000,
174+
"duration": 2300000
175+
},
176+
{
177+
"word": "is",
178+
"offset": 16000000,
179+
"duration": 1300000
180+
},
181+
{
182+
"word": "john",
183+
"offset": 17400000,
184+
"duration": 2500000
185+
},
186+
{
187+
"word": "doe",
188+
"offset": 20000000,
189+
"duration": 2700000
190+
}
191+
]
192+
}
193+
]
194+
}
195+
]
196+
},
197+
"tasks": [
198+
{
199+
"taskName": "analyze 1",
200+
"kind": "ConversationalPIITask",
201+
"parameters": {
202+
"modelVersion": "2022-05-15-preview",
203+
"redactionSource": "text",
204+
"includeAudioRedaction": true,
205+
"piiCategories": [
206+
"all"
207+
]
208+
}
209+
}
210+
]
211+
}
212+
'
213+
```
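
If the transcript comes from code rather than a hand-written file, the same request body can be assembled programmatically. The following is a minimal sketch using only the Python standard library. The helper names `make_transcript_item` and `make_transcript_request` are hypothetical, and the field names and values simply mirror the cURL example above.

```python
import json


def make_transcript_item(item_id, participant_id, text, lexical, itn, masked_itn, timings):
    """Build one conversation item.

    `timings` is a list of (word, offset, duration) tuples, copied as-is
    into the `audioTimings` array.
    """
    return {
        "participantId": participant_id,
        "id": item_id,
        "text": text,
        "lexical": lexical,
        "itn": itn,
        "maskedItn": masked_itn,
        "audioTimings": [
            {"word": word, "offset": offset, "duration": duration}
            for word, offset, duration in timings
        ],
    }


def make_transcript_request(conversation_id, items, redaction_source="text"):
    """Wrap conversation items in the request body shown in the cURL example."""
    return {
        "displayName": "Analyze conversations",
        "analysisInput": {
            "conversations": [
                {
                    "id": conversation_id,
                    "language": "en",
                    "modality": "transcript",
                    "conversationItems": items,
                }
            ]
        },
        "tasks": [
            {
                "taskName": "analyze 1",
                "kind": "ConversationalPIITask",
                "parameters": {
                    "modelVersion": "2022-05-15-preview",
                    "redactionSource": redaction_source,
                    "includeAudioRedaction": True,
                    "piiCategories": ["all"],
                },
            }
        ],
    }


item = make_transcript_item(
    "8074caf7-97e8-4492-ace3-d284821adacd",
    "agent_1",
    "Good morning.",
    "good morning",
    "good morning",
    "good morning",
    [("good", 11700000, 2100000), ("morning", 13900000, 3100000)],
)
body = json.dumps(make_transcript_request("23611680-c4eb-4705-adef-4aa1c17507b5", [item]), indent=2)
```

The resulting `body` string can be passed to cURL with `-d @file.json` after saving it, or sent with any HTTP client.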
214+
215+
## Submit text chats
216+
217+
Use the following example if you have conversations that originated in text. For example, conversations through a text-based chat client.
218+
219+
```bash
220+
curl -i -X POST "https://your-language-endpoint-here/language/analyze-conversations?api-version=2022-05-15-preview" \
221+
-H "Content-Type: application/json" \
222+
-H "Ocp-Apim-Subscription-Key: your-key-here" \
223+
-d \
224+
'
225+
{
226+
"displayName": "Analyze conversations from xxx",
227+
"analysisInput": {
228+
"conversations": [
229+
{
230+
"id": "23611680-c4eb-4705-adef-4aa1c17507b5",
231+
"language": "en",
232+
"modality": "text",
233+
"conversationItems": [
234+
{
235+
"participantId": "agent_1",
236+
"id": "8074caf7-97e8-4492-ace3-d284821adacd",
237+
"text": "Good morning."
238+
},
239+
{
240+
"participantId": "agent_1",
241+
"id": "0d67d52b-693f-4e34-9881-754a14eec887",
242+
"text": "Can I have your name?"
243+
},
244+
{
245+
"participantId": "customer_1",
246+
"id": "08684a7a-5433-4658-a3f1-c6114fcfed51",
247+
"text": "Sure that is John Doe."
248+
}
249+
]
250+
}
251+
]
252+
},
253+
"tasks": [
254+
{
255+
"taskName": "analyze 1",
256+
"kind": "ConversationalPIITask",
257+
"parameters": {
258+
"modelVersion": "2022-05-15-preview"
259+
}
260+
}
261+
]
262+
}
263+
'
264+
```
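
The text chat request above can also be prepared from Python. This is a hedged sketch using only the standard library: `make_text_chat_request` is a hypothetical helper, the endpoint and key are placeholders, and the URL, headers, and body fields are copied from the cURL example. The request is only constructed here, not sent.

```python
import json
import urllib.request
import uuid

API_VERSION = "2022-05-15-preview"


def make_text_chat_request(endpoint, key, conversation_id, turns):
    """Build (but do not send) the POST request for a text conversation.

    `turns` is a list of (participant_id, text) pairs in conversation order.
    """
    body = {
        "displayName": "Analyze conversations",
        "analysisInput": {
            "conversations": [
                {
                    "id": conversation_id,
                    "language": "en",
                    "modality": "text",
                    "conversationItems": [
                        # Each item gets a fresh GUID, matching the ID style in the example.
                        {"participantId": participant, "id": str(uuid.uuid4()), "text": text}
                        for participant, text in turns
                    ],
                }
            ]
        },
        "tasks": [
            {
                "taskName": "analyze 1",
                "kind": "ConversationalPIITask",
                "parameters": {"modelVersion": API_VERSION},
            }
        ],
    }
    url = f"{endpoint.rstrip('/')}/language/analyze-conversations?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


request = make_text_chat_request(
    "https://your-language-endpoint-here",
    "your-key-here",
    "23611680-c4eb-4705-adef-4aa1c17507b5",
    [
        ("agent_1", "Good morning."),
        ("agent_1", "Can I have your name?"),
        ("customer_1", "Sure that is John Doe."),
    ],
)
# To submit: call urllib.request.urlopen(request) and read the
# operation-location header from the response.
```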
265+
266+
267+
## Get the result
268+
269+
Get the `operation-location` from the response header. The value will look similar to the following URL:
270+
271+
```rest
272+
https://your-language-endpoint/language/analyze-conversations/jobs/12345678-1234-1234-1234-12345678
273+
```
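
Extracting the job ID from that header value is a small string operation. The sketch below assumes the URL shape shown above, with the job ID as the path segment that follows `jobs`. `job_id_from_operation_location` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def job_id_from_operation_location(operation_location):
    """Return the job ID, the path segment that follows `/jobs/`."""
    # urlparse separates the path from any query string, so a trailing
    # `?api-version=...` does not end up in the job ID.
    segments = urlparse(operation_location).path.rstrip("/").split("/")
    if len(segments) < 2 or segments[-2] != "jobs" or not segments[-1]:
        raise ValueError(f"Unexpected operation-location value: {operation_location!r}")
    return segments[-1]


job_id = job_id_from_operation_location(
    "https://your-language-endpoint/language/analyze-conversations/jobs/12345678-1234-1234-1234-12345678"
)
```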
274+
275+
To get the results of the request, use the following cURL command. Be sure to replace `my-job-id` with the job ID value (the final path segment of the URL) that you received from the previous `operation-location` response header:
276+
277+
```bash
278+
curl -X GET https://your-language-endpoint/language/analyze-conversations/jobs/my-job-id \
279+
-H "Content-Type: application/json" \
280+
-H "Ocp-Apim-Subscription-Key: your-key-here"
281+
```
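
Because the job runs asynchronously, the GET request usually has to be repeated until the job finishes. The following is a rough polling sketch under stated assumptions: the helper names are hypothetical, and the `status` field and its terminal values (`succeeded`, `failed`, `cancelled`) are assumptions about the response shape that this article does not confirm. Check the REST API reference before relying on them.

```python
import json
import time
import urllib.request

# Assumption: the job response carries a `status` field and these values are terminal.
ASSUMED_TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def make_job_request(endpoint, key, job_id):
    """Build the GET request from the cURL example above."""
    url = f"{endpoint.rstrip('/')}/language/analyze-conversations/jobs/{job_id}"
    return urllib.request.Request(
        url,
        headers={"Ocp-Apim-Subscription-Key": key},
        method="GET",
    )


def is_finished(job):
    """Return True when the (assumed) status field holds a terminal value."""
    return job.get("status") in ASSUMED_TERMINAL_STATUSES


def wait_for_job(request, interval_seconds=2.0, max_attempts=30):
    """Poll until the job reports a terminal status or attempts run out."""
    for _ in range(max_attempts):
        with urllib.request.urlopen(request) as response:
            job = json.load(response)
        if is_finished(job):
            return job
        time.sleep(interval_seconds)
    raise TimeoutError("The job did not finish within the allotted attempts.")


job_request = make_job_request("https://your-language-endpoint", "your-key-here", "my-job-id")
# result = wait_for_job(job_request)  # performs real network calls
```

A fixed interval keeps the sketch short. A production client would more likely back off between attempts and handle HTTP errors explicitly.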
282+
283+
---
284+
48285
## Service and data limits
49286

50287
[!INCLUDE [service limits article](../includes/service-limits-link.md)]

articles/cognitive-services/language-service/personally-identifiable-information/language-support.md

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,8 @@ Use this article to learn which natural languages are supported by the PII featu
1919

2020
> [!NOTE]
2121
> * Languages are added as new [model versions](how-to-call.md#specify-the-pii-detection-model) are released.
22-
> * The current model version for PII is `2021-01-15`.
22+
23+
# [PII for documents](#tab/documents)
2324

2425
## PII language support
2526

@@ -36,6 +37,16 @@ Use this article to learn which natural languages are supported by the PII featu
3637
| Portuguese (Portugal) | `pt-PT` | 2021-01-15 | `pt` also accepted |
3738
| Spanish | `es` | 2020-04-01 | |
3839

40+
# [PII for conversations (preview)](#tab/conversations)
41+
42+
## PII language support
43+
44+
| Language | Language code | Starting with v3 model version: | Notes |
45+
|:----------------------|:-------------:|:-------------------------------:|:------------------:|
46+
| English | `en` | 2022-05-15-preview | |
47+
48+
---
49+
3950
## Next steps
4051

4152
[PII feature overview](overview.md)

articles/cognitive-services/language-service/personally-identifiable-information/overview.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ ms.custom: language-service-pii, ignite-fall-2021
1515

1616
# What is Personally Identifiable Information (PII) detection in Azure Cognitive Service for Language?
1717

18-
PII detection is one of the features offered by [Azure Cognitive Service for Language](../overview.md), a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. The PII detection feature can identify, categorize, and redact sensitive information in unstructured text. For example: phone numbers, email addresses, and forms of identification.
18+
PII detection is one of the features offered by [Azure Cognitive Service for Language](../overview.md), a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. The PII detection feature can identify, categorize, and redact sensitive information in unstructured text. For example: phone numbers, email addresses, and forms of identification. Detecting PII in conversations works differently from other use cases, so it is covered in separate articles.
1919

2020
* [**Quickstarts**](quickstart.md) are getting-started instructions to guide you through making requests to the service.
2121
* [**How-to guides**](how-to-call.md) contain instructions for using the service in more specific or customized ways.

articles/cognitive-services/language-service/personally-identifiable-information/quickstart.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,9 @@ zone_pivot_groups: programming-languages-text-analytics
1919

2020
Use this article to get started detecting and redacting sensitive information in text, using the NER and PII client library and REST API. Follow these steps to try out example code for mining text:
2121

22+
> [!NOTE]
23+
> This quickstart only covers PII detection in documents. To learn more about detecting PII in conversations, see [How to detect and redact PII in conversations](how-to-call-for-conversations.md).
24+
2225
::: zone pivot="programming-language-csharp"
2326

2427
[!INCLUDE [C# quickstart](includes/quickstarts/csharp-sdk.md)]

articles/cognitive-services/language-service/toc.yml

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -612,10 +612,14 @@ items:
612612
items:
613613
- name: Call PII
614614
href: personally-identifiable-information/how-to-call.md
615+
- name: Call PII for Conversation (preview)
616+
href: personally-identifiable-information/how-to-call-for-conversations.md
615617
- name: Concepts
616618
items:
617619
- name: Recognized entity categories
618-
href: personally-identifiable-information/concepts/entity-categories.md
620+
href: personally-identifiable-information/concepts/entity-categories.md
621+
- name: Recognized entity categories for conversation
622+
href: personally-identifiable-information/concepts/conversations-entity-categories.md
619623
- name: Reference
620624
items:
621625
- name: REST API
