[TA] Adding support for PII endpoint #13687
Changes from 9 commits
1666f39
64a8e75
e82b043
a6e0c5c
09b425a
6cf6e70
f8f5771
17092a5
ff4f2ad
cc11391
008d594
bda0f94
9b30ede
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,7 @@ and includes six main functions: | |
- Language Detection | ||
- Key Phrase Extraction | ||
- Named Entity Recognition | ||
- Personally Identifiable Information Entity Recognition | ||
- Linked Entity Recognition | ||
|
||
[Source code][source_code] | [Package (Maven)][package] | [API reference documentation][api_reference_doc] | [Product Documentation][product_documentation] | [Samples][samples_readme] | ||
|
@@ -186,6 +187,7 @@ The following sections provide several code snippets covering some of the most c | |
* [Detect Language](#detect-language "Detect language") | ||
* [Extract Key Phrases](#extract-key-phrases "Extract key phrases") | ||
* [Recognize Entities](#recognize-entities "Recognize entities") | ||
* [Recognize Personally Identifiable Information Entities](#recognize-personally-identifiable-information-entities "Recognize Personally Identifiable Information entities") | ||
* [Recognize Linked Entities](#recognize-linked-entities "Recognize linked entities") | ||
|
||
### Text Analytics Client | ||
|
@@ -209,7 +211,7 @@ TextAnalyticsAsyncClient textAnalyticsClient = new TextAnalyticsClientBuilder() | |
|
||
### Analyze sentiment | ||
Run a Text Analytics predictive model to identify the positive, negative, neutral or mixed sentiment contained in the | ||
passed-in document or batch of documents. | ||
provided document or batch of documents. | ||
|
||
<!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L104-L108 --> | ||
```java | ||
|
@@ -236,7 +238,7 @@ For samples on using the production recommended option `DetectLanguageBatch` see | |
Please refer to the service documentation for a conceptual discussion of [language detection][language_detection]. | ||
|
||
### Extract key phrases | ||
Run a model to identify a collection of significant phrases found in the passed-in document or batch of documents. | ||
Run a model to identify a collection of significant phrases found in the provided document or batch of documents. | ||
|
||
<!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L149-L151 --> | ||
```java | ||
|
@@ -248,7 +250,7 @@ For samples on using the production recommended option `ExtractKeyPhrasesBatch` | |
Please refer to the service documentation for a conceptual discussion of [key phrase extraction][key_phrase_extraction]. | ||
|
||
### Recognize entities | ||
Run a predictive model to identify a collection of named entities in the passed-in document or batch of documents and | ||
Run a predictive model to identify a collection of named entities in the provided document or batch of documents and | ||
categorize those entities into categories such as person, location, or organization. For more information on available | ||
categories, see [Text Analytics Named Entity Categories][named_entities_categories]. | ||
|
||
|
@@ -262,8 +264,24 @@ textAnalyticsClient.recognizeEntities(document).forEach(entity -> | |
For samples on using the production recommended option `RecognizeEntitiesBatch` see [here][recognize_entities_sample]. | ||
Please refer to the service documentation for a conceptual discussion of [named entity recognition][named_entity_recognition]. | ||
|
||
### Recognize Personally Identifiable Information entities | ||
Run a predictive model to identify a collection of Personally Identifiable Information (PII) entities in the provided | ||
document. It recognizes and categorizes PII entities in its input text, such as | ||
Social Security Numbers, bank account information, credit card numbers, and more. This endpoint is only supported for | ||
API versions v3.1-preview.1 and above. | ||
|
||
<!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L158-L161 --> | ||
```java | ||
String document = "My SSN is 859-98-0987"; | ||
textAnalyticsClient.recognizePiiEntities(document).forEach(entity -> System.out.printf( | ||
"Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s, offset: %s, length: %s, confidence score: %f.%n", | ||
entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getOffset(), entity.getLength(), entity.getConfidenceScore())); | ||
``` | ||
For samples on using the production recommended option `RecognizePiiEntitiesBatch` see [here][recognize_pii_entities_sample]. | ||
Please refer to the service documentation for [supported PII entity types][pii_entity_recognition]. | ||
Do we do this section for all the readme snippets if not consider moving this to
Not getting what you mean here. It should already for all the README snippets. It follows the pattern.
I suggested if this could be moved to Next Steps section. |
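The README snippet above prints an offset and a length for each recognized PII entity. One way those two values might be used is to mask the matched text. The following is a minimal sketch, not part of this PR: `PiiRedactionSketch`, `Span`, and `redact` are hypothetical names, and it assumes the offset and length can be treated as indices into the Java `String` (UTF-16 code units), which may not match the unit the service reports for every input.

```java
import java.util.ArrayList;
import java.util.List;

class PiiRedactionSketch {
    /** Hypothetical stand-in for the offset/length pair reported for each recognized entity. */
    static final class Span {
        final int offset;
        final int length;

        Span(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Replaces every character covered by a span with '*'.
     * Assumption: offsets and lengths are indices into the Java String (UTF-16 code units).
     */
    static String redact(String document, List<Span> spans) {
        char[] chars = document.toCharArray();
        for (Span span : spans) {
            // Clamp to the document bounds so an out-of-range span cannot throw.
            int start = Math.max(0, span.offset);
            int end = Math.min(chars.length, span.offset + span.length);
            for (int i = start; i < end; i++) {
                chars[i] = '*';
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String document = "My SSN is 859-98-0987";
        List<Span> spans = new ArrayList<>();
        // In this sample text the SSN starts at index 10 and is 11 characters long.
        spans.add(new Span(10, 11));
        System.out.println(redact(document, spans)); // prints "My SSN is ***********"
    }
}
```

In real use the spans would presumably be built from `entity.getOffset()` and `entity.getLength()` as shown in the snippet; that wiring is left out here so the sketch stays free of SDK dependencies.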
||
|
||
### Recognize linked entities | ||
Run a predictive model to identify a collection of entities found in the passed-in document or batch of documents, | ||
Run a predictive model to identify a collection of entities found in the provided document or batch of documents, | ||
and include information linking the entities to their corresponding entries in a well-known knowledge base. | ||
|
||
<!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L135-L142 --> | ||
|
@@ -357,6 +375,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][coc]. For m | |
[named_entity_recognition]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking | ||
[named_entity_recognition_types]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal | ||
[named_entities_categories]: https://docs.microsoft.com/azure/cognitive-services/Text-Analytics/named-entity-types | ||
[pii_entity_recognition]: https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal | ||
[package]: https://mvnrepository.com/artifact/com.azure/azure-ai-textanalytics | ||
[performance_tuning]: https://github.com/Azure/azure-sdk-for-java/wiki/Performance-Tuning | ||
[product_documentation]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/overview | ||
|
@@ -377,6 +396,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][coc]. For m | |
[analyze_sentiment_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/AnalyzeSentimentBatchDocuments.java | ||
[extract_key_phrases_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/ExtractKeyPhrasesBatchDocuments.java | ||
[recognize_entities_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizeEntitiesBatchDocuments.java | ||
[recognize_pii_entities_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchDocuments.java | ||
[recognize_linked_entities_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizeLinkedEntitiesBatchDocuments.java | ||
|
||
 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,204 @@ | ||
// Copyright (c) Microsoft Corporation. All rights reserved. | ||
// Licensed under the MIT License. | ||
|
||
package com.azure.ai.textanalytics; | ||
|
||
import com.azure.ai.textanalytics.implementation.TextAnalyticsClientImpl; | ||
import com.azure.ai.textanalytics.implementation.models.EntitiesResult; | ||
import com.azure.ai.textanalytics.implementation.models.MultiLanguageBatchInput; | ||
import com.azure.ai.textanalytics.implementation.models.WarningCodeValue; | ||
import com.azure.ai.textanalytics.models.EntityCategory; | ||
import com.azure.ai.textanalytics.models.PiiEntity; | ||
import com.azure.ai.textanalytics.models.PiiEntityCollection; | ||
import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult; | ||
import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; | ||
import com.azure.ai.textanalytics.models.TextAnalyticsWarning; | ||
import com.azure.ai.textanalytics.models.TextDocumentInput; | ||
import com.azure.ai.textanalytics.models.WarningCode; | ||
import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; | ||
import com.azure.core.http.rest.Response; | ||
import com.azure.core.http.rest.SimpleResponse; | ||
import com.azure.core.util.Context; | ||
import com.azure.core.util.IterableStream; | ||
import com.azure.core.util.logging.ClientLogger; | ||
import reactor.core.publisher.Mono; | ||
|
||
import java.util.ArrayList; | ||
import java.util.Collections; | ||
import java.util.List; | ||
import java.util.Objects; | ||
import java.util.stream.Collectors; | ||
|
||
import static com.azure.ai.textanalytics.TextAnalyticsAsyncClient.COGNITIVE_TRACING_NAMESPACE_VALUE; | ||
import static com.azure.ai.textanalytics.implementation.Utility.inputDocumentsValidation; | ||
import static com.azure.ai.textanalytics.implementation.Utility.mapToHttpResponseExceptionIfExist; | ||
import static com.azure.ai.textanalytics.implementation.Utility.toBatchStatistics; | ||
import static com.azure.ai.textanalytics.implementation.Utility.toMultiLanguageInput; | ||
import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsError; | ||
import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsException; | ||
import static com.azure.ai.textanalytics.implementation.Utility.toTextDocumentStatistics; | ||
import static com.azure.core.util.FluxUtil.monoError; | ||
import static com.azure.core.util.FluxUtil.withContext; | ||
import static com.azure.core.util.tracing.Tracer.AZ_TRACING_NAMESPACE_KEY; | ||
|
||
/** | ||
* Helper class for managing recognize Personally Identifiable Information entity endpoint. | ||
*/ | ||
class RecognizePiiEntityAsyncClient { | ||
private final ClientLogger logger = new ClientLogger(RecognizePiiEntityAsyncClient.class); | ||
private final TextAnalyticsClientImpl service; | ||
|
||
/** | ||
* Create a {@link RecognizePiiEntityAsyncClient} that sends requests to the Text Analytics service's | ||
* recognize Personally Identifiable Information entity endpoint. | ||
* | ||
* @param service The proxy service used to perform REST calls. | ||
*/ | ||
RecognizePiiEntityAsyncClient(TextAnalyticsClientImpl service) { | ||
this.service = service; | ||
} | ||
|
||
/** | ||
* Helper function for calling service with max overloaded parameters that returns a {@link Mono} | ||
* which contains {@link PiiEntityCollection}. | ||
* | ||
* @param document A single document. | ||
* @param language The language code. | ||
* | ||
* @return The {@link Mono} of {@link PiiEntityCollection}. | ||
*/ | ||
Mono<PiiEntityCollection> recognizePiiEntities(String document, String language) { | ||
try { | ||
Objects.requireNonNull(document, "'document' cannot be null."); | ||
return recognizePiiEntitiesBatch( | ||
Collections.singletonList(new TextDocumentInput("0", document).setLanguage(language)), null) | ||
.map(resultCollectionResponse -> { | ||
PiiEntityCollection entityCollection = null; | ||
// for each loop will have only one entry inside | ||
for (RecognizePiiEntitiesResult entitiesResult : resultCollectionResponse.getValue()) { | ||
if (entitiesResult.isError()) { | ||
throw logger.logExceptionAsError(toTextAnalyticsException(entitiesResult.getError())); | ||
} | ||
entityCollection = new PiiEntityCollection(entitiesResult.getEntities(), | ||
entitiesResult.getEntities().getWarnings()); | ||
} | ||
return entityCollection; | ||
}); | ||
} catch (RuntimeException ex) { | ||
return monoError(logger, ex); | ||
} | ||
} | ||
|
||
/** | ||
* Helper function for calling service with max overloaded parameters. | ||
* | ||
* @param documents The list of documents to recognize Personally Identifiable Information entities for. | ||
* @param options The {@link TextAnalyticsRequestOptions} request options. | ||
* | ||
* @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}. | ||
*/ | ||
Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatch( | ||
Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options) { | ||
try { | ||
inputDocumentsValidation(documents); | ||
return withContext(context -> getRecognizePiiEntitiesResponse(documents, options, context)); | ||
} catch (RuntimeException ex) { | ||
return monoError(logger, ex); | ||
} | ||
} | ||
|
||
/** | ||
* Helper function for calling the service with max overloaded parameters when a {@link Context} is given. | ||
* | ||
* @param documents The list of documents to recognize Personally Identifiable Information entities for. | ||
* @param options The {@link TextAnalyticsRequestOptions} request options. | ||
* @param context Additional context that is passed through the Http pipeline during the service call. | ||
* | ||
* @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}. | ||
*/ | ||
Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatchWithContext( | ||
Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options, Context context) { | ||
try { | ||
inputDocumentsValidation(documents); | ||
return getRecognizePiiEntitiesResponse(documents, options, context); | ||
} catch (RuntimeException ex) { | ||
return monoError(logger, ex); | ||
} | ||
} | ||
|
||
/** | ||
* Helper method to convert the service response of {@link EntitiesResult} to {@link Response} which contains | ||
* {@link RecognizePiiEntitiesResultCollection}. | ||
* | ||
* @param response the {@link Response} of {@link EntitiesResult} returned by the service. | ||
* | ||
* @return A {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}. | ||
*/ | ||
private Response<RecognizePiiEntitiesResultCollection> toRecognizePiiEntitiesResultCollectionResponse( | ||
final Response<EntitiesResult> response) { | ||
final EntitiesResult entitiesResult = response.getValue(); | ||
// List of documents results | ||
final List<RecognizePiiEntitiesResult> recognizeEntitiesResults = new ArrayList<>(); | ||
entitiesResult.getDocuments().forEach(documentEntities -> { | ||
// Pii entities list | ||
final List<PiiEntity> piiEntities = documentEntities.getEntities().stream().map(entity -> | ||
new PiiEntity(entity.getText(), EntityCategory.fromString(entity.getCategory()), | ||
entity.getSubcategory(), entity.getConfidenceScore(), entity.getOffset(), entity.getLength())) | ||
.collect(Collectors.toList()); | ||
// Warnings | ||
final List<TextAnalyticsWarning> warnings = documentEntities.getWarnings().stream() | ||
.map(warning -> { | ||
final WarningCodeValue warningCodeValue = warning.getCode(); | ||
return new TextAnalyticsWarning( | ||
WarningCode.fromString(warningCodeValue == null ? null : warningCodeValue.toString()), | ||
warning.getMessage()); | ||
}).collect(Collectors.toList()); | ||
|
||
recognizeEntitiesResults.add(new RecognizePiiEntitiesResult( | ||
documentEntities.getId(), | ||
documentEntities.getStatistics() == null ? null | ||
: toTextDocumentStatistics(documentEntities.getStatistics()), | ||
null, | ||
new PiiEntityCollection(new IterableStream<>(piiEntities), new IterableStream<>(warnings)) | ||
)); | ||
}); | ||
// Document errors | ||
entitiesResult.getErrors().forEach(documentError -> { | ||
recognizeEntitiesResults.add( | ||
new RecognizePiiEntitiesResult(documentError.getId(), null, | ||
toTextAnalyticsError(documentError.getError()), null)); | ||
}); | ||
|
||
return new SimpleResponse<>(response, | ||
new RecognizePiiEntitiesResultCollection(recognizeEntitiesResults, entitiesResult.getModelVersion(), | ||
entitiesResult.getStatistics() == null ? null : toBatchStatistics(entitiesResult.getStatistics()))); | ||
} | ||
|
||
/** | ||
* Call the service with REST response, convert to a {@link Mono} of {@link Response} that contains | ||
* {@link RecognizePiiEntitiesResultCollection} from a {@link SimpleResponse} of {@link EntitiesResult}. | ||
* | ||
* @param documents The list of documents to recognize Personally Identifiable Information entities for. | ||
* @param options The {@link TextAnalyticsRequestOptions} request options. | ||
* @param context Additional context that is passed through the Http pipeline during the service call. | ||
* | ||
* @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}. | ||
*/ | ||
private Mono<Response<RecognizePiiEntitiesResultCollection>> getRecognizePiiEntitiesResponse( | ||
Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options, Context context) { | ||
return service.entitiesRecognitionPiiWithResponseAsync( | ||
new MultiLanguageBatchInput().setDocuments(toMultiLanguageInput(documents)), | ||
options == null ? null : options.getModelVersion(), | ||
options == null ? null : options.isIncludeStatistics(), | ||
null, | ||
context.addData(AZ_TRACING_NAMESPACE_KEY, COGNITIVE_TRACING_NAMESPACE_VALUE)) | ||
.doOnSubscribe(ignoredValue -> logger.info( | ||
"Start recognizing Personally Identifiable Information entities for a batch of documents.")) | ||
.doOnSuccess(response -> logger.info( | ||
"Successfully recognized Personally Identifiable Information entities for a batch of documents.")) | ||
.doOnError(error -> | ||
logger.warning("Failed to recognize Personally Identifiable Information entities - {}", error)) | ||
.map(this::toRecognizePiiEntitiesResultCollectionResponse) | ||
.onErrorMap(throwable -> mapToHttpResponseExceptionIfExist(throwable)); | ||
} | ||
} |
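The single-document `recognizePiiEntities` method in this file is built on top of the batch call: it wraps the document in a one-element list, reads back the only result, and turns a per-document error into an exception. The sketch below illustrates that pattern in isolation. It is a simplified, synchronous stand-in that uses a plain `Function` rather than Reactor's `Mono`, and `SingleOverBatchSketch`, `DocumentResult`, and `recognizeSingle` are hypothetical names that do not exist in the SDK.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

class SingleOverBatchSketch {
    /** Hypothetical per-document result: holds either entities or an error message. */
    static final class DocumentResult {
        final String id;
        final List<String> entities; // null when the document failed
        final String error;          // null when the document succeeded

        DocumentResult(String id, List<String> entities, String error) {
            this.id = id;
            this.entities = entities;
            this.error = error;
        }

        boolean isError() {
            return error != null;
        }
    }

    /** Runs a batch call for one document and unwraps the single result. */
    static List<String> recognizeSingle(String document,
            Function<List<String>, List<DocumentResult>> batchCall) {
        Objects.requireNonNull(document, "'document' cannot be null.");
        // Wrap the single document in a one-element batch.
        List<DocumentResult> results = batchCall.apply(Collections.singletonList(document));
        List<String> entities = null;
        // The loop sees at most one entry because the batch held one document.
        for (DocumentResult result : results) {
            if (result.isError()) {
                // Surface a per-document error as an exception for the single-document caller.
                throw new IllegalStateException(result.error);
            }
            entities = result.entities;
        }
        return entities;
    }

    public static void main(String[] args) {
        // Fake batch call standing in for the service: reports one fixed entity.
        Function<List<String>, List<DocumentResult>> fakeBatch = docs ->
            Collections.singletonList(new DocumentResult("0", Arrays.asList("859-98-0987"), null));
        System.out.println(recognizeSingle("My SSN is 859-98-0987", fakeBatch));
    }
}
```

A consequence of this design, visible in the sketch as well, is that the single-document path returns `null` if the batch call yields no results at all; callers may want to guard against that case.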