diff --git a/sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md b/sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md index d9a889710e9c..53218448ed0f 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md +++ b/sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md @@ -4,6 +4,10 @@ - `length` is the number of characters in the text of these models - `offset` is the offset of the text from the start of the document +**New features** +- Added support for Personally Identifiable Information(PII) entity recognition feature. + To use this feature, you need to make sure you are using the service's v3.1-preview.1 API. + ## 5.0.0 (2020-07-27) - Re-release of version `1.0.1` with updated version `5.0.0`. diff --git a/sdk/textanalytics/azure-ai-textanalytics/README.md b/sdk/textanalytics/azure-ai-textanalytics/README.md index 8e289b4c521e..0efe4d8ec9af 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/README.md +++ b/sdk/textanalytics/azure-ai-textanalytics/README.md @@ -6,6 +6,7 @@ and includes six main functions: - Language Detection - Key Phrase Extraction - Named Entity Recognition +- Personally Identifiable Information Entity Recognition - Linked Entity Recognition [Source code][source_code] | [Package (Maven)][package] | [API reference documentation][api_reference_doc] | [Product Documentation][product_documentation] | [Samples][samples_readme] @@ -186,6 +187,7 @@ The following sections provide several code snippets covering some of the most c * [Detect Language](#detect-language "Detect language") * [Extract Key Phrases](#extract-key-phrases "Extract key phrases") * [Recognize Entities](#recognize-entities "Recognize entities") +* [Recognize Personally Identifiable Information Entities](#recognize-personally-identifiable-information-entities "Recognize Personally Identifiable Information entities") * [Recognize Linked Entities](#recognize-linked-entities "Recognize linked entities") ### Text Analytics Client @@ -209,7 +211,7 @@ 
TextAnalyticsAsyncClient textAnalyticsClient = new TextAnalyticsClientBuilder() ### Analyze sentiment Run a Text Analytics predictive model to identify the positive, negative, neutral or mixed sentiment contained in the -passed-in document or batch of documents. +provided document or batch of documents. ```java @@ -236,7 +238,7 @@ For samples on using the production recommended option `DetectLanguageBatch` see Please refer to the service documentation for a conceptual discussion of [language detection][language_detection]. ### Extract key phrases -Run a model to identify a collection of significant phrases found in the passed-in document or batch of documents. +Run a model to identify a collection of significant phrases found in the provided document or batch of documents. ```java @@ -248,7 +250,7 @@ For samples on using the production recommended option `ExtractKeyPhrasesBatch` Please refer to the service documentation for a conceptual discussion of [key phrase extraction][key_phrase_extraction]. ### Recognize entities -Run a predictive model to identify a collection of named entities in the passed-in document or batch of documents and +Run a predictive model to identify a collection of named entities in the provided document or batch of documents and categorize those entities into categories such as person, location, or organization. For more information on available categories, see [Text Analytics Named Entity Categories][named_entities_categories]. @@ -262,8 +264,24 @@ textAnalyticsClient.recognizeEntities(document).forEach(entity -> For samples on using the production recommended option `RecognizeEntitiesBatch` see [here][recognize_entities_sample]. Please refer to the service documentation for a conceptual discussion of [named entity recognition][named_entity_recognition]. +### Recognize Personally Identifiable Information entities +Run a predictive model to identify a collection of Personally Identifiable Information(PII) entities in the provided +document. 
It recognizes and categorizes PII entities in its input text, such as +Social Security Numbers, bank account information, credit card numbers, and more. This endpoint is only supported for +API versions v3.1-preview.1 and above. + + +```java +String document = "My SSN is 859-98-0987"; +textAnalyticsClient.recognizePiiEntities(document).forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s," + + " confidence score: %f.%n", +``` + +Please refer to the service documentation for [supported PII entity types][pii_entity_recognition]. + ### Recognize linked entities -Run a predictive model to identify a collection of entities found in the passed-in document or batch of documents, +Run a predictive model to identify a collection of entities found in the provided document or batch of documents, and include information linking the entities to their corresponding entries in a well-known knowledge base. @@ -357,6 +375,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][coc]. 
For m [named_entity_recognition]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking [named_entity_recognition_types]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal [named_entities_categories]: https://docs.microsoft.com/azure/cognitive-services/Text-Analytics/named-entity-types +[pii_entity_recognition]: https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal [package]: https://mvnrepository.com/artifact/com.azure/azure-ai-textanalytics [performance_tuning]: https://github.com/Azure/azure-sdk-for-java/wiki/Performance-Tuning [product_documentation]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/overview diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/RecognizeEntityAsyncClient.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/RecognizeEntityAsyncClient.java index dd7ac64f2250..4a99a7ce372b 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/RecognizeEntityAsyncClient.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/RecognizeEntityAsyncClient.java @@ -11,11 +11,11 @@ import com.azure.ai.textanalytics.models.CategorizedEntityCollection; import com.azure.ai.textanalytics.models.EntityCategory; import com.azure.ai.textanalytics.models.RecognizeEntitiesResult; -import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection; import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; import com.azure.ai.textanalytics.models.TextAnalyticsWarning; import com.azure.ai.textanalytics.models.TextDocumentInput; import com.azure.ai.textanalytics.models.WarningCode; +import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection; import com.azure.core.exception.HttpResponseException; import 
com.azure.core.http.rest.Response; import com.azure.core.http.rest.SimpleResponse; diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/RecognizePiiEntityAsyncClient.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/RecognizePiiEntityAsyncClient.java new file mode 100644 index 000000000000..d2657b38cb29 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/RecognizePiiEntityAsyncClient.java @@ -0,0 +1,204 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package com.azure.ai.textanalytics; + +import com.azure.ai.textanalytics.implementation.TextAnalyticsClientImpl; +import com.azure.ai.textanalytics.implementation.models.EntitiesResult; +import com.azure.ai.textanalytics.implementation.models.MultiLanguageBatchInput; +import com.azure.ai.textanalytics.implementation.models.WarningCodeValue; +import com.azure.ai.textanalytics.models.EntityCategory; +import com.azure.ai.textanalytics.models.PiiEntity; +import com.azure.ai.textanalytics.models.PiiEntityCollection; +import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult; +import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; +import com.azure.ai.textanalytics.models.TextAnalyticsWarning; +import com.azure.ai.textanalytics.models.TextDocumentInput; +import com.azure.ai.textanalytics.models.WarningCode; +import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; +import com.azure.core.http.rest.Response; +import com.azure.core.http.rest.SimpleResponse; +import com.azure.core.util.Context; +import com.azure.core.util.IterableStream; +import com.azure.core.util.logging.ClientLogger; +import reactor.core.publisher.Mono; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Objects; +import java.util.stream.Collectors; + +import static 
com.azure.ai.textanalytics.TextAnalyticsAsyncClient.COGNITIVE_TRACING_NAMESPACE_VALUE;
+import static com.azure.ai.textanalytics.implementation.Utility.inputDocumentsValidation;
+import static com.azure.ai.textanalytics.implementation.Utility.mapToHttpResponseExceptionIfExist;
+import static com.azure.ai.textanalytics.implementation.Utility.toBatchStatistics;
+import static com.azure.ai.textanalytics.implementation.Utility.toMultiLanguageInput;
+import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsError;
+import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsException;
+import static com.azure.ai.textanalytics.implementation.Utility.toTextDocumentStatistics;
+import static com.azure.core.util.FluxUtil.monoError;
+import static com.azure.core.util.FluxUtil.withContext;
+import static com.azure.core.util.tracing.Tracer.AZ_TRACING_NAMESPACE_KEY;
+
+/**
+ * Helper class for managing the recognize Personally Identifiable Information entity endpoint.
+ */
+class RecognizePiiEntityAsyncClient {
+    private final ClientLogger logger = new ClientLogger(RecognizePiiEntityAsyncClient.class);
+    private final TextAnalyticsClientImpl service;
+
+    /**
+     * Create a {@link RecognizePiiEntityAsyncClient} that sends requests to the Text Analytics service's
+     * recognize Personally Identifiable Information entity endpoint.
+     *
+     * @param service The proxy service used to perform REST calls.
+     */
+    RecognizePiiEntityAsyncClient(TextAnalyticsClientImpl service) {
+        this.service = service;
+    }
+
+    /**
+     * Helper function for calling service with max overloaded parameters that returns a {@link Mono}
+     * which contains {@link PiiEntityCollection}.
+     *
+     * @param document A single document.
+     * @param language The language code.
+     *
+     * @return The {@link Mono} of {@link PiiEntityCollection}.
+     */
+    Mono<PiiEntityCollection> recognizePiiEntities(String document, String language) {
+        try {
+            Objects.requireNonNull(document, "'document' cannot be null.");
+            return recognizePiiEntitiesBatch(
+                Collections.singletonList(new TextDocumentInput("0", document).setLanguage(language)), null)
+                .map(resultCollectionResponse -> {
+                    PiiEntityCollection entityCollection = null;
+                    // for each loop will have only one entry inside
+                    for (RecognizePiiEntitiesResult entitiesResult : resultCollectionResponse.getValue()) {
+                        if (entitiesResult.isError()) {
+                            throw logger.logExceptionAsError(toTextAnalyticsException(entitiesResult.getError()));
+                        }
+                        entityCollection = new PiiEntityCollection(entitiesResult.getEntities(),
+                            entitiesResult.getEntities().getWarnings());
+                    }
+                    return entityCollection;
+                });
+        } catch (RuntimeException ex) {
+            return monoError(logger, ex);
+        }
+    }
+
+    /**
+     * Helper function for calling service with max overloaded parameters.
+     *
+     * @param documents The list of documents to recognize Personally Identifiable Information entities for.
+     * @param options The {@link TextAnalyticsRequestOptions} request options.
+     *
+     * @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
+     */
+    Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatch(
+        Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options) {
+        try {
+            inputDocumentsValidation(documents);
+            return withContext(context -> getRecognizePiiEntitiesResponse(documents, options, context));
+        } catch (RuntimeException ex) {
+            return monoError(logger, ex);
+        }
+    }
+
+    /**
+     * Helper function for calling service with max overloaded parameters when a {@link Context} is given.
+     *
+     * @param documents The list of documents to recognize Personally Identifiable Information entities for.
+     * @param options The {@link TextAnalyticsRequestOptions} request options.
+     * @param context Additional context that is passed through the Http pipeline during the service call.
+     *
+     * @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
+     */
+    Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatchWithContext(
+        Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options, Context context) {
+        try {
+            inputDocumentsValidation(documents);
+            return getRecognizePiiEntitiesResponse(documents, options, context);
+        } catch (RuntimeException ex) {
+            return monoError(logger, ex);
+        }
+    }
+
+    /**
+     * Helper method to convert the service response of {@link EntitiesResult} to {@link Response} which contains
+     * {@link RecognizePiiEntitiesResultCollection}.
+     *
+     * @param response the {@link Response} of {@link EntitiesResult} returned by the service.
+     *
+     * @return A {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
+     */
+    private Response<RecognizePiiEntitiesResultCollection> toRecognizePiiEntitiesResultCollectionResponse(
+        final Response<EntitiesResult> response) {
+        final EntitiesResult entitiesResult = response.getValue();
+        // List of documents results
+        final List<RecognizePiiEntitiesResult> recognizeEntitiesResults = new ArrayList<>();
+        entitiesResult.getDocuments().forEach(documentEntities -> {
+            // Pii entities list
+            final List<PiiEntity> piiEntities = documentEntities.getEntities().stream().map(entity ->
+                new PiiEntity(entity.getText(), EntityCategory.fromString(entity.getCategory()),
+                    entity.getSubcategory(), entity.getConfidenceScore(), entity.getOffset(), entity.getLength()))
+                .collect(Collectors.toList());
+            // Warnings
+            final List<TextAnalyticsWarning> warnings = documentEntities.getWarnings().stream()
+                .map(warning -> {
+                    final WarningCodeValue warningCodeValue = warning.getCode();
+                    return new TextAnalyticsWarning(
+                        WarningCode.fromString(warningCodeValue == null ? null : warningCodeValue.toString()),
+                        warning.getMessage());
+                }).collect(Collectors.toList());
+
+            recognizeEntitiesResults.add(new RecognizePiiEntitiesResult(
+                documentEntities.getId(),
+                documentEntities.getStatistics() == null ?
null
+                    : toTextDocumentStatistics(documentEntities.getStatistics()),
+                null,
+                new PiiEntityCollection(new IterableStream<>(piiEntities), new IterableStream<>(warnings))
+            ));
+        });
+        // Document errors
+        entitiesResult.getErrors().forEach(documentError -> {
+            recognizeEntitiesResults.add(
+                new RecognizePiiEntitiesResult(documentError.getId(), null,
+                    toTextAnalyticsError(documentError.getError()), null));
+        });
+
+        return new SimpleResponse<>(response,
+            new RecognizePiiEntitiesResultCollection(recognizeEntitiesResults, entitiesResult.getModelVersion(),
+                entitiesResult.getStatistics() == null ? null : toBatchStatistics(entitiesResult.getStatistics())));
+    }
+
+    /**
+     * Call the service with REST response, convert to a {@link Mono} of {@link Response} that contains
+     * {@link RecognizePiiEntitiesResultCollection} from a {@link SimpleResponse} of {@link EntitiesResult}.
+     *
+     * @param documents The list of documents to recognize Personally Identifiable Information entities for.
+     * @param options The {@link TextAnalyticsRequestOptions} request options.
+     * @param context Additional context that is passed through the Http pipeline during the service call.
+     *
+     * @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
+     */
+    private Mono<Response<RecognizePiiEntitiesResultCollection>> getRecognizePiiEntitiesResponse(
+        Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options, Context context) {
+        return service.entitiesRecognitionPiiWithResponseAsync(
+            new MultiLanguageBatchInput().setDocuments(toMultiLanguageInput(documents)),
+            options == null ? null : options.getModelVersion(),
+            options == null ?
null : options.isIncludeStatistics(), + null, + context.addData(AZ_TRACING_NAMESPACE_KEY, COGNITIVE_TRACING_NAMESPACE_VALUE)) + .doOnSubscribe(ignoredValue -> logger.info( + "Start recognizing Personally Identifiable Information entities for a batch of documents.")) + .doOnSuccess(response -> logger.info( + "Successfully recognized Personally Identifiable Information entities for a batch of documents.")) + .doOnError(error -> + logger.warning("Failed to recognize Personally Identifiable Information entities - {}", error)) + .map(this::toRecognizePiiEntitiesResultCollectionResponse) + .onErrorMap(throwable -> mapToHttpResponseExceptionIfExist(throwable)); + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClient.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClient.java index 82d0b23e456a..4b409506aeb2 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClient.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClient.java @@ -12,6 +12,7 @@ import com.azure.ai.textanalytics.models.DocumentSentiment; import com.azure.ai.textanalytics.models.KeyPhrasesCollection; import com.azure.ai.textanalytics.models.LinkedEntityCollection; +import com.azure.ai.textanalytics.models.PiiEntityCollection; import com.azure.ai.textanalytics.models.TextAnalyticsError; import com.azure.ai.textanalytics.models.TextAnalyticsException; import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; @@ -21,6 +22,7 @@ import com.azure.ai.textanalytics.util.ExtractKeyPhrasesResultCollection; import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection; import com.azure.ai.textanalytics.util.RecognizeLinkedEntitiesResultCollection; +import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; import 
com.azure.core.annotation.ReturnType; import com.azure.core.annotation.ServiceClient; import com.azure.core.annotation.ServiceMethod; @@ -32,6 +34,7 @@ import java.util.Collections; import java.util.Objects; +import static com.azure.ai.textanalytics.implementation.Utility.inputDocumentsValidation; import static com.azure.ai.textanalytics.implementation.Utility.mapByIndex; import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsException; import static com.azure.core.util.FluxUtil.monoError; @@ -63,6 +66,7 @@ public final class TextAnalyticsAsyncClient { final AnalyzeSentimentAsyncClient analyzeSentimentAsyncClient; final ExtractKeyPhraseAsyncClient extractKeyPhraseAsyncClient; final RecognizeEntityAsyncClient recognizeEntityAsyncClient; + final RecognizePiiEntityAsyncClient recognizePiiEntityAsyncClient; final RecognizeLinkedEntityAsyncClient recognizeLinkedEntityAsyncClient; /** @@ -84,6 +88,7 @@ public final class TextAnalyticsAsyncClient { this.analyzeSentimentAsyncClient = new AnalyzeSentimentAsyncClient(service); this.extractKeyPhraseAsyncClient = new ExtractKeyPhraseAsyncClient(service); this.recognizeEntityAsyncClient = new RecognizeEntityAsyncClient(service); + this.recognizePiiEntityAsyncClient = new RecognizePiiEntityAsyncClient(service); this.recognizeLinkedEntityAsyncClient = new RecognizeLinkedEntityAsyncClient(service); } @@ -249,7 +254,8 @@ public Mono> detectLanguageBatchWithRes * * For a list of supported entity types, check: this. * For a list of enabled languages, check: this. - * This method will use the default language that sets up in + * + * This method will use the default language that can be set by using method * {@link TextAnalyticsClientBuilder#defaultLanguage(String)}. If none is specified, service will use 'en' as * the language. 
* @@ -364,13 +370,138 @@ public Mono> recognizeEntitiesBatchW return recognizeEntityAsyncClient.recognizeEntitiesBatch(documents, options); } - // Linked Entity + // PII Entity + + /** + * Returns a list of Personally Identifiable Information(PII) entities in the provided document. + * + * For a list of supported entity types, check: this. + * For a list of enabled languages, check: this. This method will use the + * default language that is set using {@link TextAnalyticsClientBuilder#defaultLanguage(String)}. If none is + * specified, service will use 'en' as the language. + * + *

<p><strong>Code sample</strong></p>
+     * <p>Recognize the PII entities details in a document.
+     * Subscribes to the call asynchronously and prints out the recognized entity details when a response is
+     * received.</p>
+     *
+     * {@codesnippet com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntities#string}
+     *
+     * @param document The document to recognize PII entities details for.
+     * For text length limits, maximum batch size, and supported text encoding, see
+     * data limits.
+     *
+     * @return A {@link Mono} that contains a {@link PiiEntityCollection recognized PII entities collection}.
+     *
+     * @throws NullPointerException if {@code document} is null.
+     * @throws TextAnalyticsException if the response returned with an {@link TextAnalyticsError error}.
+     */
+    @ServiceMethod(returns = ReturnType.SINGLE)
+    public Mono<PiiEntityCollection> recognizePiiEntities(String document) {
+        return recognizePiiEntities(document, defaultLanguage);
+    }
+
+    /**
+     * Returns a list of Personally Identifiable Information (PII) entities in the provided document
+     * with provided language code.
+     *
+     * For a list of supported entity types, check: this.
+     * For a list of enabled languages, check: this.
+     *
+     *

<p><strong>Code sample</strong></p>
+     * <p>Recognize the PII entities details in a document with provided language code.
+     * Subscribes to the call asynchronously and prints out the entity details when a response is received.</p>
+     *
+     * {@codesnippet com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntities#string-string}
+     *
+     * @param document The text to recognize PII entities details for.
+     * For text length limits, maximum batch size, and supported text encoding, see
+     * data limits.
+     * @param language The 2 letter ISO 639-1 representation of language. If not set, uses "en" for English as default.
+     *
+     * @return A {@link Mono} that contains a {@link PiiEntityCollection recognized PII entities collection}.
+     *
+     * @throws NullPointerException if {@code document} is null.
+     * @throws TextAnalyticsException if the response returned with an {@link TextAnalyticsError error}.
+     */
+    @ServiceMethod(returns = ReturnType.SINGLE)
+    public Mono<PiiEntityCollection> recognizePiiEntities(String document, String language) {
+        return recognizePiiEntityAsyncClient.recognizePiiEntities(document, language);
+    }
+
+    /**
+     * Returns a list of Personally Identifiable Information (PII) entities for the provided list of documents with
+     * the provided language code and request options.
+     *
+     *

<p><strong>Code sample</strong></p>
+     * <p>Recognize Personally Identifiable Information entities in a document with the provided language code.
+     * Subscribes to the call asynchronously and prints out the entity details when a response is received.</p>
+     *
+     * {@codesnippet com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntitiesBatch#Iterable-String-TextAnalyticsRequestOptions}
+     *
+     * @param documents A list of documents to recognize PII entities for.
+     * For text length limits, maximum batch size, and supported text encoding, see
+     * data limits.
+     * @param language The 2 letter ISO 639-1 representation of language. If not set, uses "en" for English as default.
+     * @param options The {@link TextAnalyticsRequestOptions options} to configure the scoring model for documents
+     * and show statistics.
+     *
+     * @return A {@link Mono} that contains a {@link RecognizePiiEntitiesResultCollection}.
+     *
+     * @throws NullPointerException if {@code documents} is null.
+     * @throws IllegalArgumentException if {@code documents} is empty.
+     */
+    @ServiceMethod(returns = ReturnType.SINGLE)
+    public Mono<RecognizePiiEntitiesResultCollection> recognizePiiEntitiesBatch(
+        Iterable<String> documents, String language, TextAnalyticsRequestOptions options) {
+        try {
+            inputDocumentsValidation(documents);
+            return recognizePiiEntitiesBatchWithResponse(
+                mapByIndex(documents, (index, value) -> {
+                    final TextDocumentInput textDocumentInput = new TextDocumentInput(index, value);
+                    textDocumentInput.setLanguage(language);
+                    return textDocumentInput;
+                }), options).flatMap(FluxUtil::toMono);
+        } catch (RuntimeException ex) {
+            return monoError(logger, ex);
+        }
+    }
+
+    /**
+     * Returns a list of Personally Identifiable Information entities for the provided list of
+     * {@link TextDocumentInput document} with provided request options.
+     *
+     *

<p><strong>Code sample</strong></p>
+     * <p>Recognize the PII entities details with http response in a list of {@link TextDocumentInput document}
+     * with provided request options.
+     * Subscribes to the call asynchronously and prints out the entity details when a response is received.</p>
+     *
+     * {@codesnippet com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntitiesBatch#Iterable-TextAnalyticsRequestOptions}
+     *
+     * @param documents A list of {@link TextDocumentInput documents} to recognize PII entities for.
+     * For text length limits, maximum batch size, and supported text encoding, see
+     * data limits.
+     * @param options The {@link TextAnalyticsRequestOptions options} to configure the scoring model for documents
+     * and show statistics.
+     *
+     * @return A {@link Mono} that contains a {@link Response} which contains a
+     * {@link RecognizePiiEntitiesResultCollection}.
+     *
+     * @throws NullPointerException if {@code documents} is null.
+     * @throws IllegalArgumentException if {@code documents} is empty.
+     */
+    @ServiceMethod(returns = ReturnType.SINGLE)
+    public Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatchWithResponse(
+        Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options) {
+        return recognizePiiEntityAsyncClient.recognizePiiEntitiesBatch(documents, options);
+    }
+
+    // Linked Entities
     /**
      * Returns a list of recognized entities with links to a well-known knowledge base for the provided document. See
      * this for supported languages in Text Analytics API.
      *
-     * This method will use the default language that sets up in
+     * This method will use the default language that can be set by using method
      * {@link TextAnalyticsClientBuilder#defaultLanguage(String)}. If none is specified, service will use 'en' as
      * the language.
      *
@@ -490,7 +621,7 @@ public Mono recognizeLinkedEntitiesBatc
     /**
      * Returns a list of strings denoting the key phrases in the document.
      *
-     * This method will use the default language that sets up in
+     * This method will use the default language that can be set by using method
      * {@link TextAnalyticsClientBuilder#defaultLanguage(String)}. If none is specified, service will use 'en' as
      * the language.
* @@ -609,7 +740,7 @@ public Mono> extractKeyPhrasesBatchW * Returns a sentiment prediction, as well as confidence scores for each sentiment label (Positive, Negative, and * Neutral) for the document and each sentence within it. * - * This method will use the default language that sets up in + * This method will use the default language that can be set by using method * {@link TextAnalyticsClientBuilder#defaultLanguage(String)}. If none is specified, service will use 'en' as * the language. * diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/TextAnalyticsClient.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/TextAnalyticsClient.java index 71e2427f70c0..01508a1e2a3f 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/TextAnalyticsClient.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/TextAnalyticsClient.java @@ -3,23 +3,25 @@ package com.azure.ai.textanalytics; -import com.azure.ai.textanalytics.util.AnalyzeSentimentResultCollection; import com.azure.ai.textanalytics.models.CategorizedEntity; import com.azure.ai.textanalytics.models.CategorizedEntityCollection; import com.azure.ai.textanalytics.models.DetectLanguageInput; -import com.azure.ai.textanalytics.util.DetectLanguageResultCollection; import com.azure.ai.textanalytics.models.DetectedLanguage; import com.azure.ai.textanalytics.models.DocumentSentiment; -import com.azure.ai.textanalytics.util.ExtractKeyPhrasesResultCollection; import com.azure.ai.textanalytics.models.KeyPhrasesCollection; import com.azure.ai.textanalytics.models.LinkedEntity; import com.azure.ai.textanalytics.models.LinkedEntityCollection; -import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection; -import com.azure.ai.textanalytics.util.RecognizeLinkedEntitiesResultCollection; +import com.azure.ai.textanalytics.models.PiiEntityCollection; import 
com.azure.ai.textanalytics.models.TextAnalyticsError; import com.azure.ai.textanalytics.models.TextAnalyticsException; import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; import com.azure.ai.textanalytics.models.TextDocumentInput; +import com.azure.ai.textanalytics.util.AnalyzeSentimentResultCollection; +import com.azure.ai.textanalytics.util.DetectLanguageResultCollection; +import com.azure.ai.textanalytics.util.ExtractKeyPhrasesResultCollection; +import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection; +import com.azure.ai.textanalytics.util.RecognizeLinkedEntitiesResultCollection; +import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; import com.azure.core.annotation.ReturnType; import com.azure.core.annotation.ServiceClient; import com.azure.core.annotation.ServiceMethod; @@ -184,7 +186,7 @@ public Response detectLanguageBatchWithResponse( * * For a list of supported entity types, check: this * - * This method will use the default language that sets up in + * This method will use the default language that can be set by using method * {@link TextAnalyticsClientBuilder#defaultLanguage(String)}. If none is specified, service will use 'en' as * the language. * @@ -192,7 +194,7 @@ public Response detectLanguageBatchWithResponse( *

<p>Recognize the entities of documents</p>
* {@codesnippet com.azure.ai.textanalytics.TextAnalyticsClient.recognizeCategorizedEntities#String} * - * @param document the document to recognize entities for. + * @param document The document to recognize entities for. * For text length limits, maximum batch size, and supported text encoding, see * data limits. * @@ -289,12 +291,125 @@ public Response recognizeEntitiesBatchWithRes return client.recognizeEntityAsyncClient.recognizeEntitiesBatchWithContext(documents, options, context).block(); } + // PII Entity + /** + * Returns a list of Personally Identifiable Information(PII) entities in the provided document. + * + * For a list of supported entity types, check: this + * For a list of enabled languages, check: this. This method will use the + * default language that is set using {@link TextAnalyticsClientBuilder#defaultLanguage(String)}. If none is + * specified, service will use 'en' as the language. + * + *

<p><strong>Code Sample</strong></p>
+     * <p>Recognize the PII entities details in a document.</p>
+ * + * {@codesnippet com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntities#String} + * + * @param document The document to recognize PII entities details for. + * For text length limits, maximum batch size, and supported text encoding, see + * data limits. + * + * @return A {@link PiiEntityCollection recognized PII entities collection}. + * + * @throws NullPointerException if {@code document} is null. + * @throws TextAnalyticsException if the response returned with an {@link TextAnalyticsError error}. + */ + @ServiceMethod(returns = ReturnType.SINGLE) + public PiiEntityCollection recognizePiiEntities(String document) { + return recognizePiiEntities(document, client.getDefaultLanguage()); + } + + /** + * Returns a list of Personally Identifiable Information(PII) entities in the provided document + * with provided language code. + * + * For a list of supported entity types, check: this + * For a list of enabled languages, check: this + * + *

Code Sample

+ *

Recognizes the PII entity details in a document with a provided language code.

+ * + * {@codesnippet com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntities#String-String} + * + * @param document The document to recognize PII entities details for. + * For text length limits, maximum batch size, and supported text encoding, see + * data limits. + * @param language The 2 letter ISO 639-1 representation of language. If not set, uses "en" for English as default. + * + * @return The {@link PiiEntityCollection recognized PII entities collection}. + * + * @throws NullPointerException if {@code document} is null. + * @throws TextAnalyticsException if the response returned with an {@link TextAnalyticsError error}. + */ + @ServiceMethod(returns = ReturnType.SINGLE) + public PiiEntityCollection recognizePiiEntities(String document, String language) { + Objects.requireNonNull(document, "'document' cannot be null."); + return client.recognizePiiEntities(document, language).block(); + } + + /** + * Returns a list of Personally Identifiable Information(PII) entities for the provided list of documents with + * provided language code and request options. + * + *

Code Sample

+ *

Recognizes the PII entity details in a list of documents with a provided language code + * and request options.

+ * + * {@codesnippet com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntitiesBatch#Iterable-String-TextAnalyticsRequestOptions} + * + * @param documents A list of documents to recognize PII entities for. + * For text length limits, maximum batch size, and supported text encoding, see + * data limits. + * @param language The 2 letter ISO 639-1 representation of language. If not set, uses "en" for English as default. + * @param options The {@link TextAnalyticsRequestOptions options} to configure the scoring model for documents + * and show statistics. + * + * @return A {@link RecognizePiiEntitiesResultCollection}. + * + * @throws NullPointerException if {@code documents} is null. + * @throws IllegalArgumentException if {@code documents} is empty. + */ + @ServiceMethod(returns = ReturnType.SINGLE) + public RecognizePiiEntitiesResultCollection recognizePiiEntitiesBatch( + Iterable documents, String language, TextAnalyticsRequestOptions options) { + return client.recognizePiiEntitiesBatch(documents, language, options).block(); + } + + /** + * Returns a list of Personally Identifiable Information(PII) entities for the provided list of + * {@link TextDocumentInput document} with provided request options. + * + *

Code Sample

+ *

Recognizes the PII entity details, with the HTTP response, in a list of {@link TextDocumentInput documents} + * with provided request options.

+ * + * {@codesnippet com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntitiesBatch#Iterable-TextAnalyticsRequestOptions-Context} + * + * @param documents A list of {@link TextDocumentInput documents} to recognize PII entities for. + * For text length limits, maximum batch size, and supported text encoding, see + * data limits. + * @param options The {@link TextAnalyticsRequestOptions options} to configure the scoring model for documents + * and show statistics. + * @param context Additional context that is passed through the Http pipeline during the service call. + * + * @return A {@link Response} that contains a {@link RecognizePiiEntitiesResultCollection}. + * + * @throws NullPointerException if {@code documents} is null. + * @throws IllegalArgumentException if {@code documents} is empty. + */ + @ServiceMethod(returns = ReturnType.SINGLE) + public Response recognizePiiEntitiesBatchWithResponse( + Iterable documents, TextAnalyticsRequestOptions options, Context context) { + return client.recognizePiiEntityAsyncClient.recognizePiiEntitiesBatchWithContext(documents, options, + context).block(); + } + // Linked Entities /** * Returns a list of recognized entities with links to a well-known knowledge base for the provided document. * See this for supported languages in Text Analytics API. * - * This method will use the default language that sets up in + * This method will use the default language that can be set by using method * {@link TextAnalyticsClientBuilder#defaultLanguage(String)}. If none is specified, service will use 'en' as * the language. * @@ -302,7 +417,7 @@ public Response recognizeEntitiesBatchWithRes *

Recognize the linked entities of documents

* {@codesnippet com.azure.ai.textanalytics.TextAnalyticsClient.recognizeLinkedEntities#String} * - * @param document the document to recognize linked entities for. + * @param document The document to recognize linked entities for. * For text length limits, maximum batch size, and supported text encoding, see * data limits. * @@ -410,7 +525,7 @@ public RecognizeLinkedEntitiesResultCollection recognizeLinkedEntitiesBatch( /** * Returns a list of strings denoting the key phrases in the document. * - * This method will use the default language that sets up in + * This method will use the default language that can be set by using method * {@link TextAnalyticsClientBuilder#defaultLanguage(String)}. If none is specified, service will use 'en' as * the language. * @@ -522,7 +637,7 @@ public Response extractKeyPhrasesBatchWithRes * Returns a sentiment prediction, as well as confidence scores for each sentiment label * (Positive, Negative, and Neutral) for the document and each sentence within i * - * This method will use the default language that sets up in + * This method will use the default language that can be set by using method * {@link TextAnalyticsClientBuilder#defaultLanguage(String)}. If none is specified, service will use 'en' as * the language. * diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/AnalyzeSentimentResult.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/AnalyzeSentimentResult.java index 82fe9ac988d0..0c89a8535903 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/AnalyzeSentimentResult.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/AnalyzeSentimentResult.java @@ -30,6 +30,9 @@ public AnalyzeSentimentResult(String id, TextDocumentStatistics textDocumentStat * Get the document sentiment. * * @return The document sentiment. 
+ * + * @throws TextAnalyticsException if result has {@code isError} equals to true and when a non-error property + * was accessed. */ public DocumentSentiment getDocumentSentiment() { throwExceptionIfError(); diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/DetectLanguageResult.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/DetectLanguageResult.java index 1636b1093230..49ab2a4a0fee 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/DetectLanguageResult.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/DetectLanguageResult.java @@ -29,6 +29,9 @@ public DetectLanguageResult(String id, TextDocumentStatistics textDocumentStatis * Get the detected primary language. * * @return The detected language. + * + * @throws TextAnalyticsException if result has {@code isError} equals to true and when a non-error property + * was accessed. */ public DetectedLanguage getPrimaryLanguage() { throwExceptionIfError(); diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/ExtractKeyPhraseResult.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/ExtractKeyPhraseResult.java index 7b472d40764d..e2aa9fdf5114 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/ExtractKeyPhraseResult.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/ExtractKeyPhraseResult.java @@ -30,6 +30,9 @@ public ExtractKeyPhraseResult(String id, TextDocumentStatistics textDocumentStat * Get a {@link KeyPhrasesCollection} contains a list of key phrases and warnings. * * @return A {@link KeyPhrasesCollection} contains a list of key phrases and warnings. 
+ * + * @throws TextAnalyticsException if result has {@code isError} equals to true and when a non-error property + * was accessed. */ public KeyPhrasesCollection getKeyPhrases() { throwExceptionIfError(); diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/PiiEntity.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/PiiEntity.java new file mode 100644 index 000000000000..626f79204590 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/PiiEntity.java @@ -0,0 +1,116 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package com.azure.ai.textanalytics.models; + +import com.azure.core.annotation.Immutable; + +/** + * The {@link PiiEntity} model. + */ +@Immutable +public final class PiiEntity { + /* + * PiiEntity text as appears in the request. + */ + private final String text; + + /* + * PiiEntity category, such as Person/Location/Org/SSN etc + */ + private final EntityCategory category; + + /* + * PiiEntity sub category, such as Medical/Stock exchange/Sports etc + */ + private final String subcategory; + + /* + * Confidence score between 0 and 1 of the extracted entity. + */ + private final double confidenceScore; + + /* + * Start position for the entity text. + */ + private final int offset; + + /* + * The length for the entity text. + */ + private final int length; + + /** + * Creates a {@link PiiEntity} model that describes entity. + * + * @param text The entity text as appears in the request. + * @param category The entity category, such as Person/Location/Org/SSN etc. + * @param subcategory The entity subcategory, such as Medical/Stock exchange/Sports etc. + * @param confidenceScore A confidence score between 0 and 1 of the recognized entity. 
+ * @param offset The start position for the entity text + * @param length The length for the entity text + */ + public PiiEntity(String text, EntityCategory category, String subcategory, double confidenceScore, int offset, + int length) { + this.text = text; + this.category = category; + this.subcategory = subcategory; + this.offset = offset; + this.length = length; + this.confidenceScore = confidenceScore; + } + + /** + * Get the text property: PII entity text as appears in the request. + * + * @return The {@code text} value. + */ + public String getText() { + return this.text; + } + + /** + * Get the category property: Categorized entity category, such as Person/Location/Org/SSN etc. + * + * @return The {@code category} value. + */ + public EntityCategory getCategory() { + return this.category; + } + + /** + * Get the subcategory property: Categorized entity subcategory, such as Medical/Stock exchange/Sports etc. + * + * @return The {@code subcategory} value. + */ + public String getSubcategory() { + return this.subcategory; + } + + /** + * Get the score property: Confidence score between 0 and 1 of the recognized entity. + * + * @return The {@code confidenceScore} value. + */ + public double getConfidenceScore() { + return this.confidenceScore; + } + + /** + * Get the offset property: the start position for the entity text. + * + * @return The {@code offset} value. + */ + public int getOffset() { + return this.offset; + } + + /** + * Get the length property: the length for the entity text. + * + * @return The {@code length} value. 
+ */ + public int getLength() { + return this.length; + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/PiiEntityCollection.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/PiiEntityCollection.java new file mode 100644 index 000000000000..cc9807ab0404 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/PiiEntityCollection.java @@ -0,0 +1,36 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package com.azure.ai.textanalytics.models; + +import com.azure.core.annotation.Immutable; +import com.azure.core.util.IterableStream; + +/** + * The {@link PiiEntityCollection} model. + */ +@Immutable +public final class PiiEntityCollection extends IterableStream<PiiEntity> { + + private final IterableStream<TextAnalyticsWarning> warnings; + + /** + * Creates a {@link PiiEntityCollection} model that describes an entity collection including warnings. + * + * @param entities An {@link IterableStream} of {@link PiiEntity Personally Identifiable Information entities}. + * @param warnings An {@link IterableStream} of {@link TextAnalyticsWarning warnings}. + */ + public PiiEntityCollection(IterableStream<PiiEntity> entities, IterableStream<TextAnalyticsWarning> warnings) { + super(entities); + this.warnings = warnings; + } + + /** + * Get the {@link IterableStream} of {@link TextAnalyticsWarning Text Analytics warnings}. + * + * @return {@link IterableStream} of {@link TextAnalyticsWarning}. 
+ */ + public IterableStream getWarnings() { + return this.warnings; + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizeEntitiesResult.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizeEntitiesResult.java index 12e0c94a1318..ac360e858123 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizeEntitiesResult.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizeEntitiesResult.java @@ -31,6 +31,9 @@ public RecognizeEntitiesResult(String id, TextDocumentStatistics textDocumentSta * Get an {@link IterableStream} of {@link CategorizedEntity}. * * @return An {@link IterableStream} of {@link CategorizedEntity}. + * + * @throws TextAnalyticsException if result has {@code isError} equals to true and when a non-error property + * was accessed. */ public CategorizedEntityCollection getEntities() { throwExceptionIfError(); diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizeLinkedEntitiesResult.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizeLinkedEntitiesResult.java index a6721ef5f018..f8b373dd10a7 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizeLinkedEntitiesResult.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizeLinkedEntitiesResult.java @@ -31,6 +31,9 @@ public RecognizeLinkedEntitiesResult(String id, TextDocumentStatistics textDocum * Get an {@link IterableStream} of {@link LinkedEntity}. * * @return An {@link IterableStream} of {@link LinkedEntity}. + * + * @throws TextAnalyticsException if result has {@code isError} equals to true and when a non-error property + * was accessed. 
*/ public LinkedEntityCollection getEntities() { throwExceptionIfError(); diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizePiiEntitiesResult.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizePiiEntitiesResult.java new file mode 100644 index 000000000000..a27d37d5b7ac --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/RecognizePiiEntitiesResult.java @@ -0,0 +1,42 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package com.azure.ai.textanalytics.models; + +import com.azure.core.annotation.Immutable; +import com.azure.core.util.IterableStream; + +/** + * The {@link RecognizePiiEntitiesResult} model. + */ +@Immutable +public final class RecognizePiiEntitiesResult extends TextAnalyticsResult { + private final PiiEntityCollection entities; + + /** + * Creates a {@link RecognizePiiEntitiesResult} model that describes recognized PII entities result. + * + * @param id Unique, non-empty document identifier. + * @param textDocumentStatistics The text document statistics. + * @param error The document error. + * @param entities A {@link PiiEntityCollection} contains entities and warnings. + */ + public RecognizePiiEntitiesResult(String id, TextDocumentStatistics textDocumentStatistics, + TextAnalyticsError error, PiiEntityCollection entities) { + super(id, textDocumentStatistics, error); + this.entities = entities; + } + + /** + * Get an {@link IterableStream} of {@link PiiEntity}. + * + * @return An {@link IterableStream} of {@link PiiEntity}. + * + * @throws TextAnalyticsException if result has {@code isError} equals to true and when a non-error property + * was accessed. 
+ */ + public PiiEntityCollection getEntities() { + throwExceptionIfError(); + return entities; + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/TextAnalyticsResult.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/TextAnalyticsResult.java index d6e9ce76fbf0..8d2677a28986 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/TextAnalyticsResult.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/models/TextAnalyticsResult.java @@ -47,6 +47,9 @@ public String getId() { * Get the statistics of the text document. * * @return The {@link TextDocumentStatistics} statistics of the text document. + * + * @throws TextAnalyticsException if result has {@code isError} equals to true and when a non-error property + * was accessed. */ public TextDocumentStatistics getStatistics() { throwExceptionIfError(); diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/util/RecognizePiiEntitiesResultCollection.java b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/util/RecognizePiiEntitiesResultCollection.java new file mode 100644 index 000000000000..fc3c18aba560 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/util/RecognizePiiEntitiesResultCollection.java @@ -0,0 +1,50 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package com.azure.ai.textanalytics.util; + +import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult; +import com.azure.ai.textanalytics.models.TextDocumentBatchStatistics; +import com.azure.core.util.IterableStream; + +/** + * A collection model that contains a list of {@link RecognizePiiEntitiesResult} along with model version and + * batch's statistics. 
+ */ +public class RecognizePiiEntitiesResultCollection extends IterableStream { + private final String modelVersion; + private final TextDocumentBatchStatistics statistics; + + /** + * Create a {@link RecognizePiiEntitiesResultCollection} model that maintains a list of + * {@link RecognizePiiEntitiesResult} along with model version and batch's statistics. + * + * @param documentResults A list of {@link RecognizePiiEntitiesResult}. + * @param modelVersion The model version trained in service for the request. + * @param statistics The batch statistics of response. + */ + public RecognizePiiEntitiesResultCollection(Iterable documentResults, + String modelVersion, TextDocumentBatchStatistics statistics) { + super(documentResults); + this.modelVersion = modelVersion; + this.statistics = statistics; + } + + /** + * Get the model version trained in service for the request. + * + * @return The model version trained in service for the request. + */ + public String getModelVersion() { + return modelVersion; + } + + /** + * Get the batch statistics of response. + * + * @return The batch statistics of response. + */ + public TextDocumentBatchStatistics getStatistics() { + return statistics; + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java index ef79896a7b63..0dc438492e3d 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java @@ -150,4 +150,15 @@ public void extractKeyPhrases() { System.out.println("Extracted phrases:"); textAnalyticsClient.extractKeyPhrases(document).forEach(keyPhrase -> System.out.printf("%s.%n", keyPhrase)); } + + /** + * Code snippet for recognizing Personally Identifiable Information entity in a document. 
+ */ + public void recognizePiiEntity() { + String document = "My SSN is 859-98-0987"; + textAnalyticsClient.recognizePiiEntities(document).forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s," + + " confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore())); + } } diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/RecognizePiiEntities.java b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/RecognizePiiEntities.java new file mode 100644 index 000000000000..9f53a36551c7 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/RecognizePiiEntities.java @@ -0,0 +1,33 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package com.azure.ai.textanalytics; + + +import com.azure.core.credential.AzureKeyCredential; + +/** + * Sample demonstrates how to recognize the Personally Identifiable Information entities of a document. + */ +public class RecognizePiiEntities { + /** + * Main method to invoke this demo about how to recognize the Personally Identifiable Information entities of + * a document. + * + * @param args Unused arguments to the program. + */ + public static void main(String[] args) { + // Instantiate a client that will be used to call the service. + TextAnalyticsClient client = new TextAnalyticsClientBuilder() + .credential(new AzureKeyCredential("{key}")) + .endpoint("{endpoint}") + .buildClient(); + + // The document that needs to be analyzed. 
+ String document = "My SSN is 859-98-0987"; + + client.recognizePiiEntities(document).forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity sub-category: %s, score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore())); + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/RecognizePiiEntitiesAsync.java b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/RecognizePiiEntitiesAsync.java new file mode 100644 index 000000000000..379acc5cded6 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/RecognizePiiEntitiesAsync.java @@ -0,0 +1,45 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package com.azure.ai.textanalytics; + +import com.azure.core.credential.AzureKeyCredential; + +import java.util.concurrent.TimeUnit; + +/** + * Sample demonstrates how to recognize the Personally Identifiable Information entities of a document. + */ +public class RecognizePiiEntitiesAsync { + /** + * Main method to invoke this demo about how to recognize the Personally Identifiable Information entities of a document. + * + * @param args Unused arguments to the program. + */ + public static void main(String[] args) { + // Instantiate a client that will be used to call the service. + TextAnalyticsAsyncClient client = new TextAnalyticsClientBuilder() + .credential(new AzureKeyCredential("{key}")) + .endpoint("{endpoint}") + .buildAsyncClient(); + + // The document that needs to be analyzed. 
+ String document = "My SSN is 859-98-0987"; + + client.recognizePiiEntities(document).subscribe( + entityCollection -> entityCollection.forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity sub-category: %s, score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore())), + error -> System.err.println("There was an error recognizing PII entities of the text. " + error), + () -> System.out.println("Entities recognized.") + ); + + // The .subscribe() creation and assignment is not a blocking call. For the purpose of this example, we sleep + // the thread so the program does not end before the operation is complete. Using .block() instead of + // .subscribe() will turn this into a synchronous call. + try { + TimeUnit.SECONDS.sleep(5); + } catch (InterruptedException ignored) { + } + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClientJavaDocCodeSnippets.java b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClientJavaDocCodeSnippets.java index fcb6645a8b32..caf7f3960502 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClientJavaDocCodeSnippets.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClientJavaDocCodeSnippets.java @@ -17,6 +17,7 @@ import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; import com.azure.ai.textanalytics.models.TextDocumentBatchStatistics; import com.azure.ai.textanalytics.models.TextDocumentInput; +import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; import com.azure.core.credential.AzureKeyCredential; import java.util.Arrays; @@ -234,6 +235,98 @@ public void recognizeBatchEntitiesMaxOverload() { // END: 
com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizeCategorizedEntitiesBatch#Iterable-TextAnalyticsRequestOptions } + // Personally Identifiable Information Entity + + /** + * Code snippet for {@link TextAnalyticsAsyncClient#recognizePiiEntities(String)} + */ + public void recognizePiiEntities() { + // BEGIN: com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntities#string + String document = "My SSN is 859-98-0987"; + textAnalyticsAsyncClient.recognizePiiEntities(document).subscribe(piiEntityCollection -> + piiEntityCollection.forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s," + + " entity subcategory: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()))); + // END: com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntities#string + } + + /** + * Code snippet for {@link TextAnalyticsAsyncClient#recognizePiiEntities(String, String)} + */ + public void recognizePiiEntitiesWithLanguage() { + + // BEGIN: com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntities#string-string + String document = "My SSN is 859-98-0987"; + textAnalyticsAsyncClient.recognizePiiEntities(document, "en") + .subscribe(piiEntityCollection -> piiEntityCollection.forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s," + + " entity subcategory: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()))); + // END: com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntities#string-string + } + + /** + * Code snippet for {@link TextAnalyticsAsyncClient#recognizePiiEntitiesBatch(Iterable, String, TextAnalyticsRequestOptions)} + */ + public void recognizePiiEntitiesStringListWithOptions() { + // BEGIN: 
com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntitiesBatch#Iterable-String-TextAnalyticsRequestOptions + List documents = Arrays.asList( + "My SSN is 859-98-0987.", + "Visa card 0111 1111 1111 1111." + ); + + // Request options: show statistics and model version + TextAnalyticsRequestOptions requestOptions = new TextAnalyticsRequestOptions().setIncludeStatistics(true) + .setModelVersion("latest"); + + textAnalyticsAsyncClient.recognizePiiEntitiesBatch(documents, "en", requestOptions) + .subscribe(piiEntitiesResults -> { + // Batch statistics + TextDocumentBatchStatistics batchStatistics = piiEntitiesResults.getStatistics(); + System.out.printf("Batch statistics, transaction count: %s, valid document count: %s.%n", + batchStatistics.getTransactionCount(), batchStatistics.getValidDocumentCount()); + + piiEntitiesResults.forEach(recognizePiiEntitiesResult -> + recognizePiiEntitiesResult.getEntities().forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s," + + " entity subcategory: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()))); + }); + // END: com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntitiesBatch#Iterable-String-TextAnalyticsRequestOptions + } + + /** + * Code snippet for {@link TextAnalyticsAsyncClient#recognizePiiEntitiesBatchWithResponse(Iterable, + * TextAnalyticsRequestOptions)} + */ + public void recognizeBatchPiiEntitiesMaxOverload() { + // BEGIN: com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntitiesBatch#Iterable-TextAnalyticsRequestOptions + List textDocumentInputs1 = Arrays.asList( + new TextDocumentInput("0", "My SSN is 859-98-0987."), + new TextDocumentInput("1", "Visa card 0111 1111 1111 1111.")); + + // Request options: show statistics and model version + TextAnalyticsRequestOptions requestOptions = new 
TextAnalyticsRequestOptions().setIncludeStatistics(true); + + textAnalyticsAsyncClient.recognizePiiEntitiesBatchWithResponse(textDocumentInputs1, requestOptions) + .subscribe(response -> { + RecognizePiiEntitiesResultCollection piiEntitiesResults = response.getValue(); + // Batch statistics + TextDocumentBatchStatistics batchStatistics = piiEntitiesResults.getStatistics(); + System.out.printf("Batch statistics, transaction count: %s, valid document count: %s.%n", + batchStatistics.getTransactionCount(), batchStatistics.getValidDocumentCount()); + + piiEntitiesResults.forEach(recognizePiiEntitiesResult -> + recognizePiiEntitiesResult.getEntities().forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s," + + " entity subcategory: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()))); + }); + // END: com.azure.ai.textanalytics.TextAnalyticsAsyncClient.recognizePiiEntitiesBatch#Iterable-TextAnalyticsRequestOptions + } + + // Linked Entity /** diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/TextAnalyticsClientJavaDocCodeSnippets.java b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/TextAnalyticsClientJavaDocCodeSnippets.java index 49aa84d7c940..346e5111df91 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/TextAnalyticsClientJavaDocCodeSnippets.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/TextAnalyticsClientJavaDocCodeSnippets.java @@ -3,20 +3,22 @@ package com.azure.ai.textanalytics; -import com.azure.ai.textanalytics.util.AnalyzeSentimentResultCollection; import com.azure.ai.textanalytics.models.CategorizedEntity; import com.azure.ai.textanalytics.models.CategorizedEntityCollection; import com.azure.ai.textanalytics.models.DetectLanguageInput; 
-import com.azure.ai.textanalytics.util.DetectLanguageResultCollection; import com.azure.ai.textanalytics.models.DetectedLanguage; import com.azure.ai.textanalytics.models.DocumentSentiment; -import com.azure.ai.textanalytics.util.ExtractKeyPhrasesResultCollection; -import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection; -import com.azure.ai.textanalytics.util.RecognizeLinkedEntitiesResultCollection; +import com.azure.ai.textanalytics.models.PiiEntity; import com.azure.ai.textanalytics.models.SentenceSentiment; import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; import com.azure.ai.textanalytics.models.TextDocumentBatchStatistics; import com.azure.ai.textanalytics.models.TextDocumentInput; +import com.azure.ai.textanalytics.util.AnalyzeSentimentResultCollection; +import com.azure.ai.textanalytics.util.DetectLanguageResultCollection; +import com.azure.ai.textanalytics.util.ExtractKeyPhrasesResultCollection; +import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection; +import com.azure.ai.textanalytics.util.RecognizeLinkedEntitiesResultCollection; +import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; import com.azure.core.credential.AzureKeyCredential; import com.azure.core.http.HttpPipeline; import com.azure.core.http.HttpPipelineBuilder; @@ -239,6 +241,90 @@ public void recognizeBatchEntitiesMaxOverload() { // END: com.azure.ai.textanalytics.TextAnalyticsClient.recognizeEntitiesBatch#Iterable-TextAnalyticsRequestOptions-Context } + // Personally Identifiable Information Entity + + /** + * Code snippet for {@link TextAnalyticsClient#recognizePiiEntities(String)} + */ + public void recognizePiiEntities() { + // BEGIN: com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntities#String + for (PiiEntity entity : textAnalyticsClient.recognizePiiEntities("My SSN is 859-98-0987")) { + System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s," + + " 
entity subcategory: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()); + } + // END: com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntities#String + } + + /** + * Code snippet for {@link TextAnalyticsClient#recognizePiiEntities(String, String)} + */ + public void recognizePiiEntitiesWithLanguage() { + // BEGIN: com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntities#String-String + textAnalyticsClient.recognizePiiEntities("My SSN is 859-98-0987", "en") + .forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s," + + " entity subcategory: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore())); + // END: com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntities#String-String + } + + /** + * Code snippet for {@link TextAnalyticsClient#recognizePiiEntitiesBatch(Iterable, String, TextAnalyticsRequestOptions)} + */ + public void recognizePiiEntitiesStringListWithOptions() { + // BEGIN: com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntitiesBatch#Iterable-String-TextAnalyticsRequestOptions + List<String> documents = Arrays.asList( + "My SSN is 859-98-0987", + "Visa card 4111 1111 1111 1111" + ); + + RecognizePiiEntitiesResultCollection resultCollection = textAnalyticsClient.recognizePiiEntitiesBatch( + documents, "en", new TextAnalyticsRequestOptions().setIncludeStatistics(true)); + + // Batch statistics + TextDocumentBatchStatistics batchStatistics = resultCollection.getStatistics(); + System.out.printf("A batch of documents statistics, transaction count: %s, valid document count: %s.%n", + batchStatistics.getTransactionCount(), batchStatistics.getValidDocumentCount()); + + resultCollection.forEach(recognizePiiEntitiesResult -> + recognizePiiEntitiesResult.getEntities().forEach(entity -> System.out.printf( + "Recognized
Personally Identifiable Information entity: %s, entity category: %s," + + " entity subcategory: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()))); + // END: com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntitiesBatch#Iterable-String-TextAnalyticsRequestOptions + } + + /** + * Code snippet for {@link TextAnalyticsClient#recognizePiiEntitiesBatchWithResponse(Iterable, TextAnalyticsRequestOptions, Context)} + */ + public void recognizeBatchPiiEntitiesMaxOverload() { + // BEGIN: com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntitiesBatch#Iterable-TextAnalyticsRequestOptions-Context + List<TextDocumentInput> textDocumentInputs = Arrays.asList( + new TextDocumentInput("0", "My SSN is 859-98-0987"), + new TextDocumentInput("1", "Visa card 4111 1111 1111 1111") + ); + + Response<RecognizePiiEntitiesResultCollection> response = + textAnalyticsClient.recognizePiiEntitiesBatchWithResponse(textDocumentInputs, + new TextAnalyticsRequestOptions().setIncludeStatistics(true), Context.NONE); + + RecognizePiiEntitiesResultCollection resultCollection = response.getValue(); + + // Batch statistics + TextDocumentBatchStatistics batchStatistics = resultCollection.getStatistics(); + System.out.printf("A batch of documents statistics, transaction count: %s, valid document count: %s.%n", + batchStatistics.getTransactionCount(), batchStatistics.getValidDocumentCount()); + + resultCollection.forEach(recognizePiiEntitiesResult -> + recognizePiiEntitiesResult.getEntities().forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s," + + " entity subcategory: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()))); + // END: com.azure.ai.textanalytics.TextAnalyticsClient.recognizePiiEntitiesBatch#Iterable-TextAnalyticsRequestOptions-Context + } + // Linked Entity /** diff --git
a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchDocuments.java b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchDocuments.java new file mode 100644 index 000000000000..bffca948e2a6 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchDocuments.java @@ -0,0 +1,79 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package com.azure.ai.textanalytics.batch; + +import com.azure.ai.textanalytics.TextAnalyticsClient; +import com.azure.ai.textanalytics.TextAnalyticsClientBuilder; +import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult; +import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; +import com.azure.ai.textanalytics.models.TextDocumentBatchStatistics; +import com.azure.ai.textanalytics.models.TextDocumentInput; +import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; +import com.azure.core.credential.AzureKeyCredential; +import com.azure.core.http.rest.Response; +import com.azure.core.util.Context; + +import java.util.Arrays; +import java.util.List; +import java.util.concurrent.atomic.AtomicInteger; + +/** + * Sample demonstrates how to recognize the Personally Identifiable Information(PII) entities of documents. + */ +public class RecognizePiiEntitiesBatchDocuments { + /** + * Main method to invoke this demo about how to recognize the Personally Identifiable Information entities of + * documents. + * + * @param args Unused arguments to the program. + */ + public static void main(String[] args) { + // Instantiate a client that will be used to call the service. 
+ TextAnalyticsClient client = new TextAnalyticsClientBuilder() + .credential(new AzureKeyCredential("{key}")) + .endpoint("{endpoint}") + .buildClient(); + + // The texts that need be analyzed. + List<TextDocumentInput> documents = Arrays.asList( + new TextDocumentInput("1", "My SSN is 859-98-0987").setLanguage("en"), + new TextDocumentInput("2", "Visa card 4111 1111 1111 1111").setLanguage("en") + ); + + // Request options: show statistics and model version + TextAnalyticsRequestOptions requestOptions = new TextAnalyticsRequestOptions().setIncludeStatistics(true).setModelVersion("latest"); + + // Recognizing Personally Identifiable Information entities for each document in a batch of documents + Response<RecognizePiiEntitiesResultCollection> piiEntitiesBatchResultResponse = + client.recognizePiiEntitiesBatchWithResponse(documents, requestOptions, Context.NONE); + + // Response's status code + System.out.printf("Status code of request response: %d%n", piiEntitiesBatchResultResponse.getStatusCode()); + RecognizePiiEntitiesResultCollection recognizePiiEntitiesResultCollection = piiEntitiesBatchResultResponse.getValue(); + + // Model version + System.out.printf("Results of Azure Text Analytics \"Personally Identifiable Information Entities Recognition\" Model, version: %s%n", recognizePiiEntitiesResultCollection.getModelVersion()); + + // Batch statistics + TextDocumentBatchStatistics batchStatistics = recognizePiiEntitiesResultCollection.getStatistics(); + System.out.printf("Documents statistics: document count = %s, erroneous document count = %s, transaction count = %s, valid document count = %s.%n", + batchStatistics.getDocumentCount(), batchStatistics.getInvalidDocumentCount(), batchStatistics.getTransactionCount(), batchStatistics.getValidDocumentCount()); + + // Recognized Personally Identifiable Information entities for each document in a batch of documents + AtomicInteger counter = new AtomicInteger(); + for (RecognizePiiEntitiesResult entitiesResult : recognizePiiEntitiesResultCollection) { + // Recognized
entities for each document in a batch of documents + System.out.printf("%n%s%n", documents.get(counter.getAndIncrement())); + if (entitiesResult.isError()) { + // Erroneous document + System.out.printf("Cannot recognize Personally Identifiable Information entities. Error: %s%n", entitiesResult.getError().getMessage()); + } else { + // Valid document + entitiesResult.getEntities().forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s, offset: %s, length: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getOffset(), entity.getLength(), entity.getConfidenceScore())); + } + } + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchDocumentsAsync.java b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchDocumentsAsync.java new file mode 100644 index 000000000000..2776cdb07954 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchDocumentsAsync.java @@ -0,0 +1,87 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. 
+ +package com.azure.ai.textanalytics.batch; + +import com.azure.ai.textanalytics.TextAnalyticsAsyncClient; +import com.azure.ai.textanalytics.TextAnalyticsClientBuilder; +import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult; +import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; +import com.azure.ai.textanalytics.models.TextDocumentBatchStatistics; +import com.azure.ai.textanalytics.models.TextDocumentInput; +import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; +import com.azure.core.credential.AzureKeyCredential; + +import java.util.Arrays; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; + +/** + * Sample demonstrates how to recognize the Personally Identifiable Information(PII) entities of documents. + */ +public class RecognizePiiEntitiesBatchDocumentsAsync { + /** + * Main method to invoke this demo about how to recognize the Personally Identifiable Information entities of + * documents. + * + * @param args Unused arguments to the program. + */ + public static void main(String[] args) { + // Instantiate a client that will be used to call the service. + TextAnalyticsAsyncClient client = new TextAnalyticsClientBuilder() + .credential(new AzureKeyCredential("{key}")) + .endpoint("{endpoint}") + .buildAsyncClient(); + + // The texts that need be analyzed. 
+ List<TextDocumentInput> documents = Arrays.asList( + new TextDocumentInput("1", "My SSN is 859-98-0987").setLanguage("en"), + new TextDocumentInput("2", "Visa card 4111 1111 1111 1111").setLanguage("en") + ); + + // Request options: show statistics and model version + TextAnalyticsRequestOptions requestOptions = new TextAnalyticsRequestOptions().setIncludeStatistics(true).setModelVersion("latest"); + + // Recognizing Personally Identifiable Information entities for each document in a batch of documents + client.recognizePiiEntitiesBatchWithResponse(documents, requestOptions).subscribe( + entitiesBatchResultResponse -> { + // Response's status code + System.out.printf("Status code of request response: %d%n", entitiesBatchResultResponse.getStatusCode()); + RecognizePiiEntitiesResultCollection recognizePiiEntitiesResultCollection = entitiesBatchResultResponse.getValue(); + + // Model version + System.out.printf("Results of Azure Text Analytics \"Personally Identifiable Information Entities Recognition\" Model, version: %s%n", recognizePiiEntitiesResultCollection.getModelVersion()); + + // Batch statistics + TextDocumentBatchStatistics batchStatistics = recognizePiiEntitiesResultCollection.getStatistics(); + System.out.printf("Documents statistics: document count = %s, erroneous document count = %s, transaction count = %s, valid document count = %s.%n", + batchStatistics.getDocumentCount(), batchStatistics.getInvalidDocumentCount(), batchStatistics.getTransactionCount(), batchStatistics.getValidDocumentCount()); + + // Recognized Personally Identifiable Information entities for each document in a batch of documents + AtomicInteger counter = new AtomicInteger(); + for (RecognizePiiEntitiesResult entitiesResult : recognizePiiEntitiesResultCollection) { + System.out.printf("%n%s%n", documents.get(counter.getAndIncrement())); + if (entitiesResult.isError()) { + // Erroneous document + System.out.printf("Cannot recognize Personally Identifiable Information entities.
Error: %s%n", entitiesResult.getError().getMessage()); + } else { + // Valid document + entitiesResult.getEntities().forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s, offset: %s, length: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getOffset(), entity.getLength(), entity.getConfidenceScore())); + } + } + }, + error -> System.err.println("There was an error recognizing Personally Identifiable Information entities of the documents. " + error), + () -> System.out.println("Batch of Personally Identifiable Information entities recognized.")); + + // The .subscribe() creation and assignment is not a blocking call. For the purpose of this example, we sleep + // the thread so the program does not end before the operation is complete. Using .block() instead of + // .subscribe() will turn this into a synchronous call. + try { + TimeUnit.SECONDS.sleep(5); + } catch (InterruptedException ignored) { + } + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchStringDocuments.java b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchStringDocuments.java new file mode 100644 index 000000000000..8a47516cce99 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchStringDocuments.java @@ -0,0 +1,71 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License.
+ +package com.azure.ai.textanalytics.batch; + +import com.azure.ai.textanalytics.TextAnalyticsClient; +import com.azure.ai.textanalytics.TextAnalyticsClientBuilder; +import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult; +import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; +import com.azure.ai.textanalytics.models.TextDocumentBatchStatistics; +import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; +import com.azure.core.credential.AzureKeyCredential; + +import java.util.Arrays; +import java.util.List; +import java.util.concurrent.atomic.AtomicInteger; + +/** + * Sample demonstrates how to recognize the Personally Identifiable Information(PII) entities of {@code String} documents. + */ +public class RecognizePiiEntitiesBatchStringDocuments { + /** + * Main method to invoke this demo about how to recognize the Personally Identifiable Information entities of + * documents. + * + * @param args Unused arguments to the program. + */ + public static void main(String[] args) { + // Instantiate a client that will be used to call the service. + TextAnalyticsClient client = new TextAnalyticsClientBuilder() + .credential(new AzureKeyCredential("{key}")) + .endpoint("{endpoint}") + .buildClient(); + + // The texts that need be analyzed. 
+ List<String> documents = Arrays.asList( + "My SSN is 859-98-0987", + "Visa card 4111 1111 1111 1111" + ); + + // Request options: show statistics and model version + TextAnalyticsRequestOptions requestOptions = new TextAnalyticsRequestOptions().setIncludeStatistics(true).setModelVersion("latest"); + + // Recognizing Personally Identifiable Information entities for each document in a batch of documents + RecognizePiiEntitiesResultCollection recognizePiiEntitiesResultCollection = client.recognizePiiEntitiesBatch(documents, "en", requestOptions); + + // Model version + System.out.printf("Results of Azure Text Analytics \"Personally Identifiable Information Entities Recognition\" Model, version: %s%n", recognizePiiEntitiesResultCollection.getModelVersion()); + + // Batch statistics + TextDocumentBatchStatistics batchStatistics = recognizePiiEntitiesResultCollection.getStatistics(); + System.out.printf("Documents statistics: document count = %s, erroneous document count = %s, transaction count = %s, valid document count = %s.%n", + batchStatistics.getDocumentCount(), batchStatistics.getInvalidDocumentCount(), batchStatistics.getTransactionCount(), batchStatistics.getValidDocumentCount()); + + // Recognized Personally Identifiable Information entities for each document in a batch of documents + AtomicInteger counter = new AtomicInteger(); + for (RecognizePiiEntitiesResult entitiesResult : recognizePiiEntitiesResultCollection) { + // Recognized entities for each document in a batch of documents + System.out.printf("%nText = %s%n", documents.get(counter.getAndIncrement())); + if (entitiesResult.isError()) { + // Erroneous document + System.out.printf("Cannot recognize Personally Identifiable Information entities.
Error: %s%n", entitiesResult.getError().getMessage()); + } else { + // Valid document + entitiesResult.getEntities().forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s, offset: %s, length: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getOffset(), entity.getLength(), entity.getConfidenceScore())); + } + } + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchStringDocumentsAsync.java b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchStringDocumentsAsync.java new file mode 100644 index 000000000000..0ec3777d1aff --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchStringDocumentsAsync.java @@ -0,0 +1,80 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package com.azure.ai.textanalytics.batch; + +import com.azure.ai.textanalytics.TextAnalyticsAsyncClient; +import com.azure.ai.textanalytics.TextAnalyticsClientBuilder; +import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult; +import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; +import com.azure.ai.textanalytics.models.TextDocumentBatchStatistics; +import com.azure.core.credential.AzureKeyCredential; + +import java.util.Arrays; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; + +/** + * Sample demonstrates how to recognize the PII(Personally Identifiable Information) entities of {@code String} documents. 
+ */ +public class RecognizePiiEntitiesBatchStringDocumentsAsync { + /** + * Main method to invoke this demo about how to recognize the Personally Identifiable Information entities of + * documents. + * + * @param args Unused arguments to the program. + */ + public static void main(String[] args) { + // Instantiate a client that will be used to call the service. + TextAnalyticsAsyncClient client = new TextAnalyticsClientBuilder() + .credential(new AzureKeyCredential("{key}")) + .endpoint("{endpoint}") + .buildAsyncClient(); + + // The texts that need be analyzed. + List<String> documents = Arrays.asList( + "My SSN is 859-98-0987", + "Visa card 4111 1111 1111 1111" + ); + + // Request options: show statistics and model version + TextAnalyticsRequestOptions requestOptions = new TextAnalyticsRequestOptions().setIncludeStatistics(true).setModelVersion("latest"); + + // Recognizing Personally Identifiable Information entities for each document in a batch of documents + AtomicInteger counter = new AtomicInteger(); + client.recognizePiiEntitiesBatch(documents, "en", requestOptions).subscribe( + recognizePiiEntitiesResultCollection -> { + // Model version + System.out.printf("Results of Azure Text Analytics \"Personally Identifiable Information Entities Recognition\" Model, version: %s%n", recognizePiiEntitiesResultCollection.getModelVersion()); + + // Batch statistics + TextDocumentBatchStatistics batchStatistics = recognizePiiEntitiesResultCollection.getStatistics(); + System.out.printf("Documents statistics: document count = %s, erroneous document count = %s, transaction count = %s, valid document count = %s.%n", + batchStatistics.getDocumentCount(), batchStatistics.getInvalidDocumentCount(), batchStatistics.getTransactionCount(), batchStatistics.getValidDocumentCount()); + + for (RecognizePiiEntitiesResult entitiesResult : recognizePiiEntitiesResultCollection) { + System.out.printf("%nText = %s%n", documents.get(counter.getAndIncrement())); + if (entitiesResult.isError()) { +
// Erroneous document + System.out.printf("Cannot recognize Personally Identifiable Information entities. Error: %s%n", entitiesResult.getError().getMessage()); + } else { + // Valid document + entitiesResult.getEntities().forEach(entity -> System.out.printf( + "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s, confidence score: %f.%n", + entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore())); + } + } + }, + error -> System.err.println("There was an error recognizing Personally Identifiable Information entities of the documents. " + error), + () -> System.out.println("Batch of Personally Identifiable Information entities recognized.")); + + // The .subscribe() creation and assignment is not a blocking call. For the purpose of this example, we sleep + // the thread so the program does not end before the operation is complete. Using .block() instead of + // .subscribe() will turn this into a synchronous call.
+ try { + TimeUnit.SECONDS.sleep(5); + } catch (InterruptedException ignored) { + } + } +} diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TestUtils.java b/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TestUtils.java index f96c212992fa..7ebb0b86793e 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TestUtils.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TestUtils.java @@ -4,6 +4,9 @@ package com.azure.ai.textanalytics; import com.azure.ai.textanalytics.models.AnalyzeSentimentResult; +import com.azure.ai.textanalytics.models.PiiEntity; +import com.azure.ai.textanalytics.models.PiiEntityCollection; +import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult; import com.azure.ai.textanalytics.util.AnalyzeSentimentResultCollection; import com.azure.ai.textanalytics.models.CategorizedEntity; import com.azure.ai.textanalytics.models.CategorizedEntityCollection; @@ -29,6 +32,7 @@ import com.azure.ai.textanalytics.models.TextDocumentInput; import com.azure.ai.textanalytics.models.TextDocumentStatistics; import com.azure.ai.textanalytics.models.TextSentiment; +import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; import com.azure.core.exception.HttpResponseException; import com.azure.core.http.HttpClient; import com.azure.core.util.Configuration; @@ -64,6 +68,10 @@ final class TestUtils { static final List<String> CATEGORIZED_ENTITY_INPUTS = Arrays.asList( "I had a wonderful trip to Seattle last week.", "I work at Microsoft."); + static final List<String> PII_ENTITY_INPUTS = Arrays.asList( + "Microsoft employee with ssn 859-98-0987 is using our awesome API's.", + "Your ABA number - 111000025 - is the first 9 digits in the lower left hand corner of your personal check."); + static final List<String> LINKED_ENTITY_INPUTS = Arrays.asList( "I had a wonderful trip to Seattle last week.", "I work at
Microsoft."); @@ -213,6 +221,43 @@ static RecognizeEntitiesResult getExpectedBatchCategorizedEntities2() { return recognizeEntitiesResult2; } + /** + * Helper method to get the expected batch of Personally Identifiable Information entities + */ + static RecognizePiiEntitiesResultCollection getExpectedBatchPiiEntities() { + PiiEntityCollection piiEntityCollection = new PiiEntityCollection(new IterableStream<>(getPiiEntitiesList1()), null); + PiiEntityCollection piiEntityCollection2 = new PiiEntityCollection(new IterableStream<>(getPiiEntitiesList2()), null); + TextDocumentStatistics textDocumentStatistics1 = new TextDocumentStatistics(67, 1); + TextDocumentStatistics textDocumentStatistics2 = new TextDocumentStatistics(105, 1); + RecognizePiiEntitiesResult recognizeEntitiesResult1 = new RecognizePiiEntitiesResult("0", textDocumentStatistics1, null, piiEntityCollection); + RecognizePiiEntitiesResult recognizeEntitiesResult2 = new RecognizePiiEntitiesResult("1", textDocumentStatistics2, null, piiEntityCollection2); + + return new RecognizePiiEntitiesResultCollection( + Arrays.asList(recognizeEntitiesResult1, recognizeEntitiesResult2), + DEFAULT_MODEL_VERSION, + new TextDocumentBatchStatistics(2, 2, 0, 2)); + } + + /** + * Helper method to get the expected Personally Identifiable Information entities list 1 + */ + static List<PiiEntity> getPiiEntitiesList1() { + PiiEntity piiEntity0 = new PiiEntity("Microsoft", EntityCategory.ORGANIZATION, null, 1.0, 0, 9); + PiiEntity piiEntity1 = new PiiEntity("859-98-0987", EntityCategory.fromString("U.S.
Social Security Number (SSN)"), null, 0.65, 28, 11); + return Arrays.asList(piiEntity0, piiEntity1); + } + + /** + * Helper method to get the expected Personally Identifiable Information entities list 2 + */ + static List<PiiEntity> getPiiEntitiesList2() { + PiiEntity piiEntity2 = new PiiEntity("111000025", EntityCategory.fromString("Phone Number"), null, 0.8, 18, 9); + PiiEntity piiEntity3 = new PiiEntity("111000025", EntityCategory.fromString("ABA Routing Number"), null, 0.75, 18, 9); + PiiEntity piiEntity4 = new PiiEntity("111000025", EntityCategory.fromString("New Zealand Social Welfare Number"), null, 0.65, 18, 9); + PiiEntity piiEntity5 = new PiiEntity("111000025", EntityCategory.fromString("Portugal Tax Identification Number"), null, 0.65, 18, 9); + return Arrays.asList(piiEntity2, piiEntity3, piiEntity4, piiEntity5); + } + /** * Helper method to get the expected Batch Linked Entities * @return A {@link RecognizeLinkedEntitiesResultCollection}. diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClientTest.java b/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClientTest.java index 06ceb99375e4..d1e53a0c232b 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClientTest.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsAsyncClientTest.java @@ -31,7 +31,9 @@ import static com.azure.ai.textanalytics.TestUtils.getExpectedBatchDetectedLanguages; import static com.azure.ai.textanalytics.TestUtils.getExpectedBatchKeyPhrases; import static com.azure.ai.textanalytics.TestUtils.getExpectedBatchLinkedEntities; +import static com.azure.ai.textanalytics.TestUtils.getExpectedBatchPiiEntities; import static com.azure.ai.textanalytics.TestUtils.getExpectedBatchTextSentiment; +import static com.azure.ai.textanalytics.TestUtils.getPiiEntitiesList1; import static
com.azure.ai.textanalytics.TestUtils.getLinkedEntitiesList1; import static com.azure.ai.textanalytics.TestUtils.getUnknownDetectedLanguage; import static com.azure.ai.textanalytics.models.WarningCode.LONG_WORDS_IN_DOCUMENT; @@ -333,6 +335,104 @@ public void recognizeEntitiesForListWithOptions(HttpClient httpClient, TextAnaly .verifyComplete()); } + // Recognize Personally Identifiable Information entity + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForTextInput(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsAsyncClient(httpClient, serviceVersion); + recognizePiiSingleDocumentRunner(document -> + StepVerifier.create(client.recognizePiiEntities(document)) + .assertNext(response -> validatePiiEntities(getPiiEntitiesList1(), response.stream().collect(Collectors.toList()))) + .verifyComplete()); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForEmptyText(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsAsyncClient(httpClient, serviceVersion); + emptyTextRunner(document -> StepVerifier.create(client.recognizePiiEntities(document)) + .expectErrorMatches(throwable -> throwable instanceof TextAnalyticsException + && INVALID_DOCUMENT_EXPECTED_EXCEPTION_MESSAGE.equals(throwable.getMessage())) + .verify()); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForFaultyText(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsAsyncClient(httpClient, serviceVersion); + faultyTextRunner(document -> + StepVerifier.create(client.recognizePiiEntities(document)) + .assertNext(result -> 
assertFalse(result.getWarnings().iterator().hasNext())) + .verifyComplete()); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesDuplicateIdInput(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsAsyncClient(httpClient, serviceVersion); + recognizeBatchPiiEntityDuplicateIdRunner(inputs -> + StepVerifier.create(client.recognizePiiEntitiesBatchWithResponse(inputs, null)) + .verifyErrorSatisfies(ex -> assertEquals(HttpResponseException.class, ex.getClass()))); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesBatchInputSingleError(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsAsyncClient(httpClient, serviceVersion); + recognizeBatchPiiEntitySingleErrorRunner((inputs) -> + StepVerifier.create(client.recognizePiiEntitiesBatchWithResponse(inputs, null)) + .assertNext(resultCollection -> { + resultCollection.getValue().forEach(result -> { + assertTrue(result.isError()); + final TextAnalyticsError error = result.getError(); + TextAnalyticsErrorCode errorCode = error.getErrorCode(); + assertTrue(TextAnalyticsErrorCode.fromString("invalidDocument").equals(errorCode)); + assertTrue("Document text is empty.".equals(error.getMessage())); + }); + }).verifyComplete()); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForBatchInput(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsAsyncClient(httpClient, serviceVersion); + recognizeBatchPiiEntitiesRunner((inputs) -> + StepVerifier.create(client.recognizePiiEntitiesBatchWithResponse(inputs, null)) + .assertNext(response -> 
validatePiiEntitiesResultCollectionWithResponse(false, getExpectedBatchPiiEntities(), 200, response)) + .verifyComplete()); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForBatchInputShowStatistics(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsAsyncClient(httpClient, serviceVersion); + recognizeBatchPiiEntitiesShowStatsRunner((inputs, options) -> + StepVerifier.create(client.recognizePiiEntitiesBatchWithResponse(inputs, options)) + .assertNext(response -> validatePiiEntitiesResultCollectionWithResponse(true, getExpectedBatchPiiEntities(), 200, response)) + .verifyComplete()); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForListLanguageHint(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsAsyncClient(httpClient, serviceVersion); + recognizePiiLanguageHintRunner((inputs, language) -> + StepVerifier.create(client.recognizePiiEntitiesBatch(inputs, language, null)) + .assertNext(response -> validatePiiEntitiesResultCollection(false, getExpectedBatchPiiEntities(), response)) + .verifyComplete()); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForListStringWithOptions(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsAsyncClient(httpClient, serviceVersion); + recognizeStringBatchPiiEntitiesShowStatsRunner((inputs, options) -> + StepVerifier.create(client.recognizePiiEntitiesBatch(inputs, null, options)) + .assertNext(response -> validatePiiEntitiesResultCollection(true, getExpectedBatchPiiEntities(), response)) + .verifyComplete()); + } + // Linked Entities 
@ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsClientTest.java b/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsClientTest.java index 54bd0edb3722..b2f37e422887 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsClientTest.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsClientTest.java @@ -6,11 +6,13 @@ import com.azure.ai.textanalytics.models.CategorizedEntity; import com.azure.ai.textanalytics.models.DocumentSentiment; import com.azure.ai.textanalytics.models.LinkedEntity; +import com.azure.ai.textanalytics.models.PiiEntityCollection; import com.azure.ai.textanalytics.models.SentenceSentiment; import com.azure.ai.textanalytics.models.SentimentConfidenceScores; import com.azure.ai.textanalytics.models.TextAnalyticsException; import com.azure.ai.textanalytics.models.TextSentiment; import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection; +import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; import com.azure.core.exception.HttpResponseException; import com.azure.core.http.HttpClient; import com.azure.core.http.rest.Response; @@ -32,8 +34,10 @@ import static com.azure.ai.textanalytics.TestUtils.getExpectedBatchDetectedLanguages; import static com.azure.ai.textanalytics.TestUtils.getExpectedBatchKeyPhrases; import static com.azure.ai.textanalytics.TestUtils.getExpectedBatchLinkedEntities; +import static com.azure.ai.textanalytics.TestUtils.getExpectedBatchPiiEntities; import static com.azure.ai.textanalytics.TestUtils.getExpectedBatchTextSentiment; import static com.azure.ai.textanalytics.TestUtils.getLinkedEntitiesList1; +import static 
com.azure.ai.textanalytics.TestUtils.getPiiEntitiesList1; import static com.azure.ai.textanalytics.TestUtils.getUnknownDetectedLanguage; import static com.azure.ai.textanalytics.models.WarningCode.LONG_WORDS_IN_DOCUMENT; import static org.junit.jupiter.api.Assertions.assertEquals; @@ -111,6 +115,7 @@ public void detectLanguagesBatchStringInput(HttpClient httpClient, TextAnalytics /** * Verifies that a single DetectLanguageResult is returned for a document to detect language. */ + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") public void detectSingleTextLanguage(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { client = getTextAnalyticsClient(httpClient, serviceVersion); @@ -240,7 +245,7 @@ public void recognizeEntitiesBatchInputSingleError(HttpClient httpClient, TextAn Response response = client.recognizeEntitiesBatchWithResponse(inputs, null, Context.NONE); response.getValue().forEach(recognizeEntitiesResult -> { Exception exception = assertThrows(TextAnalyticsException.class, recognizeEntitiesResult::getEntities); - assertEquals(exception.getMessage(), BATCH_ERROR_EXCEPTION_MESSAGE); + assertEquals(String.format(BATCH_ERROR_EXCEPTION_MESSAGE, "RecognizeEntitiesResult"), exception.getMessage()); }); }); } @@ -294,6 +299,97 @@ public void recognizeEntitiesForListWithOptions(HttpClient httpClient, TextAnaly ); } + // Recognize Personally Identifiable Information entity + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForTextInput(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsClient(httpClient, serviceVersion); + recognizePiiSingleDocumentRunner(document -> { + final PiiEntityCollection entities = client.recognizePiiEntities(document); + validatePiiEntities(getPiiEntitiesList1(), 
entities.stream().collect(Collectors.toList())); + }); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForEmptyText(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsClient(httpClient, serviceVersion); + emptyTextRunner(document -> { + final Exception exception = assertThrows(TextAnalyticsException.class, () -> + client.recognizePiiEntities(document).iterator().hasNext()); + assertTrue(INVALID_DOCUMENT_EXPECTED_EXCEPTION_MESSAGE.equals(exception.getMessage())); + }); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForFaultyText(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsClient(httpClient, serviceVersion); + faultyTextRunner(document -> assertFalse(client.recognizePiiEntities(document).iterator().hasNext())); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesDuplicateIdInput(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsClient(httpClient, serviceVersion); + recognizeBatchPiiEntityDuplicateIdRunner(inputs -> { + HttpResponseException response = assertThrows(HttpResponseException.class, + () -> client.recognizePiiEntitiesBatchWithResponse(inputs, null, Context.NONE)); + assertEquals(HttpURLConnection.HTTP_BAD_REQUEST, response.getResponse().getStatusCode()); + }); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesBatchInputSingleError(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = 
getTextAnalyticsClient(httpClient, serviceVersion); + recognizeBatchPiiEntitySingleErrorRunner((inputs) -> { + Response response = client.recognizePiiEntitiesBatchWithResponse(inputs, null, Context.NONE); + response.getValue().forEach(recognizePiiEntitiesResult -> { + Exception exception = assertThrows(TextAnalyticsException.class, recognizePiiEntitiesResult::getEntities); + assertEquals(String.format(BATCH_ERROR_EXCEPTION_MESSAGE, "RecognizePiiEntitiesResult"), exception.getMessage()); + }); + }); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForBatchInput(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsClient(httpClient, serviceVersion); + recognizeBatchPiiEntitiesRunner(inputs -> + validatePiiEntitiesResultCollectionWithResponse(false, getExpectedBatchPiiEntities(), 200, + client.recognizePiiEntitiesBatchWithResponse(inputs, null, Context.NONE))); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForBatchInputShowStatistics(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsClient(httpClient, serviceVersion); + recognizeBatchPiiEntitiesShowStatsRunner((inputs, options) -> + validatePiiEntitiesResultCollectionWithResponse(true, getExpectedBatchPiiEntities(), 200, + client.recognizePiiEntitiesBatchWithResponse(inputs, options, Context.NONE))); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForListLanguageHint(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsClient(httpClient, serviceVersion); + recognizePiiEntitiesLanguageHintRunner((inputs, language) -> + 
validatePiiEntitiesResultCollection(false, getExpectedBatchPiiEntities(), + client.recognizePiiEntitiesBatch(inputs, language, null)) + ); + } + + @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) + @MethodSource("com.azure.ai.textanalytics.TestUtils#getTestParameters") + public void recognizePiiEntitiesForListStringWithOptions(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion) { + client = getTextAnalyticsClient(httpClient, serviceVersion); + recognizeStringBatchPiiEntitiesShowStatsRunner((inputs, options) -> + validatePiiEntitiesResultCollection(true, getExpectedBatchPiiEntities(), + client.recognizePiiEntitiesBatch(inputs, null, options))); + } + // Recognize linked entity @ParameterizedTest(name = DISPLAY_NAME_WITH_ARGUMENTS) diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsClientTestBase.java b/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsClientTestBase.java index bbf321e3a902..a212b0e05392 100644 --- a/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsClientTestBase.java +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/java/com/azure/ai/textanalytics/TextAnalyticsClientTestBase.java @@ -9,6 +9,7 @@ import com.azure.ai.textanalytics.models.DocumentSentiment; import com.azure.ai.textanalytics.models.LinkedEntity; import com.azure.ai.textanalytics.models.LinkedEntityMatch; +import com.azure.ai.textanalytics.models.PiiEntity; import com.azure.ai.textanalytics.models.SentenceSentiment; import com.azure.ai.textanalytics.models.TextAnalyticsError; import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions; @@ -21,6 +22,7 @@ import com.azure.ai.textanalytics.util.ExtractKeyPhrasesResultCollection; import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection; import com.azure.ai.textanalytics.util.RecognizeLinkedEntitiesResultCollection; +import 
com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection; import com.azure.core.credential.AzureKeyCredential; import com.azure.core.http.HttpClient; import com.azure.core.http.policy.HttpLogDetailLevel; @@ -47,6 +49,7 @@ import static com.azure.ai.textanalytics.TestUtils.FAKE_API_KEY; import static com.azure.ai.textanalytics.TestUtils.KEY_PHRASE_INPUTS; import static com.azure.ai.textanalytics.TestUtils.LINKED_ENTITY_INPUTS; +import static com.azure.ai.textanalytics.TestUtils.PII_ENTITY_INPUTS; import static com.azure.ai.textanalytics.TestUtils.SENTIMENT_INPUTS; import static com.azure.ai.textanalytics.TestUtils.TOO_LONG_INPUT; import static com.azure.ai.textanalytics.TestUtils.getDuplicateTextDocumentInputs; @@ -56,7 +59,7 @@ import static org.junit.jupiter.api.Assertions.assertNull; public abstract class TextAnalyticsClientTestBase extends TestBase { - static final String BATCH_ERROR_EXCEPTION_MESSAGE = "Error in accessing the property on document id: 2, when RecognizeEntitiesResult returned with an error: Document text is empty. ErrorCodeValue: {invalidDocument}"; + static final String BATCH_ERROR_EXCEPTION_MESSAGE = "Error in accessing the property on document id: 2, when %s returned with an error: Document text is empty. ErrorCodeValue: {invalidDocument}"; static final String EXCEEDED_ALLOWED_DOCUMENTS_LIMITS_MESSAGE = "The number of documents in the request have exceeded the data limitations. See https://aka.ms/text-analytics-data-limits for additional information"; static final String INVALID_COUNTRY_HINT_EXPECTED_EXCEPTION_MESSAGE = "Country hint is not valid. Please specify an ISO 3166-1 alpha-2 two letter country code. 
ErrorCodeValue: {invalidCountryHint}"; static final String INVALID_DOCUMENT_BATCH = "invalidDocumentBatch"; @@ -107,6 +110,31 @@ public abstract class TextAnalyticsClientTestBase extends TestBase { @Test abstract void recognizeEntitiesForListLanguageHint(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion); + // Personally Identifiable Information Entities + @Test + abstract void recognizePiiEntitiesForTextInput(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion); + + @Test + abstract void recognizePiiEntitiesForEmptyText(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion); + + @Test + abstract void recognizePiiEntitiesForFaultyText(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion); + + @Test + abstract void recognizePiiEntitiesDuplicateIdInput(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion); + + @Test + abstract void recognizePiiEntitiesBatchInputSingleError(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion); + + @Test + abstract void recognizePiiEntitiesForBatchInput(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion); + + @Test + abstract void recognizePiiEntitiesForBatchInputShowStatistics(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion); + + @Test + abstract void recognizePiiEntitiesForListLanguageHint(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion); + // Linked Entities @Test abstract void recognizeLinkedEntitiesForTextInput(HttpClient httpClient, TextAnalyticsServiceVersion serviceVersion); @@ -268,6 +296,52 @@ void recognizeEntitiesTooManyDocumentsRunner( Arrays.asList(documentInput, documentInput, documentInput, documentInput, documentInput, documentInput)); } + // Personally Identifiable Information Entity runner + void recognizePiiSingleDocumentRunner(Consumer testRunner) { + testRunner.accept(PII_ENTITY_INPUTS.get(0)); + } + + void recognizePiiLanguageHintRunner(BiConsumer, String> testRunner) { + 
testRunner.accept(PII_ENTITY_INPUTS, "en"); + } + + void recognizeBatchPiiEntityDuplicateIdRunner(Consumer<List<TextDocumentInput>> testRunner) { + testRunner.accept(getDuplicateTextDocumentInputs()); + } + + void recognizePiiEntitiesLanguageHintRunner(BiConsumer<List<String>, String> testRunner) { + testRunner.accept(PII_ENTITY_INPUTS, "en"); + } + + void recognizeBatchPiiEntitySingleErrorRunner(Consumer<List<TextDocumentInput>> testRunner) { + List<TextDocumentInput> inputs = Collections.singletonList(new TextDocumentInput("2", " ")); + testRunner.accept(inputs); + } + + void recognizeBatchPiiEntitiesRunner(Consumer<List<TextDocumentInput>> testRunner) { + testRunner.accept(TestUtils.getTextDocumentInputs(PII_ENTITY_INPUTS)); + } + + void recognizeBatchPiiEntitiesShowStatsRunner( + BiConsumer<List<TextDocumentInput>, TextAnalyticsRequestOptions> testRunner) { + final List<TextDocumentInput> textDocumentInputs = TestUtils.getTextDocumentInputs(PII_ENTITY_INPUTS); + TextAnalyticsRequestOptions options = new TextAnalyticsRequestOptions().setIncludeStatistics(true); + + testRunner.accept(textDocumentInputs, options); + } + + void recognizeStringBatchPiiEntitiesShowStatsRunner( + BiConsumer<List<String>, TextAnalyticsRequestOptions> testRunner) { + testRunner.accept(PII_ENTITY_INPUTS, new TextAnalyticsRequestOptions().setIncludeStatistics(true)); + } + + void recognizePiiEntitiesTooManyDocumentsRunner(Consumer<List<String>> testRunner) { + final String documentInput = PII_ENTITY_INPUTS.get(0); + // max num of document size is 5 + testRunner.accept( + Arrays.asList(documentInput, documentInput, documentInput, documentInput, documentInput, documentInput)); + } + // Linked Entity runner void recognizeLinkedEntitiesForSingleTextInputRunner(Consumer<String> testRunner) { testRunner.accept(LINKED_ENTITY_INPUTS.get(0)); @@ -412,8 +486,7 @@ TextAnalyticsClientBuilder getTextAnalyticsAsyncClientBuilder(HttpClient httpCli } static void validateDetectLanguageResultCollectionWithResponse(boolean showStatistics, - DetectLanguageResultCollection expected, - int expectedStatusCode, + DetectLanguageResultCollection expected, int expectedStatusCode, Response<DetectLanguageResultCollection>
response) { assertNotNull(response); assertEquals(expectedStatusCode, response.getStatusCode()); @@ -421,40 +494,53 @@ static void validateDetectLanguageResultCollectionWithResponse(boolean showStati } static void validateDetectLanguageResultCollection(boolean showStatistics, - DetectLanguageResultCollection expected, - DetectLanguageResultCollection actual) { + DetectLanguageResultCollection expected, DetectLanguageResultCollection actual) { validateTextAnalyticsResult(showStatistics, expected, actual, (expectedItem, actualItem) -> validatePrimaryLanguage(expectedItem.getPrimaryLanguage(), actualItem.getPrimaryLanguage())); } static void validateCategorizedEntitiesResultCollectionWithResponse(boolean showStatistics, - RecognizeEntitiesResultCollection expected, - int expectedStatusCode, Response response) { + RecognizeEntitiesResultCollection expected, int expectedStatusCode, + Response response) { assertNotNull(response); assertEquals(expectedStatusCode, response.getStatusCode()); validateCategorizedEntitiesResultCollection(showStatistics, expected, response.getValue()); } static void validateCategorizedEntitiesResultCollection(boolean showStatistics, - RecognizeEntitiesResultCollection expected, - RecognizeEntitiesResultCollection actual) { + RecognizeEntitiesResultCollection expected, RecognizeEntitiesResultCollection actual) { validateTextAnalyticsResult(showStatistics, expected, actual, (expectedItem, actualItem) -> validateCategorizedEntities( expectedItem.getEntities().stream().collect(Collectors.toList()), actualItem.getEntities().stream().collect(Collectors.toList()))); } + static void validatePiiEntitiesResultCollectionWithResponse(boolean showStatistics, + RecognizePiiEntitiesResultCollection expected, int expectedStatusCode, + Response response) { + assertNotNull(response); + assertEquals(expectedStatusCode, response.getStatusCode()); + validatePiiEntitiesResultCollection(showStatistics, expected, response.getValue()); + } + + static void 
validatePiiEntitiesResultCollection(boolean showStatistics, + RecognizePiiEntitiesResultCollection expected, RecognizePiiEntitiesResultCollection actual) { + validateTextAnalyticsResult(showStatistics, expected, actual, (expectedItem, actualItem) -> + validatePiiEntities( + expectedItem.getEntities().stream().collect(Collectors.toList()), + actualItem.getEntities().stream().collect(Collectors.toList()))); + } + static void validateLinkedEntitiesResultCollectionWithResponse(boolean showStatistics, - RecognizeLinkedEntitiesResultCollection expected, - int expectedStatusCode, Response response) { + RecognizeLinkedEntitiesResultCollection expected, int expectedStatusCode, + Response response) { assertNotNull(response); assertEquals(expectedStatusCode, response.getStatusCode()); validateLinkedEntitiesResultCollection(showStatistics, expected, response.getValue()); } static void validateLinkedEntitiesResultCollection(boolean showStatistics, - RecognizeLinkedEntitiesResultCollection expected, - RecognizeLinkedEntitiesResultCollection actual) { + RecognizeLinkedEntitiesResultCollection expected, RecognizeLinkedEntitiesResultCollection actual) { validateTextAnalyticsResult(showStatistics, expected, actual, (expectedItem, actualItem) -> validateLinkedEntities( expectedItem.getEntities().stream().collect(Collectors.toList()), @@ -462,16 +548,15 @@ static void validateLinkedEntitiesResultCollection(boolean showStatistics, } static void validateExtractKeyPhrasesResultCollectionWithResponse(boolean showStatistics, - ExtractKeyPhrasesResultCollection expected, - int expectedStatusCode, Response response) { + ExtractKeyPhrasesResultCollection expected, int expectedStatusCode, + Response response) { assertNotNull(response); assertEquals(expectedStatusCode, response.getStatusCode()); validateExtractKeyPhrasesResultCollection(showStatistics, expected, response.getValue()); } static void validateExtractKeyPhrasesResultCollection(boolean showStatistics, - 
ExtractKeyPhrasesResultCollection expected, - ExtractKeyPhrasesResultCollection actual) { + ExtractKeyPhrasesResultCollection expected, ExtractKeyPhrasesResultCollection actual) { validateTextAnalyticsResult(showStatistics, expected, actual, (expectedItem, actualItem) -> validateKeyPhrases( expectedItem.getKeyPhrases().stream().collect(Collectors.toList()), @@ -479,16 +564,15 @@ static void validateExtractKeyPhrasesResultCollection(boolean showStatistics, } static void validateSentimentResultCollectionWithResponse(boolean showStatistics, - AnalyzeSentimentResultCollection expected, - int expectedStatusCode, Response response) { + AnalyzeSentimentResultCollection expected, int expectedStatusCode, + Response response) { assertNotNull(response); assertEquals(expectedStatusCode, response.getStatusCode()); validateSentimentResultCollection(showStatistics, expected, response.getValue()); } static void validateSentimentResultCollection(boolean showStatistics, - AnalyzeSentimentResultCollection expected, - AnalyzeSentimentResultCollection actual) { + AnalyzeSentimentResultCollection expected, AnalyzeSentimentResultCollection actual) { validateTextAnalyticsResult(showStatistics, expected, actual, (expectedItem, actualItem) -> validateAnalyzedSentiment(expectedItem.getDocumentSentiment(), actualItem.getDocumentSentiment())); } @@ -522,6 +606,21 @@ static void validateCategorizedEntity( assertNotNull(actualCategorizedEntity.getConfidenceScore()); } + /** + * Helper method to validate a single Personally Identifiable Information entity. + * + * @param expectedPiiEntity PiiEntity returned by the service. + * @param actualPiiEntity PiiEntity returned by the API. 
+ */ + static void validatePiiEntity(PiiEntity expectedPiiEntity, PiiEntity actualPiiEntity) { + assertEquals(expectedPiiEntity.getLength() > 0, actualPiiEntity.getLength() > 0); + assertEquals(expectedPiiEntity.getOffset(), actualPiiEntity.getOffset()); + assertEquals(expectedPiiEntity.getSubcategory(), actualPiiEntity.getSubcategory()); + assertEquals(expectedPiiEntity.getText(), actualPiiEntity.getText()); + assertEquals(expectedPiiEntity.getCategory(), actualPiiEntity.getCategory()); + assertNotNull(actualPiiEntity.getConfidenceScore()); + } + /** * Helper method to validate a single linked entity. * @@ -572,6 +671,24 @@ static void validateCategorizedEntities(List expectedCategori } } + /** + * Helper method to validate the list of Personally Identifiable Information entities. + * + * @param expectedPiiEntityList piiEntities returned by the service. + * @param actualPiiEntityList piiEntities returned by the API. + */ + static void validatePiiEntities(List<PiiEntity> expectedPiiEntityList, List<PiiEntity> actualPiiEntityList) { + assertEquals(expectedPiiEntityList.size(), actualPiiEntityList.size()); + expectedPiiEntityList.sort(Comparator.comparing(PiiEntity::getText)); + actualPiiEntityList.sort(Comparator.comparing(PiiEntity::getText)); + + for (int i = 0; i < expectedPiiEntityList.size(); i++) { + PiiEntity expectedPiiEntity = expectedPiiEntityList.get(i); + PiiEntity actualPiiEntity = actualPiiEntityList.get(i); + validatePiiEntity(expectedPiiEntity, actualPiiEntity); + } + } + /** * Helper method to validate the list of linked entities.
* diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesBatchInputSingleError.json b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesBatchInputSingleError.json new file mode 100644 index 000000000000..118ce920a811 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesBatchInputSingleError.json @@ -0,0 +1,25 @@ +{ + "networkCallRecords" : [ { + "Method" : "POST", + "Uri" : "https://REDACTED.cognitiveservices.azure.com/text/analytics/v3.1-preview.1//entities/recognition/pii", + "Headers" : { + "User-Agent" : "azsdk-java-azure-ai-textanalytics/1.1.0-beta.1 (11.0.7; Windows 10; 10.0)", + "x-ms-client-request-id" : "69d2e167-8e03-4958-943b-ccecdf717d16", + "Content-Type" : "application/json" + }, + "Response" : { + "Transfer-Encoding" : "chunked", + "x-envoy-upstream-service-time" : "3", + "Strict-Transport-Security" : "max-age=31536000; includeSubDomains; preload", + "x-content-type-options" : "nosniff", + "apim-request-id" : "2158cc12-92dd-4979-b675-2e326a3a6a07", + "retry-after" : "0", + "StatusCode" : "200", + "Body" : "{\"documents\":[],\"errors\":[{\"id\":\"2\",\"error\":{\"code\":\"InvalidArgument\",\"message\":\"Invalid document in request.\",\"innererror\":{\"code\":\"InvalidDocument\",\"message\":\"Document text is empty.\"}}}],\"modelVersion\":\"2020-04-01\"}", + "Date" : "Mon, 27 Jul 2020 16:40:05 GMT", + "Content-Type" : "application/json; charset=utf-8" + }, + "Exception" : null + } ], + "variables" : [ ] +} \ No newline at end of file diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesDuplicateIdInput.json b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesDuplicateIdInput.json new file mode 100644 index 000000000000..06b9ea32a66d --- /dev/null +++ 
b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesDuplicateIdInput.json @@ -0,0 +1,25 @@ +{ + "networkCallRecords" : [ { + "Method" : "POST", + "Uri" : "https://REDACTED.cognitiveservices.azure.com/text/analytics/v3.1-preview.1//entities/recognition/pii", + "Headers" : { + "User-Agent" : "azsdk-java-azure-ai-textanalytics/1.1.0-beta.1 (11.0.7; Windows 10; 10.0)", + "x-ms-client-request-id" : "a39bf614-9fcb-44c6-9c33-0626ba8411b9", + "Content-Type" : "application/json" + }, + "Response" : { + "Transfer-Encoding" : "chunked", + "x-envoy-upstream-service-time" : "10", + "Strict-Transport-Security" : "max-age=31536000; includeSubDomains; preload", + "x-content-type-options" : "nosniff", + "apim-request-id" : "13d25518-5e10-48d1-9e09-9a6fd0054494", + "retry-after" : "0", + "StatusCode" : "400", + "Body" : "{\"error\":{\"code\":\"InvalidRequest\",\"message\":\"Invalid document in request.\",\"innererror\":{\"code\":\"InvalidDocument\",\"message\":\"Request contains duplicated Ids. 
Make sure each document has a unique Id.\"}}}", + "Date" : "Mon, 27 Jul 2020 16:40:14 GMT", + "Content-Type" : "application/json; charset=utf-8" + }, + "Exception" : null + } ], + "variables" : [ ] +} \ No newline at end of file diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForBatchInput.json b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForBatchInput.json new file mode 100644 index 000000000000..6dbe299f208d --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForBatchInput.json @@ -0,0 +1,26 @@ +{ + "networkCallRecords" : [ { + "Method" : "POST", + "Uri" : "https://REDACTED.cognitiveservices.azure.com/text/analytics/v3.1-preview.1//entities/recognition/pii", + "Headers" : { + "User-Agent" : "azsdk-java-azure-ai-textanalytics/5.1.0-beta.1 (11.0.7; Windows 10; 10.0)", + "x-ms-client-request-id" : "37a85b52-818d-4e53-9e74-dca0b1230d9c", + "Content-Type" : "application/json" + }, + "Response" : { + "Transfer-Encoding" : "chunked", + "x-envoy-upstream-service-time" : "956", + "Strict-Transport-Security" : "max-age=31536000; includeSubDomains; preload", + "x-content-type-options" : "nosniff", + "csp-billing-usage" : "CognitiveServices.TextAnalytics.BatchScoring=2", + "apim-request-id" : "fd7be3ee-c0aa-4cca-a453-76af868d9cc3", + "retry-after" : "0", + "StatusCode" : "200", + "Body" : "{\"documents\":[{\"id\":\"0\",\"entities\":[{\"text\":\"Microsoft\",\"category\":\"Organization\",\"offset\":0,\"length\":9,\"confidenceScore\":0.38},{\"text\":\"859-98-0987\",\"category\":\"U.S. 
Social Security Number (SSN)\",\"offset\":28,\"length\":11,\"confidenceScore\":0.65}],\"warnings\":[]},{\"id\":\"1\",\"entities\":[{\"text\":\"111000025\",\"category\":\"Phone Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.8},{\"text\":\"111000025\",\"category\":\"ABA Routing Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.75},{\"text\":\"111000025\",\"category\":\"New Zealand Social Welfare Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.65},{\"text\":\"111000025\",\"category\":\"Portugal Tax Identification Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.65}],\"warnings\":[]}],\"errors\":[],\"modelVersion\":\"2020-07-01\"}", + "Date" : "Fri, 14 Aug 2020 05:37:17 GMT", + "Content-Type" : "application/json; charset=utf-8" + }, + "Exception" : null + } ], + "variables" : [ ] +} \ No newline at end of file diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForBatchInputShowStatistics.json b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForBatchInputShowStatistics.json new file mode 100644 index 000000000000..a96e447564b8 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForBatchInputShowStatistics.json @@ -0,0 +1,26 @@ +{ + "networkCallRecords" : [ { + "Method" : "POST", + "Uri" : "https://REDACTED.cognitiveservices.azure.com/text/analytics/v3.1-preview.1//entities/recognition/pii?showStats=true", + "Headers" : { + "User-Agent" : "azsdk-java-azure-ai-textanalytics/5.1.0-beta.1 (11.0.7; Windows 10; 10.0)", + "x-ms-client-request-id" : "6337ce73-6d71-4e9c-a107-7213071eb702", + "Content-Type" : "application/json" + }, + "Response" : { + "Transfer-Encoding" : "chunked", + "x-envoy-upstream-service-time" : "1296", + "Strict-Transport-Security" : "max-age=31536000; includeSubDomains; preload", + "x-content-type-options" : "nosniff", + "csp-billing-usage" : 
"CognitiveServices.TextAnalytics.BatchScoring=2", + "apim-request-id" : "99a0bbd4-1693-43c4-ad88-093108d75d71", + "retry-after" : "0", + "StatusCode" : "200", + "Body" : "{\"statistics\":{\"documentsCount\":2,\"validDocumentsCount\":2,\"erroneousDocumentsCount\":0,\"transactionsCount\":2},\"documents\":[{\"id\":\"0\",\"statistics\":{\"charactersCount\":67,\"transactionsCount\":1},\"entities\":[{\"text\":\"Microsoft\",\"category\":\"Organization\",\"offset\":0,\"length\":9,\"confidenceScore\":0.38},{\"text\":\"859-98-0987\",\"category\":\"U.S. Social Security Number (SSN)\",\"offset\":28,\"length\":11,\"confidenceScore\":0.65}],\"warnings\":[]},{\"id\":\"1\",\"statistics\":{\"charactersCount\":105,\"transactionsCount\":1},\"entities\":[{\"text\":\"111000025\",\"category\":\"Phone Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.8},{\"text\":\"111000025\",\"category\":\"ABA Routing Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.75},{\"text\":\"111000025\",\"category\":\"New Zealand Social Welfare Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.65},{\"text\":\"111000025\",\"category\":\"Portugal Tax Identification Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.65}],\"warnings\":[]}],\"errors\":[],\"modelVersion\":\"2020-07-01\"}", + "Date" : "Fri, 14 Aug 2020 05:33:45 GMT", + "Content-Type" : "application/json; charset=utf-8" + }, + "Exception" : null + } ], + "variables" : [ ] +} \ No newline at end of file diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForEmptyText.json b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForEmptyText.json new file mode 100644 index 000000000000..89a8da014450 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForEmptyText.json @@ -0,0 +1,25 @@ +{ + "networkCallRecords" : [ { + "Method" : "POST", + "Uri" : 
"https://REDACTED.cognitiveservices.azure.com/text/analytics/v3.1-preview.1//entities/recognition/pii", + "Headers" : { + "User-Agent" : "azsdk-java-azure-ai-textanalytics/5.1.0-beta.1 (11.0.7; Windows 10; 10.0)", + "x-ms-client-request-id" : "6b916f89-42c5-41b1-b899-d0ae3b36c4e0", + "Content-Type" : "application/json" + }, + "Response" : { + "Transfer-Encoding" : "chunked", + "x-envoy-upstream-service-time" : "2", + "Strict-Transport-Security" : "max-age=31536000; includeSubDomains; preload", + "x-content-type-options" : "nosniff", + "apim-request-id" : "2ccb72c2-02b9-41dd-917f-1dbf546b72bf", + "retry-after" : "0", + "StatusCode" : "200", + "Body" : "{\"documents\":[],\"errors\":[{\"id\":\"0\",\"error\":{\"code\":\"InvalidArgument\",\"message\":\"Invalid document in request.\",\"innererror\":{\"code\":\"InvalidDocument\",\"message\":\"Document text is empty.\"}}}],\"modelVersion\":\"2020-07-01\"}", + "Date" : "Fri, 31 Jul 2020 21:58:01 GMT", + "Content-Type" : "application/json; charset=utf-8" + }, + "Exception" : null + } ], + "variables" : [ ] +} \ No newline at end of file diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForFaultyText.json b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForFaultyText.json new file mode 100644 index 000000000000..489f880a2e6f --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForFaultyText.json @@ -0,0 +1,26 @@ +{ + "networkCallRecords" : [ { + "Method" : "POST", + "Uri" : "https://REDACTED.cognitiveservices.azure.com/text/analytics/v3.1-preview.1//entities/recognition/pii", + "Headers" : { + "User-Agent" : "azsdk-java-azure-ai-textanalytics/1.1.0-beta.1 (11.0.7; Windows 10; 10.0)", + "x-ms-client-request-id" : "4088f159-62f5-4465-8823-f03ef1d7eeca", + "Content-Type" : "application/json" + }, + "Response" : { + "Transfer-Encoding" : "chunked", + 
"x-envoy-upstream-service-time" : "74", + "Strict-Transport-Security" : "max-age=31536000; includeSubDomains; preload", + "x-content-type-options" : "nosniff", + "csp-billing-usage" : "CognitiveServices.TextAnalytics.BatchScoring=1", + "apim-request-id" : "21f0e4cc-7e10-4f13-abbb-623664894484", + "retry-after" : "0", + "StatusCode" : "200", + "Body" : "{\"documents\":[{\"id\":\"0\",\"entities\":[],\"warnings\":[]}],\"errors\":[],\"modelVersion\":\"2020-04-01\"}", + "Date" : "Mon, 27 Jul 2020 16:41:53 GMT", + "Content-Type" : "application/json; charset=utf-8" + }, + "Exception" : null + } ], + "variables" : [ ] +} \ No newline at end of file diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForListLanguageHint.json b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForListLanguageHint.json new file mode 100644 index 000000000000..c7a1b3eb3d8a --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForListLanguageHint.json @@ -0,0 +1,26 @@ +{ + "networkCallRecords" : [ { + "Method" : "POST", + "Uri" : "https://REDACTED.cognitiveservices.azure.com/text/analytics/v3.1-preview.1//entities/recognition/pii", + "Headers" : { + "User-Agent" : "azsdk-java-azure-ai-textanalytics/5.1.0-beta.1 (11.0.7; Windows 10; 10.0)", + "x-ms-client-request-id" : "ac3b1975-2a1f-4b4c-b812-57f5edc06d90", + "Content-Type" : "application/json" + }, + "Response" : { + "Transfer-Encoding" : "chunked", + "x-envoy-upstream-service-time" : "1151", + "Strict-Transport-Security" : "max-age=31536000; includeSubDomains; preload", + "x-content-type-options" : "nosniff", + "csp-billing-usage" : "CognitiveServices.TextAnalytics.BatchScoring=2", + "apim-request-id" : "a5c0e82d-4149-44a6-a86a-f8bcd5ef40d6", + "retry-after" : "0", + "StatusCode" : "200", + "Body" : 
"{\"documents\":[{\"id\":\"0\",\"entities\":[{\"text\":\"Microsoft\",\"category\":\"Organization\",\"offset\":0,\"length\":9,\"confidenceScore\":0.38},{\"text\":\"859-98-0987\",\"category\":\"U.S. Social Security Number (SSN)\",\"offset\":28,\"length\":11,\"confidenceScore\":0.65}],\"warnings\":[]},{\"id\":\"1\",\"entities\":[{\"text\":\"111000025\",\"category\":\"Phone Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.8},{\"text\":\"111000025\",\"category\":\"ABA Routing Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.75},{\"text\":\"111000025\",\"category\":\"New Zealand Social Welfare Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.65},{\"text\":\"111000025\",\"category\":\"Portugal Tax Identification Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.65}],\"warnings\":[]}],\"errors\":[],\"modelVersion\":\"2020-07-01\"}", + "Date" : "Fri, 14 Aug 2020 05:33:53 GMT", + "Content-Type" : "application/json; charset=utf-8" + }, + "Exception" : null + } ], + "variables" : [ ] +} \ No newline at end of file diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForListStringWithOptions.json b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForListStringWithOptions.json new file mode 100644 index 000000000000..6ab75b17ec58 --- /dev/null +++ b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForListStringWithOptions.json @@ -0,0 +1,26 @@ +{ + "networkCallRecords" : [ { + "Method" : "POST", + "Uri" : "https://REDACTED.cognitiveservices.azure.com/text/analytics/v3.1-preview.1//entities/recognition/pii?showStats=true", + "Headers" : { + "User-Agent" : "azsdk-java-azure-ai-textanalytics/5.1.0-beta.1 (11.0.7; Windows 10; 10.0)", + "x-ms-client-request-id" : "ecba6ca9-c040-4e8a-9a23-a84cf38b2f93", + "Content-Type" : "application/json" + }, + "Response" : { + "Transfer-Encoding" : "chunked", + 
"x-envoy-upstream-service-time" : "1043", + "Strict-Transport-Security" : "max-age=31536000; includeSubDomains; preload", + "x-content-type-options" : "nosniff", + "csp-billing-usage" : "CognitiveServices.TextAnalytics.BatchScoring=2", + "apim-request-id" : "d3e4e6a6-e27d-4bce-8eb2-6b0a41a90d9d", + "retry-after" : "0", + "StatusCode" : "200", + "Body" : "{\"statistics\":{\"documentsCount\":2,\"validDocumentsCount\":2,\"erroneousDocumentsCount\":0,\"transactionsCount\":2},\"documents\":[{\"id\":\"0\",\"statistics\":{\"charactersCount\":67,\"transactionsCount\":1},\"entities\":[{\"text\":\"Microsoft\",\"category\":\"Organization\",\"offset\":0,\"length\":9,\"confidenceScore\":0.38},{\"text\":\"859-98-0987\",\"category\":\"U.S. Social Security Number (SSN)\",\"offset\":28,\"length\":11,\"confidenceScore\":0.65}],\"warnings\":[]},{\"id\":\"1\",\"statistics\":{\"charactersCount\":105,\"transactionsCount\":1},\"entities\":[{\"text\":\"111000025\",\"category\":\"Phone Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.8},{\"text\":\"111000025\",\"category\":\"ABA Routing Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.75},{\"text\":\"111000025\",\"category\":\"New Zealand Social Welfare Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.65},{\"text\":\"111000025\",\"category\":\"Portugal Tax Identification Number\",\"offset\":18,\"length\":9,\"confidenceScore\":0.65}],\"warnings\":[]}],\"errors\":[],\"modelVersion\":\"2020-07-01\"}", + "Date" : "Fri, 14 Aug 2020 05:35:20 GMT", + "Content-Type" : "application/json; charset=utf-8" + }, + "Exception" : null + } ], + "variables" : [ ] +} \ No newline at end of file diff --git a/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForTextInput.json b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForTextInput.json new file mode 100644 index 000000000000..de4663ac6654 --- /dev/null +++ 
b/sdk/textanalytics/azure-ai-textanalytics/src/test/resources/session-records/recognizePiiEntitiesForTextInput.json @@ -0,0 +1,26 @@ +{ + "networkCallRecords" : [ { + "Method" : "POST", + "Uri" : "https://REDACTED.cognitiveservices.azure.com/text/analytics/v3.1-preview.1//entities/recognition/pii", + "Headers" : { + "User-Agent" : "azsdk-java-azure-ai-textanalytics/1.1.0-beta.1 (11.0.7; Windows 10; 10.0)", + "x-ms-client-request-id" : "e79ad848-316e-4391-be17-1957b972b16f", + "Content-Type" : "application/json" + }, + "Response" : { + "Transfer-Encoding" : "chunked", + "x-envoy-upstream-service-time" : "104", + "Strict-Transport-Security" : "max-age=31536000; includeSubDomains; preload", + "x-content-type-options" : "nosniff", + "csp-billing-usage" : "CognitiveServices.TextAnalytics.BatchScoring=1", + "apim-request-id" : "7fc294cf-e760-4c4d-af87-a28aade608fe", + "retry-after" : "0", + "StatusCode" : "200", + "Body" : "{\"documents\":[{\"id\":\"0\",\"entities\":[{\"text\":\"Microsoft\",\"category\":\"Organization\",\"offset\":0,\"length\":9,\"confidenceScore\":0.4},{\"text\":\"859-98-0987\",\"category\":\"U.S. Social Security Number (SSN)\",\"offset\":28,\"length\":11,\"confidenceScore\":0.65}],\"warnings\":[]}],\"errors\":[],\"modelVersion\":\"2020-04-01\"}", + "Date" : "Mon, 27 Jul 2020 16:42:11 GMT", + "Content-Type" : "application/json; charset=utf-8" + }, + "Exception" : null + } ], + "variables" : [ ] +} \ No newline at end of file