Merged
Changes from 4 commits
2 changes: 2 additions & 0 deletions sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md
@@ -1,5 +1,7 @@
# Release History
## 5.1.0-beta.1 (Unreleased)
**New features**
- Added support for the Personally Identifiable Information (PII) entity recognition feature.

## 5.0.0 (2020-07-27)
- Re-release of version `1.0.1` with updated version `5.0.0`.
20 changes: 19 additions & 1 deletion sdk/textanalytics/azure-ai-textanalytics/README.md
@@ -1,11 +1,12 @@
# Azure Text Analytics client library for Java
Text Analytics is a cloud-based service that provides advanced natural language processing over raw text,
Text Analytics is a cloud-based service that provides advanced natural language processing over raw text,
and includes six main functions:

- Sentiment Analysis
- Language Detection
- Key Phrase Extraction
- Named Entity Recognition
- Personally Identifiable Information Entity Recognition
- Linked Entity Recognition

[Source code][source_code] | [Package (Maven)][package] | [API reference documentation][api_reference_doc] | [Product Documentation][product_documentation] | [Samples][samples_readme]
@@ -186,6 +187,7 @@ The following sections provide several code snippets covering some of the most c
* [Detect Language](#detect-language "Detect language")
* [Extract Key Phrases](#extract-key-phrases "Extract key phrases")
* [Recognize Entities](#recognize-entities "Recognize entities")
* [Recognize Personally Identifiable Information Entities](#recognize-personally-identifiable-information-entities "Recognize personally identifiable information entities")
* [Recognize Linked Entities](#recognize-linked-entities "Recognize linked entities")

### Text Analytics Client
@@ -262,6 +264,21 @@ textAnalyticsClient.recognizeEntities(document).forEach(entity ->
For samples on using the production recommended option `RecognizeEntitiesBatch` see [here][recognize_entities_sample].
Please refer to the service documentation for a conceptual discussion of [named entity recognition][named_entity_recognition].

### Recognize personally identifiable information entities
Run a predictive model to identify a collection of personally identifiable information entities in the passed-in
document or batch of documents, and classify those entities into categories such as person, location, or
organization. For more information on available categories, see [Text Analytics Named Entity Categories][named_entities_categories].

<!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L158-L161 -->
```java
String document = "My SSN is 555-55-5555";
textAnalyticsClient.recognizePiiEntities(document).forEach(piiEntity ->
System.out.printf("Recognized Personally Identifiable Information entity: %s, category: %s, subCategory: %s, score: %f.%n",
piiEntity.getText(), piiEntity.getCategory(), piiEntity.getSubcategory(), piiEntity.getConfidenceScore()));
```
For samples on using the production recommended option `RecognizePiiEntitiesBatch` see [here][recognize_pii_entities_sample].
Please refer to the service documentation for a conceptual discussion of [PII entity recognition][named_entity_recognition].
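
The batch option mentioned above can be sketched as follows. This is a minimal sketch, not the shipped sample: the exact `recognizePiiEntitiesBatch` overload (plain strings plus a language hint, with `null` request options) is assumed from the batch pattern of the other recognize methods in this library and may differ from the final API.

```java
List<String> documents = Arrays.asList(
    "My SSN is 555-55-5555",
    "Visa card 4111 1111 1111 1111 expires 03/25.");

// Hypothetical overload: documents, a language hint, and request options (null for defaults).
textAnalyticsClient.recognizePiiEntitiesBatch(documents, "en", null)
    .forEach(result -> result.getEntities().forEach(piiEntity ->
        System.out.printf("Document %s: entity: %s, category: %s.%n",
            result.getId(), piiEntity.getText(), piiEntity.getCategory())));
```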

### Recognize linked entities
Run a predictive model to identify a collection of entities found in the passed-in document or batch of documents,
and include information linking the entities to their corresponding entries in a well-known knowledge base.
@@ -377,6 +394,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][coc]. For m
[analyze_sentiment_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/AnalyzeSentimentBatchDocuments.java
[extract_key_phrases_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/ExtractKeyPhrasesBatchDocuments.java
[recognize_entities_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizeEntitiesBatchDocuments.java
[recognize_pii_entities_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchDocuments.java
[recognize_linked_entities_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizeLinkedEntitiesBatchDocuments.java

![Impressions](https://azure-sdk-impressions.azurewebsites.net/api/impressions/azure-sdk-for-java%2Fsdk%2Ftextanalytics%2Fazure-ai-textanalytics%2FREADME.png)
@@ -11,11 +11,11 @@
import com.azure.ai.textanalytics.models.CategorizedEntityCollection;
import com.azure.ai.textanalytics.models.EntityCategory;
import com.azure.ai.textanalytics.models.RecognizeEntitiesResult;
import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection;
import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions;
import com.azure.ai.textanalytics.models.TextAnalyticsWarning;
import com.azure.ai.textanalytics.models.TextDocumentInput;
import com.azure.ai.textanalytics.models.WarningCode;
import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection;
import com.azure.core.exception.HttpResponseException;
import com.azure.core.http.rest.Response;
import com.azure.core.http.rest.SimpleResponse;
@@ -0,0 +1,215 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

package com.azure.ai.textanalytics;

import com.azure.ai.textanalytics.implementation.TextAnalyticsClientImpl;
import com.azure.ai.textanalytics.implementation.models.EntitiesResult;
import com.azure.ai.textanalytics.implementation.models.MultiLanguageBatchInput;
import com.azure.ai.textanalytics.implementation.models.WarningCodeValue;
import com.azure.ai.textanalytics.models.EntityCategory;
import com.azure.ai.textanalytics.models.PiiEntity;
import com.azure.ai.textanalytics.models.PiiEntityCollection;
import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult;
import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions;
import com.azure.ai.textanalytics.models.TextAnalyticsWarning;
import com.azure.ai.textanalytics.models.TextDocumentInput;
import com.azure.ai.textanalytics.models.WarningCode;
import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection;
import com.azure.core.exception.HttpResponseException;
import com.azure.core.http.rest.Response;
import com.azure.core.http.rest.SimpleResponse;
import com.azure.core.util.Context;
import com.azure.core.util.IterableStream;
import com.azure.core.util.logging.ClientLogger;
import reactor.core.publisher.Mono;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

import static com.azure.ai.textanalytics.TextAnalyticsAsyncClient.COGNITIVE_TRACING_NAMESPACE_VALUE;
import static com.azure.ai.textanalytics.implementation.Utility.getEmptyErrorIdHttpResponse;
import static com.azure.ai.textanalytics.implementation.Utility.inputDocumentsValidation;
import static com.azure.ai.textanalytics.implementation.Utility.mapToHttpResponseExceptionIfExist;
import static com.azure.ai.textanalytics.implementation.Utility.toBatchStatistics;
import static com.azure.ai.textanalytics.implementation.Utility.toMultiLanguageInput;
import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsError;
import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsException;
import static com.azure.ai.textanalytics.implementation.Utility.toTextDocumentStatistics;
import static com.azure.core.util.FluxUtil.monoError;
import static com.azure.core.util.FluxUtil.withContext;
import static com.azure.core.util.tracing.Tracer.AZ_TRACING_NAMESPACE_KEY;

/**
 * Helper class for managing the recognize personally identifiable information entities endpoint.
*/
class RecognizePiiEntityAsyncClient {
private final ClientLogger logger = new ClientLogger(RecognizePiiEntityAsyncClient.class);
private final TextAnalyticsClientImpl service;

/**
 * Create a {@link RecognizePiiEntityAsyncClient} that sends requests to the Text Analytics service's
* recognize personally identifiable information entity endpoint.
*
* @param service The proxy service used to perform REST calls.
*/
RecognizePiiEntityAsyncClient(TextAnalyticsClientImpl service) {
this.service = service;
}

/**
 * Helper function for calling the service with max overloaded parameters that returns a {@link Mono}
* which contains {@link PiiEntityCollection}.
*
* @param document A single document.
* @param language The language code.
*
* @return The {@link Mono} of {@link PiiEntityCollection}.
*/
Mono<PiiEntityCollection> recognizePiiEntities(String document, String language) {
try {
Objects.requireNonNull(document, "'document' cannot be null.");
final TextDocumentInput textDocumentInput = new TextDocumentInput("0", document).setLanguage(language);
return recognizePiiEntitiesBatch(Collections.singletonList(textDocumentInput), null)
.map(resultCollectionResponse -> {
PiiEntityCollection entityCollection = null;
                    // The batch contains only the single input document, so this loop has exactly one entry.
for (RecognizePiiEntitiesResult entitiesResult : resultCollectionResponse.getValue()) {
if (entitiesResult.isError()) {
throw logger.logExceptionAsError(toTextAnalyticsException(entitiesResult.getError()));
}
entityCollection = new PiiEntityCollection(entitiesResult.getEntities(),
entitiesResult.getEntities().getWarnings());
}
return entityCollection;
});
} catch (RuntimeException ex) {
return monoError(logger, ex);
}
}

/**
     * Helper function for calling the service with max overloaded parameters.
*
* @param documents The list of documents to recognize personally identifiable information entities for.
* @param options The {@link TextAnalyticsRequestOptions} request options.
*
* @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
*/
Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatch(
Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options) {
try {
inputDocumentsValidation(documents);
return withContext(context -> getRecognizePiiEntitiesResponse(documents, options, context));
} catch (RuntimeException ex) {
return monoError(logger, ex);
}
}

/**
     * Helper function for calling the service with max overloaded parameters when a {@link Context} is given.
*
* @param documents The list of documents to recognize personally identifiable information entities for.
* @param options The {@link TextAnalyticsRequestOptions} request options.
* @param context Additional context that is passed through the Http pipeline during the service call.
*
* @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
*/
Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatchWithContext(
Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options, Context context) {
try {
inputDocumentsValidation(documents);
return getRecognizePiiEntitiesResponse(documents, options, context);
} catch (RuntimeException ex) {
return monoError(logger, ex);
}
}

/**
     * Helper method to convert the service response of {@link EntitiesResult} to a {@link Response} that contains
* {@link RecognizePiiEntitiesResultCollection}.
*
* @param response the {@link Response} of {@link EntitiesResult} returned by the service.
*
* @return A {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
*/
private Response<RecognizePiiEntitiesResultCollection> toRecognizePiiEntitiesResultCollectionResponse(
final Response<EntitiesResult> response) {
final EntitiesResult entitiesResult = response.getValue();
// List of documents results
final List<RecognizePiiEntitiesResult> recognizeEntitiesResults = new ArrayList<>();
entitiesResult.getDocuments().forEach(documentEntities ->
recognizeEntitiesResults.add(new RecognizePiiEntitiesResult(
documentEntities.getId(),
documentEntities.getStatistics() == null ? null
: toTextDocumentStatistics(documentEntities.getStatistics()),
null,
new PiiEntityCollection(
new IterableStream<>(documentEntities.getEntities().stream().map(entity ->
new PiiEntity(entity.getText(), EntityCategory.fromString(entity.getCategory()),
entity.getSubcategory(), entity.getOffset(), entity.getLength(),
entity.getConfidenceScore()))
.collect(Collectors.toList())),
new IterableStream<>(documentEntities.getWarnings().stream()
.map(warning -> {
final WarningCodeValue warningCodeValue = warning.getCode();
return new TextAnalyticsWarning(
WarningCode.fromString(warningCodeValue == null ? null : warningCodeValue.toString()),
warning.getMessage());
}).collect(Collectors.toList())))
)));
// Document errors
entitiesResult.getErrors().forEach(documentError -> {
/*
* TODO: Remove this after service update to throw exception.
* Currently, service sets max limit of document size to 5, if the input documents size > 5, it will
* have an id = "", empty id. In the future, they will remove this and throw HttpResponseException.
*/
if (documentError.getId().isEmpty()) {
throw logger.logExceptionAsError(
new HttpResponseException(documentError.getError().getInnererror().getMessage(),
getEmptyErrorIdHttpResponse(new SimpleResponse<>(response, response.getValue())),
documentError.getError().getInnererror().getCode()));
}

recognizeEntitiesResults.add(
new RecognizePiiEntitiesResult(documentError.getId(), null,
toTextAnalyticsError(documentError.getError()), null));
});

return new SimpleResponse<>(response,
new RecognizePiiEntitiesResultCollection(recognizeEntitiesResults, entitiesResult.getModelVersion(),
entitiesResult.getStatistics() == null ? null : toBatchStatistics(entitiesResult.getStatistics())));
}

/**
     * Call the service via REST and convert the {@link SimpleResponse} of {@link EntitiesResult} into a {@link Mono}
     * of {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
*
* @param documents The list of documents to recognize personally identifiable information entities for.
* @param options The {@link TextAnalyticsRequestOptions} request options.
* @param context Additional context that is passed through the Http pipeline during the service call.
*
* @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
*/
private Mono<Response<RecognizePiiEntitiesResultCollection>> getRecognizePiiEntitiesResponse(
Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options, Context context) {
return service.entitiesRecognitionPiiWithResponseAsync(
new MultiLanguageBatchInput().setDocuments(toMultiLanguageInput(documents)),
options == null ? null : options.getModelVersion(),
options == null ? null : options.isIncludeStatistics(),
options == null ? null : options.getDomain(),
context.addData(AZ_TRACING_NAMESPACE_KEY, COGNITIVE_TRACING_NAMESPACE_VALUE))
.doOnSubscribe(ignoredValue -> logger.info("A batch of documents - {}", documents.toString()))
.doOnSuccess(response ->
                logger.info("Recognized personally identifiable information entities for a batch of documents - {}",
response.getValue()))
.doOnError(error ->
logger.warning("Failed to recognize personally identifiable information entities - {}", error))
.map(this::toRecognizePiiEntitiesResultCollectionResponse)
.onErrorMap(throwable -> mapToHttpResponseExceptionIfExist(throwable));
}
}