Merged
Changes from 9 commits
4 changes: 4 additions & 0 deletions sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md
@@ -4,6 +4,10 @@
- `length` is the number of characters in the text of these models
- `offset` is the offset of the text from the start of the document

**New features**
- Added support for the Personally Identifiable Information (PII) entity recognition feature.
To use this feature, make sure you are using the service's v3.1-preview.1 API.

## 5.0.0 (2020-07-27)
- Re-release of version `1.0.1` with updated version `5.0.0`.

28 changes: 24 additions & 4 deletions sdk/textanalytics/azure-ai-textanalytics/README.md
@@ -6,6 +6,7 @@ and includes six main functions:
- Language Detection
- Key Phrase Extraction
- Named Entity Recognition
- Personally Identifiable Information Entity Recognition
- Linked Entity Recognition

[Source code][source_code] | [Package (Maven)][package] | [API reference documentation][api_reference_doc] | [Product Documentation][product_documentation] | [Samples][samples_readme]
@@ -186,6 +187,7 @@ The following sections provide several code snippets covering some of the most c
* [Detect Language](#detect-language "Detect language")
* [Extract Key Phrases](#extract-key-phrases "Extract key phrases")
* [Recognize Entities](#recognize-entities "Recognize entities")
* [Recognize Personally Identifiable Information Entities](#recognize-personally-identifiable-information-entities "Recognize Personally Identifiable Information entities")
* [Recognize Linked Entities](#recognize-linked-entities "Recognize linked entities")

### Text Analytics Client
@@ -209,7 +211,7 @@ TextAnalyticsAsyncClient textAnalyticsClient = new TextAnalyticsClientBuilder()

### Analyze sentiment
Run a Text Analytics predictive model to identify the positive, negative, neutral or mixed sentiment contained in the
passed-in document or batch of documents.
provided document or batch of documents.

<!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L104-L108 -->
```java
@@ -236,7 +238,7 @@ For samples on using the production recommended option `DetectLanguageBatch` see
Please refer to the service documentation for a conceptual discussion of [language detection][language_detection].

### Extract key phrases
Run a model to identify a collection of significant phrases found in the passed-in document or batch of documents.
Run a model to identify a collection of significant phrases found in the provided document or batch of documents.

<!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L149-L151 -->
```java
@@ -248,7 +250,7 @@ For samples on using the production recommended option `ExtractKeyPhrasesBatch`
Please refer to the service documentation for a conceptual discussion of [key phrase extraction][key_phrase_extraction].

### Recognize entities
Run a predictive model to identify a collection of named entities in the passed-in document or batch of documents and
Run a predictive model to identify a collection of named entities in the provided document or batch of documents and
categorize those entities into categories such as person, location, or organization. For more information on available
categories, see [Text Analytics Named Entity Categories][named_entities_categories].

@@ -262,8 +264,24 @@ textAnalyticsClient.recognizeEntities(document).forEach(entity ->
For samples on using the production recommended option `RecognizeEntitiesBatch` see [here][recognize_entities_sample].
Please refer to the service documentation for a conceptual discussion of [named entity recognition][named_entity_recognition].

### Recognize Personally Identifiable Information entities
Run a predictive model to identify a collection of Personally Identifiable Information (PII) entities in the provided
document. The model recognizes and categorizes PII entities in the input text, such as
Social Security Numbers, bank account information, and credit card numbers. This endpoint is only supported for
API versions v3.1-preview.1 and above.

<!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L158-L161 -->
```java
String document = "My SSN is 859-98-0987";
textAnalyticsClient.recognizePiiEntities(document).forEach(entity -> System.out.printf(
    "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s, offset: %s, length: %s, confidence score: %f.%n",
    entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getOffset(), entity.getLength(), entity.getConfidenceScore()));
```
For samples on using the production recommended option `RecognizePiiEntitiesBatch` see [here][recognize_pii_entities_sample].
Please refer to the service documentation for [supported PII entity types][pii_entity_recognition].
Member

Do we do this section for all the README snippets? If not, consider moving this to Next Steps.

Contributor Author

@mssfang Aug 14, 2020

Not getting what you mean here. It should already be there for all the README snippets. It follows the pattern.

Member

I was suggesting that this could be moved to the Next Steps section.
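
The PII sample above prints an `offset` and `length` for each recognized entity. As a rough illustration of what those two values allow, the sketch below masks the matched characters in the original document. `PiiSpan` and `PiiRedactionSketch` are hypothetical stand-ins written for this example and are not part of the SDK. The sketch also assumes that offsets and lengths count UTF-16 code units, the same unit Java uses for `String` indexing; check the service documentation before relying on that.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the offset/length pair reported for a recognized PII entity. */
final class PiiSpan {
    final int offset;
    final int length;

    PiiSpan(int offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

/** Hypothetical helper, not part of the SDK: masks the characters covered by each span. */
final class PiiRedactionSketch {
    /** Replaces every character covered by a span with '*' and leaves the rest of the document intact. */
    static String redact(String document, List<PiiSpan> spans) {
        char[] chars = document.toCharArray();
        for (PiiSpan span : spans) {
            // Clamp to the document bounds so an out-of-range span cannot throw.
            int start = Math.max(0, span.offset);
            int end = Math.min(chars.length, span.offset + span.length);
            for (int i = start; i < end; i++) {
                chars[i] = '*';
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String document = "My SSN is 859-98-0987";
        // Offset 10 and length 11 cover "859-98-0987" in the sample document.
        System.out.println(redact(document, Arrays.asList(new PiiSpan(10, 11))));
    }
}
```

With the sample document, the span at offset 10 with length 11 covers the whole number, so only the eleven characters of the SSN are masked and the leading text is left untouched.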


### Recognize linked entities
Run a predictive model to identify a collection of entities found in the passed-in document or batch of documents,
Run a predictive model to identify a collection of entities found in the provided document or batch of documents,
and include information linking the entities to their corresponding entries in a well-known knowledge base.

<!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L135-L142 -->
@@ -357,6 +375,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][coc]. For m
[named_entity_recognition]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking
[named_entity_recognition_types]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal
[named_entities_categories]: https://docs.microsoft.com/azure/cognitive-services/Text-Analytics/named-entity-types
[pii_entity_recognition]: https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal
[package]: https://mvnrepository.com/artifact/com.azure/azure-ai-textanalytics
[performance_tuning]: https://github.com/Azure/azure-sdk-for-java/wiki/Performance-Tuning
[product_documentation]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/overview
@@ -377,6 +396,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][coc]. For m
[analyze_sentiment_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/AnalyzeSentimentBatchDocuments.java
[extract_key_phrases_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/ExtractKeyPhrasesBatchDocuments.java
[recognize_entities_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizeEntitiesBatchDocuments.java
[recognize_pii_entities_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizePiiEntitiesBatchDocuments.java
[recognize_linked_entities_sample]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/textanalytics/azure-ai-textanalytics/src/samples/java/com/azure/ai/textanalytics/batch/RecognizeLinkedEntitiesBatchDocuments.java

![Impressions](https://azure-sdk-impressions.azurewebsites.net/api/impressions/azure-sdk-for-java%2Fsdk%2Ftextanalytics%2Fazure-ai-textanalytics%2FREADME.png)
@@ -11,11 +11,11 @@
import com.azure.ai.textanalytics.models.CategorizedEntityCollection;
import com.azure.ai.textanalytics.models.EntityCategory;
import com.azure.ai.textanalytics.models.RecognizeEntitiesResult;
import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection;
import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions;
import com.azure.ai.textanalytics.models.TextAnalyticsWarning;
import com.azure.ai.textanalytics.models.TextDocumentInput;
import com.azure.ai.textanalytics.models.WarningCode;
import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection;
import com.azure.core.exception.HttpResponseException;
import com.azure.core.http.rest.Response;
import com.azure.core.http.rest.SimpleResponse;
@@ -0,0 +1,204 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

package com.azure.ai.textanalytics;

import com.azure.ai.textanalytics.implementation.TextAnalyticsClientImpl;
import com.azure.ai.textanalytics.implementation.models.EntitiesResult;
import com.azure.ai.textanalytics.implementation.models.MultiLanguageBatchInput;
import com.azure.ai.textanalytics.implementation.models.WarningCodeValue;
import com.azure.ai.textanalytics.models.EntityCategory;
import com.azure.ai.textanalytics.models.PiiEntity;
import com.azure.ai.textanalytics.models.PiiEntityCollection;
import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult;
import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions;
import com.azure.ai.textanalytics.models.TextAnalyticsWarning;
import com.azure.ai.textanalytics.models.TextDocumentInput;
import com.azure.ai.textanalytics.models.WarningCode;
import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection;
import com.azure.core.http.rest.Response;
import com.azure.core.http.rest.SimpleResponse;
import com.azure.core.util.Context;
import com.azure.core.util.IterableStream;
import com.azure.core.util.logging.ClientLogger;
import reactor.core.publisher.Mono;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

import static com.azure.ai.textanalytics.TextAnalyticsAsyncClient.COGNITIVE_TRACING_NAMESPACE_VALUE;
import static com.azure.ai.textanalytics.implementation.Utility.inputDocumentsValidation;
import static com.azure.ai.textanalytics.implementation.Utility.mapToHttpResponseExceptionIfExist;
import static com.azure.ai.textanalytics.implementation.Utility.toBatchStatistics;
import static com.azure.ai.textanalytics.implementation.Utility.toMultiLanguageInput;
import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsError;
import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsException;
import static com.azure.ai.textanalytics.implementation.Utility.toTextDocumentStatistics;
import static com.azure.core.util.FluxUtil.monoError;
import static com.azure.core.util.FluxUtil.withContext;
import static com.azure.core.util.tracing.Tracer.AZ_TRACING_NAMESPACE_KEY;

/**
 * Helper class for managing the recognize Personally Identifiable Information entity endpoint.
 */
class RecognizePiiEntityAsyncClient {
    private final ClientLogger logger = new ClientLogger(RecognizePiiEntityAsyncClient.class);
    private final TextAnalyticsClientImpl service;

    /**
     * Create a {@link RecognizePiiEntityAsyncClient} that sends requests to the Text Analytics service's
     * recognize Personally Identifiable Information entity endpoint.
     *
     * @param service The proxy service used to perform REST calls.
     */
    RecognizePiiEntityAsyncClient(TextAnalyticsClientImpl service) {
        this.service = service;
    }

    /**
     * Helper function for calling service with max overloaded parameters that returns a {@link Mono}
     * which contains {@link PiiEntityCollection}.
     *
     * @param document A single document.
     * @param language The language code.
     *
     * @return The {@link Mono} of {@link PiiEntityCollection}.
     */
    Mono<PiiEntityCollection> recognizePiiEntities(String document, String language) {
        try {
            Objects.requireNonNull(document, "'document' cannot be null.");
            return recognizePiiEntitiesBatch(
                Collections.singletonList(new TextDocumentInput("0", document).setLanguage(language)), null)
                .map(resultCollectionResponse -> {
                    PiiEntityCollection entityCollection = null;
                    // The batch contains a single document, so this loop has only one entry.
                    for (RecognizePiiEntitiesResult entitiesResult : resultCollectionResponse.getValue()) {
                        if (entitiesResult.isError()) {
                            throw logger.logExceptionAsError(toTextAnalyticsException(entitiesResult.getError()));
                        }
                        entityCollection = new PiiEntityCollection(entitiesResult.getEntities(),
                            entitiesResult.getEntities().getWarnings());
                    }
                    return entityCollection;
                });
        } catch (RuntimeException ex) {
            return monoError(logger, ex);
        }
    }

    /**
     * Helper function for calling service with max overloaded parameters.
     *
     * @param documents The list of documents to recognize Personally Identifiable Information entities for.
     * @param options The {@link TextAnalyticsRequestOptions} request options.
     *
     * @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
     */
    Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatch(
        Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options) {
        try {
            inputDocumentsValidation(documents);
            return withContext(context -> getRecognizePiiEntitiesResponse(documents, options, context));
        } catch (RuntimeException ex) {
            return monoError(logger, ex);
        }
    }

    /**
     * Helper function for calling service with max overloaded parameters when a {@link Context} is given.
     *
     * @param documents The list of documents to recognize Personally Identifiable Information entities for.
     * @param options The {@link TextAnalyticsRequestOptions} request options.
     * @param context Additional context that is passed through the Http pipeline during the service call.
     *
     * @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
     */
    Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatchWithContext(
        Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options, Context context) {
        try {
            inputDocumentsValidation(documents);
            return getRecognizePiiEntitiesResponse(documents, options, context);
        } catch (RuntimeException ex) {
            return monoError(logger, ex);
        }
    }

    /**
     * Helper method to convert the service response of {@link EntitiesResult} to {@link Response} which contains
     * {@link RecognizePiiEntitiesResultCollection}.
     *
     * @param response the {@link Response} of {@link EntitiesResult} returned by the service.
     *
     * @return A {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
     */
    private Response<RecognizePiiEntitiesResultCollection> toRecognizePiiEntitiesResultCollectionResponse(
        final Response<EntitiesResult> response) {
        final EntitiesResult entitiesResult = response.getValue();
        // List of documents results
        final List<RecognizePiiEntitiesResult> recognizeEntitiesResults = new ArrayList<>();
        entitiesResult.getDocuments().forEach(documentEntities -> {
            // Pii entities list
            final List<PiiEntity> piiEntities = documentEntities.getEntities().stream().map(entity ->
                new PiiEntity(entity.getText(), EntityCategory.fromString(entity.getCategory()),
                    entity.getSubcategory(), entity.getConfidenceScore(), entity.getOffset(), entity.getLength()))
                .collect(Collectors.toList());
            // Warnings
            final List<TextAnalyticsWarning> warnings = documentEntities.getWarnings().stream()
                .map(warning -> {
                    final WarningCodeValue warningCodeValue = warning.getCode();
                    return new TextAnalyticsWarning(
                        WarningCode.fromString(warningCodeValue == null ? null : warningCodeValue.toString()),
                        warning.getMessage());
                }).collect(Collectors.toList());

            recognizeEntitiesResults.add(new RecognizePiiEntitiesResult(
                documentEntities.getId(),
                documentEntities.getStatistics() == null ? null
                    : toTextDocumentStatistics(documentEntities.getStatistics()),
                null,
                new PiiEntityCollection(new IterableStream<>(piiEntities), new IterableStream<>(warnings))
            ));
        });
        // Document errors
        entitiesResult.getErrors().forEach(documentError -> {
            recognizeEntitiesResults.add(
                new RecognizePiiEntitiesResult(documentError.getId(), null,
                    toTextAnalyticsError(documentError.getError()), null));
        });

        return new SimpleResponse<>(response,
            new RecognizePiiEntitiesResultCollection(recognizeEntitiesResults, entitiesResult.getModelVersion(),
                entitiesResult.getStatistics() == null ? null : toBatchStatistics(entitiesResult.getStatistics())));
    }

    /**
     * Call the service with REST response, convert to a {@link Mono} of {@link Response} that contains
     * {@link RecognizePiiEntitiesResultCollection} from a {@link SimpleResponse} of {@link EntitiesResult}.
     *
     * @param documents The list of documents to recognize Personally Identifiable Information entities for.
     * @param options The {@link TextAnalyticsRequestOptions} request options.
     * @param context Additional context that is passed through the Http pipeline during the service call.
     *
     * @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
     */
    private Mono<Response<RecognizePiiEntitiesResultCollection>> getRecognizePiiEntitiesResponse(
        Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options, Context context) {
        return service.entitiesRecognitionPiiWithResponseAsync(
            new MultiLanguageBatchInput().setDocuments(toMultiLanguageInput(documents)),
            options == null ? null : options.getModelVersion(),
            options == null ? null : options.isIncludeStatistics(),
            null,
            context.addData(AZ_TRACING_NAMESPACE_KEY, COGNITIVE_TRACING_NAMESPACE_VALUE))
            .doOnSubscribe(ignoredValue -> logger.info(
                "Start recognizing Personally Identifiable Information entities for a batch of documents."))
            .doOnSuccess(response -> logger.info(
                "Successfully recognized Personally Identifiable Information entities for a batch of documents."))
            .doOnError(error ->
                logger.warning("Failed to recognize Personally Identifiable Information entities - {}", error))
            .map(this::toRecognizePiiEntitiesResultCollectionResponse)
            .onErrorMap(throwable -> mapToHttpResponseExceptionIfExist(throwable));
    }
}
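
The class above relies on two small patterns: the conversion helper folds per-document successes and per-document errors into one result list, and the single-document overload wraps its input in a one-element batch and then unwraps the only result, throwing if that document failed. The following is a minimal sketch of that control flow only. It uses plain JDK collections instead of Reactor and the SDK models, and `DocumentResult` and `SingleDocumentSketch` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a per-document result: either entities or an error is set. */
final class DocumentResult {
    final String id;
    final List<String> entities; // null when the document failed
    final String error;          // null when the document succeeded

    DocumentResult(String id, List<String> entities, String error) {
        this.id = id;
        this.entities = entities;
        this.error = error;
    }

    boolean isError() {
        return error != null;
    }
}

/** Hypothetical sketch of the merge-then-unwrap flow; not the SDK implementation. */
final class SingleDocumentSketch {
    /** Folds successful documents and document errors into one list, successes first. */
    static List<DocumentResult> merge(List<DocumentResult> documents, List<DocumentResult> errors) {
        List<DocumentResult> results = new ArrayList<>(documents);
        results.addAll(errors);
        return results;
    }

    /**
     * Unwraps a batch that is expected to hold exactly one result.
     * Throws if that document failed; returns null for an empty batch.
     */
    static List<String> unwrapSingle(List<DocumentResult> batch) {
        List<String> entities = null;
        for (DocumentResult result : batch) {
            if (result.isError()) {
                throw new IllegalStateException(result.error);
            }
            entities = result.entities;
        }
        return entities;
    }
}
```

One consequence of this shape is that a per-document error surfaces as an exception for single-document callers, while batch callers receive it as an ordinary entry in the result list and must check `isError()` themselves.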