Commit 7f52988

Merge pull request #3035 from laujan/375637-native-document-support-summarization-pii
native document support updates
2 parents 4fd371d + 3263117 commit 7f52988

15 files changed

+1003
-557
lines changed

articles/ai-services/.openpublishing.redirection.ai-services.json

Lines changed: 15 additions & 0 deletions
@@ -866,6 +866,21 @@
866866
"source_path_from_root": "/articles/ai-services/ai-services-and-ecosystem.md",
867867
"redirect_url": "/azure/ai-services/what-are-ai-services",
868868
"redirect_document_id": false
869+
},
870+
{
871+
"source_path_from_root": "/articles/ai-services/language-service/personally-identifiable-information/how-to-call.md",
872+
"redirect_url": "/azure/ai-services/language-service/personally-identifiable-information/how-to/redact-text-pii",
873+
"redirect_document_id": true
874+
},
875+
{
876+
"source_path_from_root": "/articles/ai-services/language-service/personally-identifiable-information/how-to-call-for-conversations.md",
877+
"redirect_url": "/azure/ai-services/language-service/personally-identifiable-information/how-to/redact-conversation-pii",
878+
"redirect_document_id": true
879+
},
880+
{
881+
"source_path_from_root": "/articles/ai-services/language-service/native-document-support/use-native-documents.md",
882+
"redirect_url": "/azure/ai-services/language-service/native-document-support/overview",
883+
"redirect_document_id": true
869884
}
870885
]
871886
}
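
Note on the redirection hunk above: each added entry carries the same three keys. The following is a minimal Python sketch of a sanity check that could be run over such entries before merging. The function name `check_redirections` and the rules it enforces (all three keys present, a `.md` source path, no duplicate source) are assumptions made for illustration; they are not part of the publishing tooling.

```python
REQUIRED_KEYS = {"source_path_from_root", "redirect_url", "redirect_document_id"}


def check_redirections(entries):
    """Return human-readable problems found in a list of redirection entries."""
    problems = []
    seen_sources = set()
    for index, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {index}: missing {sorted(missing)}")
            continue
        source = entry["source_path_from_root"]
        if not source.endswith(".md"):
            problems.append(f"entry {index}: source is not a .md file")
        if source in seen_sources:
            problems.append(f"entry {index}: duplicate source {source}")
        seen_sources.add(source)
    return problems


# The three entries added in this commit, copied from the hunk above.
base = "/articles/ai-services/language-service"
target = "/azure/ai-services/language-service"
added = [
    {
        "source_path_from_root": f"{base}/personally-identifiable-information/how-to-call.md",
        "redirect_url": f"{target}/personally-identifiable-information/how-to/redact-text-pii",
        "redirect_document_id": True,
    },
    {
        "source_path_from_root": f"{base}/personally-identifiable-information/how-to-call-for-conversations.md",
        "redirect_url": f"{target}/personally-identifiable-information/how-to/redact-conversation-pii",
        "redirect_document_id": True,
    },
    {
        "source_path_from_root": f"{base}/native-document-support/use-native-documents.md",
        "redirect_url": f"{target}/native-document-support/overview",
        "redirect_document_id": True,
    },
]
print(check_redirections(added))
```

An empty list means the entries pass these illustrative checks; whether a redirect target actually exists still has to be verified against the published site.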

articles/ai-services/language-service/native-document-support/managed-identities.md

Lines changed: 6 additions & 6 deletions
@@ -6,15 +6,15 @@ ms.topic: how-to
66
manager: nitinme
77
ms.author: lajanuar
88
author: laujan
9-
ms.date: 11/21/2024
9+
ms.date: 03/05/2025
1010
---
1111

1212

1313
# Managed identities for Language resources
1414

15-
Managed identities for Azure resources are service principals that create a Microsoft Entra identity and specific permissions for Azure managed resources. Managed identities are a safer way to grant access to storage data and replace the requirement for you to include shared access signature tokens (SAS) with your [source and target container URLs](use-native-documents.md#create-azure-blob-storage-containers).
15+
Managed identities for Azure resources are service principals that create a Microsoft Entra identity and specific permissions for Azure managed resources. Managed identities are a safer way to grant access to storage data and replace the requirement for you to include shared access signature tokens (SAS) with your source and target container URLs.
1616

17-
:::image type="content" source="media/managed-identity-flow.png" alt-text="Screenshot of managed identity flow (RBAC).":::
17+
:::image type="content" source="media/managed-identity-flow.png" alt-text="Screenshot of managed identity flow (`RBAC`).":::
1818

1919
* You can use managed identities to grant access to any resource that supports Microsoft Entra authentication, including your own applications.
2020

@@ -24,7 +24,7 @@ Managed identities for Azure resources are service principals that create a Micr
2424

2525
> [!IMPORTANT]
2626
>
27-
> * When using managed identities, don't include a SAS token URL with your HTTP requests—your requests will fail. Using managed identities replaces the requirement for you to include shared access signature tokens (SAS) with your [source and target container URLs](use-native-documents.md#create-azure-blob-storage-containers).
27+
> * When using managed identities, don't include a SAS token URL with your HTTP requests. Using managed identities replaces the requirement for you to include shared access signature tokens (SAS) with your source and target container URLs.
2828
>
2929
> * To use managed identities for Language operations, you must [create your Language resource](https://ms.portal.azure.com/#create/Microsoft.CognitiveServicesTextAnalytics) in a specific geographic Azure region such as **East US**. If your Language resource region is set to **Global**, then you can't use managed identity authentication. You can, however, still use [Shared Access Signature (SAS) tokens](shared-access-signatures.md).
3030
>
@@ -65,7 +65,7 @@ To get started, you need the following resources:
6565

6666
## Managed identity assignments
6767

68-
There are two types of managed identities: **system-assigned** and **user-assigned**. Currently, Document Translation supports **system-assigned managed identity**:
68+
There are two types of managed identities: **system-assigned** and **user-assigned**. Currently, Document Translation supports **system-assigned managed identity**:
6969

7070
* A system-assigned managed identity is **enabled** directly on a service instance. It isn't enabled by default; you must go to your resource and update the identity setting.
7171

@@ -135,4 +135,4 @@ You must grant the Language resource access to your storage account before it ca
135135
## Next steps
136136

137137
> [!div class="nextstepaction"]
138-
> [Get started with native document support](use-native-documents.md#include-native-documents-with-an-http-request)
138+
> [Native document support](overview.md#native-document-support-for-azure-ai-language-preview)
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
1+
---
2+
title: Native document support for Azure AI Language (preview)
3+
titleSuffix: Azure AI services
4+
description: How to use native documents with Azure AI Language Personally Identifiable Information and Summarization capabilities.
5+
author: laujan
6+
manager: nitinme
7+
ms.service: azure-ai-language
8+
ms.custom:
9+
- ignite-2024
10+
ms.topic: how-to
11+
ms.date: 02/19/2025
12+
ms.author: lajanuar
13+
---
14+
15+
<!-- markdownlint-disable MD033 -->
16+
<!-- markdownlint-disable MD051 -->
17+
<!-- markdownlint-disable MD024 -->
18+
<!-- markdownlint-disable MD036 -->
19+
<!-- markdownlint-disable MD049 -->
20+
<!-- markdownlint-disable MD001 -->
21+
22+
# Native document support for Azure AI Language (preview)
23+
24+
> [!IMPORTANT]
25+
>
26+
> * Azure AI Language public preview releases provide early access to features that are in active development.
27+
> * Features, approaches, and processes can change, before General Availability (GA), based on user feedback.
28+
29+
Azure AI Language is a cloud-based service that applies Natural Language Processing (NLP) features to text-based data. The native document support capability enables you to send API requests asynchronously, using an HTTP POST request body to send your data and HTTP GET request query string to retrieve the status results. Your processed documents are located in your Azure Blob Storage target container.
30+
31+
A native document refers to the file format used to create the original document such as Microsoft Word (docx) or a portable document file (pdf). Native document support eliminates the need for text preprocessing before using Azure AI Language resource capabilities. Currently, native document support is available for the following capabilities:
32+
33+
* [Personally Identifiable Information (PII)](../personally-identifiable-information/overview.md). The PII detection feature can identify, categorize, and redact sensitive information in unstructured text. The `PiiEntityRecognition` API supports native document processing.
34+
35+
* [Document summarization](../summarization/overview.md). Document summarization uses natural language processing to generate extractive (salient sentence extraction) or abstractive (contextual word extraction) summaries for documents. Both `AbstractiveSummarization` and `ExtractiveSummarization` APIs support native document processing.
36+
37+
## Supported document formats
38+
39+
Applications use native file formats to create, save, or open native documents. Currently, the **PII** and **Document summarization** capabilities support the following native document formats:
40+
41+
|File type|File extension|Description|
42+
|---------|--------------|-----------|
43+
|Text| `.txt`|An unformatted text document.|
44+
|Adobe PDF| `.pdf`|A portable document file formatted document.|
45+
|Microsoft Word| `.docx`|A Microsoft Word document file.|
46+
47+
## Input guidelines
48+
49+
***Supported file formats***
50+
51+
|Type|support and limitations|
52+
|---|---|
53+
|**PDFs**| Fully scanned PDFs aren't supported.|
54+
|**Text within images**| Digital images with embedded text aren't supported.|
55+
|**Digital tables**| Tables in scanned documents aren't supported.|
56+
57+
***Document Size***
58+
59+
|Attribute|Input limit|
60+
|---|---|
61+
|**Total number of documents per request** |**≤ 20**|
62+
|**Total content size per request**| **≤ 10 MB**|
63+
64+
## Request headers and parameters
65+
66+
|parameter |Description |
67+
|---------|---------|
68+
|`-X POST <endpoint>` | Specifies your Language resource endpoint for accessing the API. |
69+
|`--header Content-Type: application/json` | The content type for sending JSON data. |
70+
|`--header "Ocp-Apim-Subscription-Key:<key>"` | Specifies the Language resource key for accessing the API. |
71+
|`--data` | The JSON file containing the data you want to pass with your request. |
72+
73+
## Related content
74+
75+
> [!div class="nextstepaction"]
76+
> [PII detection overview](../personally-identifiable-information/overview.md "Learn more about Personally Identifiable Information detection.") [Document Summarization overview](../summarization/overview.md "Learn more about automatic document summarization.")
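
Note on the added overview file: the "Document Size" and "Request headers and parameters" tables above can be read together as a pre-flight check followed by a POST request. The sketch below uses only the Python standard library and assembles the request without sending it. The URL, key, and body are placeholders, the helper names are hypothetical, and treating 10 MB as 10 × 1024 × 1024 bytes is an assumption, since the table doesn't say which megabyte definition applies.

```python
import json
import urllib.request

MAX_DOCUMENTS_PER_REQUEST = 20        # "Total number of documents per request"
MAX_CONTENT_BYTES = 10 * 1024 * 1024  # "Total content size per request" (binary MB assumed)


def check_input_limits(document_sizes):
    """Return the limits violated by a batch, given each document's size in bytes."""
    problems = []
    if len(document_sizes) > MAX_DOCUMENTS_PER_REQUEST:
        problems.append("too many documents")
    if sum(document_sizes) > MAX_CONTENT_BYTES:
        problems.append("content too large")
    return problems


def build_request(url, key, body):
    """Assemble, but do not send, a POST request with the headers from the table."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


# Placeholder values: substitute your own endpoint, key, and request body.
request = build_request("https://example.invalid/placeholder", "<key>", {"tasks": []})
print(request.get_method(), check_input_limits([1024] * 3))
```

Building the request separately from sending it keeps the limit check and header handling testable without network access. Sending would be a call to `urllib.request.urlopen(request)` against a real endpoint.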

articles/ai-services/language-service/native-document-support/shared-access-signatures.md

Lines changed: 8 additions & 10 deletions
@@ -6,7 +6,7 @@ ms.topic: how-to
66
manager: nitinme
77
ms.author: lajanuar
88
author: laujan
9-
ms.date: 11/21/2024
9+
ms.date: 03/05/2025
1010
---
1111

1212
# SAS tokens for your storage containers
@@ -19,17 +19,15 @@ Learn to create user delegation, shared access signature (SAS) tokens, using the
1919
>
2020
> [Role-based access control (managed identities)](../concepts/role-based-access-control.md) provide an alternate method for granting access to your storage data without the need to include SAS tokens with your HTTP requests.
2121
>
22-
> * You can use managed identities to grant access to any resource that supports Microsoft Entra authentication, including your own applications.
22+
> * Using managed identities grants access to any resource that supports Microsoft Entra authentication, including your own applications.
2323
> * Using managed identities replaces the requirement for you to include shared access signature tokens (SAS) with your source and target URLs.
24-
> * There's no added cost to use managed identities in Azure.
24+
> * Using managed identities doesn't require an added cost in Azure.
2525
2626
At a high level, here's how SAS tokens work:
2727

2828
* Your application submits the SAS token to Azure Storage as part of a REST API request.
2929

30-
* If the storage service verifies that the SAS is valid, the request is authorized.
31-
32-
* If the SAS token is deemed invalid, the request is declined, and the error code 403 (Forbidden) is returned.
30+
* The storage service verifies that the SAS is valid and then the request is authorized. If the SAS token is deemed invalid, the request is declined, and the error code 403 (Forbidden) is returned.
3331

3432
Azure Blob Storage offers three resource types:
3533

@@ -41,7 +39,7 @@ Azure Blob Storage offers three resource types:
4139
>
4240
> * SAS tokens are used to grant permissions to storage resources, and should be protected in the same manner as an account key.
4341
>
44-
> * Operations that use SAS tokens should be performed only over an HTTPS connection, and SAS URIs should only be distributed on a secure connection such as HTTPS.
42+
> * Operations that use SAS tokens should be performed only over an HTTPS connection, and `SAS URI`s should only be distributed on a secure connection such as HTTPS.
4543
4644
## Prerequisites
4745

@@ -80,9 +78,9 @@ Workflow: **Your storage account** → **containers** → **your container** →
8078
* Consider setting a longer duration period for the time you're using your storage account for Language Service operations.
8179
* The value of the expiry time is determined by whether you're using an **Account key** or **User delegation key** **Signing method**:
8280
* **Account key**: No imposed maximum time limit; however, best practices recommended that you configure an expiration policy to limit the interval and minimize compromise. [Configure an expiration policy for shared access signatures](/azure/storage/common/sas-expiration-policy).
83-
* **User delegation key**: The value for the expiry time is a maximum of seven days from the creation of the SAS token. The SAS is invalid after the user delegation key expires, so a SAS with an expiry time of greater than seven days will still only be valid for seven days. For more information,*see* [Use Microsoft Entra credentials to secure a SAS](/azure/storage/blobs/storage-blob-user-delegation-sas-create-cli#use-azure-ad-credentials-to-secure-a-sas).
81+
* **User delegation key**: The value for the expiry time is a maximum of seven days from the creation of the SAS token. The SAS is invalid after the user delegation key expires, so a SAS with an expiry time of greater than seven days will still only be valid for seven days. For more information, *see* [Use Microsoft Entra credentials to secure a SAS](/azure/storage/blobs/storage-blob-user-delegation-sas-create-cli#use-azure-ad-credentials-to-secure-a-sas).
8482

85-
1. The **Allowed IP addresses** field is optional and specifies an IP address or a range of IP addresses from which to accept requests. If the request IP address doesn't match the IP address or address range specified on the SAS token, authorization fails. The IP address or a range of IP addresses must be public IPs, not private. For more information,*see*, [**Specify an IP address or IP range**](/rest/api/storageservices/create-account-sas#specify-an-ip-address-or-ip-range).
83+
1. The **Allowed IP addresses** field is optional and specifies an IP address or a range of IP addresses from which to accept requests. If the request IP address doesn't match the IP address or address range specified on the SAS token, authorization fails. The IP address or a range of IP addresses must be public IPs, not private. For more information, *see*, [**Specify an IP address or IP range**](/rest/api/storageservices/create-account-sas#specify-an-ip-address-or-ip-range).
8684

8785
1. The **Allowed protocols** field is optional and specifies the protocol permitted for a request made with the SAS. The default value is HTTPS.
8886

@@ -130,5 +128,5 @@ That's it! You learned how to create SAS tokens to authorize how clients access
130128
## Next steps
131129

132130
> [!div class="nextstepaction"]
133-
> [Learn more about native document support](use-native-documents.md "Learn how to process and analyze native documents.") [Learn more about granting access with SAS ](/azure/storage/common/storage-sas-overview "Grant limited access to Azure Storage resources using shared access SAS.")
131+
> [Learn more about native document support](overview.md "Learn how to process and analyze native documents.") [Learn more about granting access with SAS ](/azure/storage/common/storage-sas-overview "Grant limited access to Azure Storage resources using shared access SAS.")
134132
>
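
Note on the expiry-time bullet in the SAS file above: a SAS signed with a user delegation key is valid for at most seven days from creation, even if a later expiry is requested, while the account key method has no imposed maximum. A minimal Python sketch of that rule follows. The function name `effective_expiry` and the `signing_method` strings are hypothetical labels for illustration, not Azure SDK identifiers.

```python
from datetime import datetime, timedelta, timezone

USER_DELEGATION_MAX = timedelta(days=7)  # maximum stated in the article


def effective_expiry(created, requested_expiry, signing_method):
    """Return the time at which a SAS stops being valid under the stated rule."""
    if signing_method == "user_delegation_key":
        # A requested expiry beyond seven days is still only honored for seven days.
        return min(requested_expiry, created + USER_DELEGATION_MAX)
    # Account key: no imposed maximum, although an expiration policy is recommended.
    return requested_expiry


created = datetime(2025, 3, 5, tzinfo=timezone.utc)
requested = created + timedelta(days=10)
print(effective_expiry(created, requested, "user_delegation_key"))
```

Here the requested ten-day window is reduced to seven days for the user delegation key; the same request with `"account_key"` is returned unchanged.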
