articles/search/search-howto-index-encrypted-blobs.md: +15 -17 (15 additions & 17 deletions)
@@ -11,14 +11,14 @@ ms.custom:
 - ignite-2023
 ms.service: cognitive-search
 ms.topic: tutorial
-ms.date: 05/23/2024
+ms.date: 09/04/2024
 ---

 # Tutorial: Index and enrich encrypted blobs for full-text search in Azure AI Search

 This tutorial shows you how to use [Azure AI Search](search-what-is-azure-search.md) to index documents that have been previously encrypted with a customer-managed key in [Azure Blob Storage](/azure/storage/blobs/storage-blobs-introduction).

-Normally, an indexer can't extract content from blobs that have been encrypted using the [clientside encryption](/azure/storage/blobs/client-side-encryption) of the Azure Blob Storage client library because the indexer doesn't have access to the customer-managed encryption key in [Azure Key Vault](/azure/key-vault/general/overview). However, by leveraging the [DecryptBlobFile custom skill](https://github.com/Azure-Samples/azure-search-power-skills/blob/main/Utils/DecryptBlobFile), followed by the [Document Extraction skill](cognitive-search-skill-document-extraction.md), you can provide controlled access to the key to decrypt the files and then extract content from them. This unlocks the ability to index and enrich these documents without compromising the encryption status of your stored documents.
+Normally, an indexer can't extract content from blobs that have been encrypted using the [client-side encryption](/azure/storage/blobs/client-side-encryption) of the Azure Blob Storage client library because the indexer doesn't have access to the customer-managed encryption key in [Azure Key Vault](/azure/key-vault/general/overview). However, by leveraging the [DecryptBlobFile custom skill](https://github.com/Azure-Samples/azure-search-power-skills/blob/main/Utils/DecryptBlobFile), followed by the [Document Extraction skill](cognitive-search-skill-document-extraction.md), you can provide controlled access to the key to decrypt the files and then extract content from them. This unlocks the ability to index and enrich these documents without compromising the encryption status of your stored documents.

 Starting with previously encrypted whole documents (unstructured text) such as PDF, HTML, DOCX, and PPTX in Azure Blob Storage, this tutorial uses a REST client and the Search REST APIs to perform the following tasks:
@@ -34,7 +34,7 @@ If you don't have an Azure subscription, open a [free account](https://azure.mic
 + [Azure AI Search](search-create-service-portal.md) on any tier or region.

-+ [Azure Storage](https://azure.microsoft.com/services/storage/), Standard performance (general-purpose v2)
++ [Azure Storage](https://azure.microsoft.com/services/storage/), Standard performance (general-purpose v2).

 + Blobs encrypted with a customer-managed key. See [Tutorial: Encrypt and decrypt blobs using Azure Key Vault](/azure/storage/blobs/storage-encrypt-decrypt-blobs-key-vault) if you need to create sample data.
@@ -45,17 +45,15 @@ Custom skill deployment creates an Azure Function app and an Azure Storage accou
 > [!NOTE]
 > Skillsets often require [attaching an Azure AI multi-service resource](cognitive-search-attach-cognitive-services.md). As written, this skillset has no dependency on Azure AI services and thus no key is required. If you later add enrichments that invoke built-in skills, remember to update your skillset accordingly.

-## 1 - Create services and collect credentials
+## Deploy the custom skill

-### Deploy the custom skill
+This example uses the sample [DecryptBlobFile](https://github.com/Azure-Samples/azure-search-power-skills/blob/main/Utils/DecryptBlobFile) project from the [Azure Search Power Skills](https://github.com/Azure-Samples/azure-search-power-skills) GitHub repository. In this section, you deploy the skill to an Azure Function so that it can be used in a skillset. A built-in deployment script creates an Azure Function resource with a **psdbf-function-app-** prefix and loads the skill. You are prompted to provide a subscription and resource group. Be sure to choose the same subscription that your Azure Key Vault instance lives in.

-This example uses the sample [DecryptBlobFile](https://github.com/Azure-Samples/azure-search-power-skills/blob/main/Utils/DecryptBlobFile) project from the [Azure Search Power Skills](https://github.com/Azure-Samples/azure-search-power-skills) GitHub repository. In this section, you will deploy the skill to an Azure Function so that it can be used in a skillset. A built-in deployment script creates an Azure Function resource named starting with **psdbf-function-app-** and loads the skill. You'll be prompted to provide a subscription and resource group. Be sure to choose the same subscription that your Azure Key Vault instance lives in.
-
-Operationally, the DecryptBlobFile skill takes the URL and SAS token for each blob as inputs, and it outputs the downloaded, decrypted file using the file reference contract that Azure AI Search expects. Recall that DecryptBlobFile needs the encryption key to perform the decryption. As part of setup, you'll also create an access policy that grants DecryptBlobFile function access to the encryption key in Azure Key Vault.
+Operationally, the DecryptBlobFile skill takes the URL and SAS token for each blob as inputs, and it outputs the downloaded, decrypted file using the file reference contract that Azure AI Search expects. Recall that DecryptBlobFile needs the encryption key to perform the decryption. As part of setup, you also create an access policy that grants DecryptBlobFile function access to the encryption key in Azure Key Vault.
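
As a rough sketch of how the inputs and output described above might be wired into a skillset, the following Python builds a custom Web API skill definition and prints it as JSON. The `#Microsoft.Skills.Custom.WebApiSkill` type and the `metadata_storage_path` and `metadata_storage_sas_token` blob metadata fields are part of Azure AI Search. The function route, the host name, and the input and output names (`blobUrl`, `sasToken`, `decrypted_file_data`) are assumptions made for illustration, so check them against the DecryptBlobFile README before relying on them.

```python
import json


def build_decrypt_skill(function_uri: str, function_code: str) -> dict:
    """Return a custom Web API skill definition that calls the decryption function.

    Everything except the @odata.type and the two blob metadata paths
    is an assumption made for this sketch.
    """
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "description": "Downloads and decrypts an encrypted blob",
        # Hypothetical route; use the URL shown for your deployed function.
        "uri": f"https://{function_uri}/api/decrypt-blob-file?code={function_code}",
        "context": "/document",
        "inputs": [
            # Blob URL and per-blob SAS token supplied by the blob indexer.
            {"name": "blobUrl", "source": "/document/metadata_storage_path"},
            {"name": "sasToken", "source": "/document/metadata_storage_sas_token"},
        ],
        "outputs": [
            # Assumed name for the decrypted file reference.
            {"name": "decrypted_file_data", "targetName": "decrypted_file_data"},
        ],
    }


if __name__ == "__main__":
    # Placeholder host and key, for illustration only.
    skill = build_decrypt_skill("psdbf-function-app-example.azurewebsites.net", "<host-key>")
    print(json.dumps(skill, indent=2))
```

A downstream Document Extraction skill would then take the decrypted file reference as its input, which is why the output is exposed under `/document`.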
 1. Click the **Deploy to Azure** button found on the [DecryptBlobFile landing page](https://github.com/Azure-Samples/azure-search-power-skills/blob/main/Utils/DecryptBlobFile#deployment), which will open the provided Resource Manager template within the Azure portal.

-1. Choose the same subscription where your Azure Key Vault instance exists (this tutorial will not work if you select a different subscription).
+1. Choose the same subscription where your Azure Key Vault instance exists (this tutorial won't work if you select a different subscription).

 1. Select an existing resource group or create a new one. A dedicated resource group makes cleanup easier later.
@@ -67,7 +65,7 @@ Operationally, the DecryptBlobFile skill takes the URL and SAS token for each bl
 You should have an Azure Function app that contains the decryption logic and an Azure Storage resource that will store application data. In the next several steps, you'll give the app permissions to access the key vault and collect information that you'll need for the REST calls.

-### Grant permissions in Azure Key Vault
+## Grant permissions in Azure Key Vault

 1. Navigate to your Azure Key Vault service in the portal. [Create an access policy](/azure/key-vault/general/assign-access-policy-portal) in the Azure Key Vault that grants key access to the custom skill.
@@ -85,7 +83,7 @@ You should have an Azure Function app that contains the decryption logic and an
 1. On **Review + create**, select **Create**.

-### Collect app information
+## Collect app information

 1. Navigate to the **psdbf-function-app** function in the portal, and make a note of the following properties you'll need for the REST calls:
@@ -97,7 +95,7 @@ You should have an Azure Function app that contains the decryption logic and an
 :::image type="content" source="media/indexing-encrypted-blob-files/function-host-key.png" alt-text="Screenshot of the App Keys page of the Azure Function app." border="true":::

-### Get an admin api-key and URL for Azure AI Search
+## Get an admin api-key and URL for Azure AI Search

 1. Sign in to the [Azure portal](https://portal.azure.com), and in your search service **Overview** page, get the name of your search service. You can confirm your service name by reviewing the endpoint URL. If your endpoint URL were `https://mydemo.search.windows.net`, your service name would be `mydemo`.
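
The relationship between the endpoint URL and the service name can be illustrated with a short Python sketch. The helper name `service_name_from_endpoint` is hypothetical, and the suffix check assumes the public-cloud `.search.windows.net` domain shown in the example above; other clouds may use a different suffix.

```python
from urllib.parse import urlparse


def service_name_from_endpoint(endpoint: str) -> str:
    """Return the first DNS label of a search endpoint, for example 'mydemo'."""
    host = urlparse(endpoint).hostname
    # Assumption: public-cloud endpoints end in .search.windows.net.
    if not host or not host.endswith(".search.windows.net"):
        raise ValueError(f"Not a search service endpoint: {endpoint!r}")
    return host.split(".")[0]


if __name__ == "__main__":
    print(service_name_from_endpoint("https://mydemo.search.windows.net"))  # prints: mydemo
```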
@@ -125,7 +123,7 @@ Create variables for endpoints and keys:
 |`skillset-name`| Leave as **encrypted-blobs-ss**. |
 |`indexer-name`| Leave as **encrypted-blobs-ixr**. |

-### Review and run each request
+## Review and run each request

 Use HTTP requests to create the objects of an enrichment pipeline:
@@ -141,11 +139,11 @@ Use HTTP requests to create the objects of an enrichment pipeline:
 Indexing and enrichment commence as soon as you submit the Create Indexer request. Depending on how many documents are in your storage account, indexing can take a while. To find out whether the indexer is still running, send a **Get Indexer Status** request and review the response to learn whether the indexer is running, or to view error and warning information.

-If you are using the Free tier, the following message is expected: `"Could not extract content or metadata from your document. Truncated extracted text to '32768' characters"`. This message appears because blob indexing on the Free tier has a [32K limit on character extraction](search-limits-quotas-capacity.md#indexer-limits). You won't see this message for this data set on higher tiers.
+If you're using the Free tier, the following message is expected: `"Could not extract content or metadata from your document. Truncated extracted text to '32768' characters"`. This message appears because blob indexing on the Free tier has a [32K limit on character extraction](search-limits-quotas-capacity.md#indexer-limits). You won't see this message for this data set on higher tiers.
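
The **Get Indexer Status** check described above could be scripted along the following lines. This is a minimal sketch, not the tutorial's own client: it assumes the public-cloud endpoint pattern `https://<service>.search.windows.net`, an API version of `2024-07-01` (substitute the version your other requests use), and hypothetical helper names. It reads only the `status` and `lastResult` fields of the response, so a truncation message such as the Free tier one above would show up in the warning count.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"  # assumption: match the version used by your other requests


def indexer_status_request(service_name, indexer_name, admin_key, api_version=API_VERSION):
    """Build (but do not send) a Get Indexer Status request."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_name}/status?api-version={api_version}"
    )
    return urllib.request.Request(url, headers={"api-key": admin_key}, method="GET")


def summarize_status(payload: dict) -> dict:
    """Reduce a status response to the fields this tutorial cares about."""
    last = payload.get("lastResult") or {}
    return {
        "indexer": payload.get("status"),    # overall indexer state
        "last_run": last.get("status"),      # state of the most recent execution
        "errors": len(last.get("errors") or []),
        "warnings": len(last.get("warnings") or []),
    }


def get_indexer_status(service_name, indexer_name, admin_key):
    """Send the request and return the summarized status."""
    request = indexer_status_request(service_name, indexer_name, admin_key)
    with urllib.request.urlopen(request) as response:
        return summarize_status(json.load(response))
```

For example, `get_indexer_status("mydemo", "encrypted-blobs-ixr", "<admin-key>")` would query the indexer created in this tutorial, where `mydemo` and the key are placeholders for your own values.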

 ## Search your content

-After indexer execution is finished, you can run some queries to verify that the data has been successfully decrypted and indexed. Navigate to your Azure AI Search service in the portal, and use the [search explorer](search-explorer.md) to run queries over the indexed data.
+After indexer execution is finished, you can run some queries to verify that the data has been successfully decrypted and indexed. Navigate to your Azure AI Search service in the portal, and use the [Search Explorer](search-explorer.md) to run queries over the indexed data.

 ## Clean up resources
@@ -155,6 +153,6 @@ You can find and manage resources in the portal, using the All resources or Reso
 ## Next steps

-Now that you have successfully indexed encrypted files, you can [iterate on this pipeline by adding more cognitive skills](cognitive-search-defining-skillset.md). This will allow you to enrich and gain additional insights to your data.
+Now that you have successfully indexed encrypted files, you can [iterate on this pipeline by adding more skills](cognitive-search-defining-skillset.md). This will allow you to enrich and gain additional insights to your data.

-If you are working with doubly encrypted data, you might want to investigate the index encryption features available in Azure AI Search. Although the indexer needs decrypted data for indexing purposes, once the index exists, it can be encrypted in a search index using a customer-managed key. This will ensure that your data is always encrypted when at rest. For more information, see [Configure customer-managed keys for data encryption in Azure AI Search](search-security-manage-encryption-keys.md).
+If you're working with doubly encrypted data, you might want to investigate the index encryption features available in Azure AI Search. Although the indexer needs decrypted data for indexing purposes, once the index exists, it can be encrypted in a search index using a customer-managed key. This will ensure that your data is always encrypted when at rest. For more information, see [Configure customer-managed keys for data encryption in Azure AI Search](search-security-manage-encryption-keys.md).