articles/search/search-howto-incremental-index.md
Lines changed: 27 additions & 17 deletions
@@ -8,7 +8,7 @@ ms.service: cognitive-search
8
8
ms.custom:
9
9
- ignite-2023
10
10
ms.topic: how-to
11
-
ms.date: 02/29/2024
11
+
ms.date: 03/04/2024
12
12
---
13
13
14
14
# Enable caching for incremental enrichment in Azure AI Search
@@ -20,22 +20,31 @@ This article explains how to add caching to an enrichment pipeline so that you c
20
20
21
21
Cached content is placed in Azure Storage using account information that you provide. The container, named `ms-az-search-indexercache-<alphanumeric-string>`, is created when you run the indexer. It should be considered an internal component managed by your search service and must not be modified.
22
22
23
-
If you're not familiar with setting up indexers, start with [indexer overview](search-indexer-overview.md) and then continue on to [skillsets](cognitive-search-working-with-skillsets.md) to learn about enrichment pipelines. For more background on key concepts, see [incremental enrichment](cognitive-search-incremental-indexing-conceptual.md).
24
-
25
23
## Prerequisites
26
24
27
-
Azure Storage is used to store cached enrichments. The storage account must be [general purpose v2](../storage/common/storage-account-overview.md#types-of-storage-accounts).
28
-
29
-
Preview APIs or beta Azure SDKs are required for enabling cache on an indexer. The portal does not currently provide an option for caching enrichment.
25
+
+ [Azure Storage](/azure/storage/common/storage-account-create) for storing cached enrichments. The storage account must be [general purpose v2](../storage/common/storage-account-overview.md#types-of-storage-accounts).
30
26
31
-
If you are planning to [index blobs](search-howto-indexing-azure-blob-storage.md), and you require the documents to be removed from both your cache and index when they are deleted from your data source, it is necessary to enable a [deletion policy](search-howto-index-changed-deleted-blobs.md) within the indexer. Without this policy, the deletion from the cache is not supported.
27
+
+ [For blob indexing only](search-howto-indexing-azure-blob-storage.md), if you need synchronized document removal from both the cache and index when blobs are deleted from your data source, enable a [deletion policy](search-howto-index-changed-deleted-blobs.md) in the indexer. Without this policy, document deletion from the cache isn't supported.
32
28
29
+
You should be familiar with setting up indexers. Start with [indexer overview](search-indexer-overview.md) and then continue on to [skillsets](cognitive-search-working-with-skillsets.md) to learn about enrichment pipelines. For more background on key concepts, see [incremental enrichment](cognitive-search-incremental-indexing-conceptual.md).
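
For the blob deletion policy named in the prerequisites, the following is a minimal sketch of what a soft-delete policy might look like in a request payload. It assumes the policy is declared as `dataDeletionDetectionPolicy` on the blob data source that the indexer reads from, using the soft-delete-column policy type. The data source name, container name, and the `IsDeleted` metadata property are hypothetical placeholders, not values from this article. The code only builds and prints the JSON body and doesn't call the service.

```python
import json


def blob_data_source_with_soft_delete(name, container, connection_string,
                                      marker_property="IsDeleted",
                                      marker_value="true"):
    """Build a blob data source definition that carries a soft-delete policy.

    Assumption: blobs are flagged for deletion through a metadata property
    (hypothetically named "IsDeleted") rather than being removed outright.
    """
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": marker_property,
            "softDeleteMarkerValue": marker_value,
        },
    }


# Hypothetical names; replace with your own data source and container.
data_source = blob_data_source_with_soft_delete(
    "my-blob-datasource",
    "my-container",
    "<YOUR-STORAGE-ACCOUNT-CONNECTION-STRING>",
)
print(json.dumps(data_source, indent=2))
```

Check the deletion policy article linked above for the policy type that matches how your storage account handles deleted blobs before relying on this shape.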
33
30
34
31
> [!CAUTION]
35
32
> If you're using the [SharePoint Online indexer (Preview)](search-howto-index-sharepoint-online.md), you should avoid incremental enrichment. Under certain circumstances, the cache becomes invalid, requiring an [indexer reset and run](search-howto-run-reset-indexers.md), should you choose to reload it.
36
33
37
34
## Enable on new indexers
38
35
36
+
You can use the Azure portal, preview APIs, or beta Azure SDKs to enable an enrichment cache on an indexer.
37
+
38
+
### [**Azure portal**](#tab/portal)
39
+
40
+
1. On the left, select **Indexers**, and then select **Add indexer**.
41
+
1. Provide an indexer name and an existing index, data source, and skillset.
42
+
1. Enable incremental caching and set the Azure Storage account.
43
+
44
+
:::image type="content" source="media/search-incremental-index/portal-option.png" alt-text="Screenshot of the portal option for enrichment cache.":::
45
+
46
+
### [**REST**](#tab/rest)
47
+
39
48
On new indexers, add the "cache" property in the indexer definition payload when calling [Create or Update Indexer (2021-04-30-Preview)](/rest/api/searchservice/preview-api/create-or-update-indexer). You can also use the previous preview API version, 2020-06-30-Preview.
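
As a rough illustration of the same call from a script, the sketch below assembles the Create Indexer request with a "cache" property, using only the Python standard library. It assumes the cache object takes `storageConnectionString` and `enableReprocessing`, and that the indexer references an existing data source, index, and skillset by name. All resource names and the helper `build_create_indexer_request` are hypothetical. The request is built but not sent.

```python
import json
import urllib.request

# Preview API version named in this article.
API_VERSION = "2021-04-30-Preview"


def build_create_indexer_request(service_name, admin_key, indexer_name,
                                 data_source, index, skillset,
                                 storage_connection_string):
    """Return a POST request that creates an indexer with caching enabled."""
    body = {
        "name": indexer_name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,
        # Assumed shape of the cache object; verify against the REST reference.
        "cache": {
            "storageConnectionString": storage_connection_string,
            "enableReprocessing": True,
        },
    }
    url = (f"https://{service_name}.search.windows.net/indexers"
           f"?api-version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


# Hypothetical resource names for illustration only.
request = build_create_indexer_request(
    "my-service", "<YOUR-ADMIN-KEY>", "my-indexer",
    "my-datasource", "my-index", "my-skillset",
    "<YOUR-STORAGE-ACCOUNT-CONNECTION-STRING>",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the payload before it reaches the service.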
40
49
41
50
```https
@@ -49,13 +58,14 @@ POST https://[service name].search.windows.net/indexers?api-version=2021-04-30-P
For existing indexers that already have a skillset, use the following steps to add caching. As a one-time operation, reset and rerun the indexer in full to load the cache.
@@ -122,7 +132,7 @@ PUT https://[YOUR-SEARCH-SERVICE].search.windows.net/indexers/[YOUR-INDEXER-NAME
122
132
}
123
133
```
124
134
125
-
If you now issue another GET request on the indexer, the response from the service will include an `ID` property in the cache object. The alphanumeric string is appended to the name of the container containing all the cached results and intermediate state of each document processed by this indexer. The ID will be used to uniquely name the cache in Blob storage.
135
+
If you now issue another GET request on the indexer, the response from the service includes an `ID` property in the cache object. The alphanumeric string is appended to the name of the container containing all the cached results and intermediate state of each document processed by this indexer. The ID is used to uniquely name the cache in Blob storage.
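
To make the naming rule concrete, here is a small sketch that derives the expected container name from an indexer definition returned by GET. It assumes the response has been parsed into a dictionary and that the cache object exposes the identifier under the key `ID`, as described above. The helper name `cache_container_name` and the sample ID are hypothetical.

```python
# Container name prefix given in this article.
CACHE_CONTAINER_PREFIX = "ms-az-search-indexercache-"


def cache_container_name(indexer_definition):
    """Return the expected cache container name, or None if no cache ID exists."""
    cache = indexer_definition.get("cache") or {}
    cache_id = cache.get("ID")
    if not cache_id:
        # The indexer has no cache yet, or hasn't run since caching was enabled.
        return None
    return CACHE_CONTAINER_PREFIX + cache_id


# Hypothetical parsed GET response, trimmed to the relevant property.
sample_indexer = {"name": "my-indexer", "cache": {"ID": "abc123"}}
print(cache_container_name(sample_indexer))
```

The result can be compared against the container list in the storage account to confirm which container belongs to which indexer.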
126
136
127
137
```http
128
138
"cache": {
@@ -134,7 +144,7 @@ If you now issue another GET request on the indexer, the response from the servi
134
144
135
145
### Step 5: Run the indexer
136
146
137
-
To run indexer, you can use the portal or the API. In the portal, from the indexers list, select the indexer and click **Run**. One advantage to using the portal is that you can monitor indexer status, note the duration of the job, and how many documents are processed. Portal pages are refreshed every few minutes.
147
+
To run the indexer, you can use the portal or the API. In the portal, from the indexers list, select the indexer and then select **Run**. One advantage of using the portal is that you can monitor indexer status and note both the duration of the job and how many documents are processed. Portal pages are refreshed every few minutes.
138
148
139
149
Alternatively, you can use REST to [run the indexer](/rest/api/searchservice/run-indexer):
140
150
@@ -150,13 +160,13 @@ api-key: [YOUR-ADMIN-KEY]
150
160
151
161
## Check for cached output
152
162
153
-
Find the cache in Azure Storage, under Blob container. The container name will be `ms-az-search-indexercache-<some-alphanumeric-string>`.
163
+
Find the cache in Azure Storage, under Blob container. The container name is `ms-az-search-indexercache-<some-alphanumeric-string>`.
154
164
155
-
A cache is created and used by an indexer. Its content is not human readable.
165
+
A cache is created and used by an indexer. Its content isn't human readable.
156
166
157
167
To verify whether the cache is operational, modify a skillset and run the indexer, then compare before-and-after metrics for execution time and document counts.
158
168
159
-
Skillsets that include image analysis and Optical Character Recognition (OCR) of scanned documents make good test cases. If you modify a downstream text skill or any skill that is not image-related, the indexer can retrieve all of the previously processed image and OCR content from cache, updating and processing only the text-related changes indicated by your edits. You can expect to see fewer documents in the indexer execution document count, shorter execution times, and fewer charges on your bill.
169
+
Skillsets that include image analysis and Optical Character Recognition (OCR) of scanned documents make good test cases. If you modify a downstream text skill or any skill that isn't image-related, the indexer can retrieve all of the previously processed image and OCR content from cache, updating and processing only the text-related changes indicated by your edits. You can expect to see fewer documents in the indexer execution document count, shorter execution times, and fewer charges on your bill.
160
170
161
171
The [file set](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/ai-enrichment-mixed-media) used in [cog-search-demo tutorials](cognitive-search-tutorial-blob.md) is a useful test case because it contains 14 files in various formats: JPG, PNG, HTML, DOCX, PPTX, and other types. Change `en` to `es` or another language in the text translation skill for proof-of-concept testing of incremental enrichment.
162
172
@@ -166,7 +176,7 @@ The following error occurs if you forget to specify a preview API version on the
166
176
167
177
`"The request is invalid. Details: indexer : A resource without a type name was found, but no expected type was specified. To allow entries without type information, the expected type must also be specified when the model is specified."`
168
178
169
-
A 400 Bad Request error will also occur if you are missing an indexer requirement. The error message will specify any missing dependencies.
179
+
A 400 Bad Request error will also occur if you're missing an indexer requirement. The error message specifies any missing dependencies.