articles/search/cognitive-search-incremental-indexing-conceptual.md
+13 -13 (13 additions & 13 deletions)
Original file line number
Diff line number
Diff line change
@@ -8,15 +8,15 @@ ms.service: azure-ai-search
8
8
ms.custom:
9
9
- ignite-2023
10
10
ms.topic: conceptual
11
-
ms.date: 01/17/2025
11
+
ms.date: 07/11/2025
12
12
---
13
13
14
14
# Incremental enrichment and caching in Azure AI Search
15
15
16
16
> [!IMPORTANT]
17
17
> This feature is in public preview under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [preview REST API](/rest/api/searchservice/search-service-api-versions#preview-versions) supports this feature.
18
18
19
-
*Incremental enrichment* refers to the use of cached enrichments during [skillset execution](cognitive-search-working-with-skillsets.md) so that only new and changed skills and documents incur Standard processing charges for API calls to Azure AI services. The cache contains the output from [document cracking](search-indexer-overview.md#document-cracking), plus the outputs of each skill for every document. Although caching is billable (it uses Azure Storage), the overall cost of enrichment is reduced because the costs of storage are less than image extraction and AI processing.
19
+
*Incremental enrichment* refers to the use of cached enrichments during [skillset execution](cognitive-search-working-with-skillsets.md) so that only new and changed skills and documents incur standard processing charges for API calls to Azure AI services. The cache contains the output from [document cracking](search-indexer-overview.md#document-cracking), plus the outputs of each skill for every document. Although caching is billable (it uses Azure Storage), the overall cost of enrichment is reduced because the costs of storage are less than image extraction and AI processing.
20
20
21
21
To ensure synchronization between your data source data and your index, it's important to understand your unique [data source](search-data-sources-gallery.md) change and deletion tracking prerequisites. This guide specifically addresses how to manage incremental modifications in terms of your skills processing and how to utilize cache for this purpose.
22
22
@@ -37,10 +37,10 @@ The cache is created when you specify the "cache" property and run the indexer.
37
37
38
38
The following example illustrates an indexer with caching enabled. See [Enable enrichment caching](search-howto-incremental-index.md) for full instructions.
39
39
40
-
To use the cache property, you can use 2020-06-30-preview or later when you [create or update an indexer](/rest/api/searchservice/indexers/create-or-update?view=rest-searchservice-2024-05-01-preview&preserve-view=true). We recommend the latest preview API.
40
+
To use the cache property, you can use 2020-06-30-preview or later when you [create or update an indexer](/rest/api/searchservice/indexers/create-or-update?view=rest-searchservice-2025-05-01-preview&preserve-view=true). We recommend the latest preview API.
41
41
42
42
```json
43
-
POST https://[YOUR-SEARCH-SERVICE-NAME].search.windows.net/indexers?api-version=2024-05-01-preview
43
+
POST https://[YOUR-SEARCH-SERVICE-NAME].search.windows.net/indexers?api-version=2025-05-01-preview
44
44
{
45
45
"name": "myIndexerName",
46
46
"targetIndexName": "myIndex",
@@ -90,7 +90,7 @@ If you know that a change to the skill is indeed superficial, you should overrid
90
90
Setting this parameter ensures that only updates to the skillset definition are committed and the change isn't evaluated for effects on the existing cache. Use a preview API version, 2020-06-30-Preview or later. We recommend the latest preview API.
91
91
92
92
```http
93
-
PUT https://[servicename].search.windows.net/skillsets/[skillset name]?api-version=2024-05-01-preview&disableCacheReprocessingChangeDetection
93
+
PUT https://[servicename].search.windows.net/skillsets/[skillset name]?api-version=2025-05-01-preview&disableCacheReprocessingChangeDetection
94
94
95
95
```
96
96
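As a rough illustration of how the query flag in the request above attaches to a skillset update, the following Python sketch only builds the URL and sends nothing. The helper name `skillset_update_url`, the service and skillset names, and the explicit `=true` value on the flag are assumptions made for illustration; the diff shows the parameter without a value.

```python
from urllib.parse import quote, urlencode


def skillset_update_url(service: str, skillset: str, api_version: str,
                        skip_cache_check: bool = False) -> str:
    """Build the PUT URL for a skillset update (hypothetical helper, not an SDK call)."""
    params = {"api-version": api_version}
    if skip_cache_check:
        # Assumption: the flag is sent with an explicit "true" value.
        params["disableCacheReprocessingChangeDetection"] = "true"
    # Percent-encode the skillset name so it is safe as a single path segment.
    return (f"https://{service}.search.windows.net/skillsets/"
            f"{quote(skillset, safe='')}?{urlencode(params)}")


url = skillset_update_url("my-service", "my-skillset",
                          "2025-05-01-preview", skip_cache_check=True)
print(url)
```

Keeping the flag behind an explicit argument makes the cache bypass a deliberate choice at each call site rather than a default.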
@@ -101,7 +101,7 @@ PUT https://[servicename].search.windows.net/skillsets/[skillset name]?api-versi
101
101
Most changes to a data source definition will invalidate the cache. However, for scenarios where you know that a change shouldn't invalidate the cache - such as changing a connection string or rotating the key on the storage account - append the `ignoreResetRequirement` parameter on the [data source update](/rest/api/searchservice/data-sources/create-or-update). Setting this parameter to true allows the commit to go through, without triggering a reset condition that would result in all objects being rebuilt and populated from scratch.
102
102
103
103
```http
104
-
PUT https://[search service].search.windows.net/datasources/[data source name]?api-version=2024-05-01-preview&ignoreResetRequirement
104
+
PUT https://[search service].search.windows.net/datasources/[data source name]?api-version=2025-05-01-preview&ignoreResetRequirement
105
105
106
106
```
107
107
@@ -111,13 +111,13 @@ PUT https://[search service].search.windows.net/datasources/[data source name]?a
111
111
112
112
The purpose of the cache is to avoid unnecessary processing, but suppose you make a change to a skill that the indexer doesn't detect (for example, changing something in external code, such as a custom skill).
113
113
114
-
In this case, you can use the [Reset Skills](/rest/api/searchservice/skillsets/reset-skills?view=rest-searchservice-2024-05-01-preview&preserve-view=true) to force reprocessing of a particular skill, including any downstream skills that have a dependency on that skill's output. This API accepts a POST request with a list of skills that should be invalidated and marked for reprocessing. After Reset Skills, follow with a [Run Indexer](/rest/api/searchservice/indexers/run) request to invoke the pipeline processing.
114
+
In this case, you can use the [Reset Skills](/rest/api/searchservice/skillsets/reset-skills?view=rest-searchservice-2025-05-01-preview&preserve-view=true) to force reprocessing of a particular skill, including any downstream skills that have a dependency on that skill's output. This API accepts a POST request with a list of skills that should be invalidated and marked for reprocessing. After Reset Skills, follow with a [Run Indexer](/rest/api/searchservice/indexers/run) request to invoke the pipeline processing.
115
115
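The two-step sequence described above (Reset Skills, then Run Indexer) might be sketched in Python as follows. The code only constructs the requests and does not send them. The function name, the placeholder service, skillset, and indexer names, and the `skillNames` body property are assumptions for illustration; check the Reset Skills reference for the exact payload.

```python
import json
from urllib.request import Request

API_VERSION = "2025-05-01-preview"


def reset_skills_then_run(service, skillset, indexer, skill_names, api_key):
    """Return the two requests in the order they should be sent (hypothetical helper)."""
    base = f"https://{service}.search.windows.net"
    headers = {"Content-Type": "application/json", "api-key": api_key}
    # Step 1: mark the listed skills (and their downstream dependents) for reprocessing.
    # Assumption: the body carries the skill list under "skillNames".
    reset = Request(
        f"{base}/skillsets/{skillset}/resetskills?api-version={API_VERSION}",
        data=json.dumps({"skillNames": list(skill_names)}).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    # Step 2: run the indexer so the pipeline actually reprocesses the reset skills.
    run = Request(
        f"{base}/indexers/{indexer}/run?api-version={API_VERSION}",
        headers=headers,
        method="POST",
    )
    return [reset, run]


requests_in_order = reset_skills_then_run(
    "my-service", "my-skillset", "my-indexer", ["#1"], "<admin-key>")
for req in requests_in_order:
    print(req.get_method(), req.full_url)
```

Each returned object can be passed to `urllib.request.urlopen` once a real service name and admin key are supplied.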
116
116
## Re-cache specific documents
117
117
118
118
[Resetting an indexer](/rest/api/searchservice/indexers/reset) will result in all documents in the search corpus being reprocessed.
119
119
120
-
In scenarios where only a few documents need to be reprocessed, use [Reset Documents (preview)](/rest/api/searchservice/indexers/reset-docs?view=rest-searchservice-2024-05-01-preview&preserve-view=true) to force reprocessing of specific documents. When a document is reset, the indexer invalidates the cache for that document, which is then reprocessed by reading it from the data source. For more information, see [Run or reset indexers, skills, and documents](search-howto-run-reset-indexers.md).
120
+
In scenarios where only a few documents need to be reprocessed, use [Reset Documents (preview)](/rest/api/searchservice/indexers/reset-docs?view=rest-searchservice-2025-05-01-preview&preserve-view=true) to force reprocessing of specific documents. When a document is reset, the indexer invalidates the cache for that document, which is then reprocessed by reading it from the data source. For more information, see [Run or reset indexers, skills, and documents](search-howto-run-reset-indexers.md).
121
121
122
122
To reset specific documents, the request provides a list of document keys as read from the search index. If the key is mapped to a field in the external data source, the value that you provide should be the one used in the search index.
123
123
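A minimal sketch of assembling a reset-documents payload from search index keys, assuming the `resetdocs` endpoint and the `documentKeys` property that appear in this article's request example. The helper name and the placeholder service and indexer names are hypothetical, and removing duplicate keys is a choice made by this sketch, not a documented requirement.

```python
import json

API_VERSION = "2025-05-01-preview"


def build_reset_docs_request(service, indexer, document_keys):
    """Return (url, body) for a Reset Documents call; nothing is sent."""
    # Keys must be the values stored in the search index.
    # dict.fromkeys keeps the first occurrence of each key and preserves order.
    keys = list(dict.fromkeys(str(k) for k in document_keys))
    url = (f"https://{service}.search.windows.net/indexers/{indexer}"
           f"/resetdocs?api-version={API_VERSION}")
    return url, json.dumps({"documentKeys": keys})


url, body = build_reset_docs_request(
    "my-service", "my-indexer", ["key1", "key2", "key1"])
print(url)
print(body)
```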
@@ -132,7 +132,7 @@ Depending on how you call the API, the request will either append, overwrite, or
132
132
The following example illustrates a reset document request:
133
133
134
134
```http
135
-
POST https://[search service name].search.windows.net/indexers/[indexer name]/resetdocs?api-version=2024-05-01-preview
135
+
POST https://[search service name].search.windows.net/indexers/[indexer name]/resetdocs?api-version=2025-05-01-preview
136
136
{
137
137
"documentKeys" : [
138
138
"key1",
@@ -183,13 +183,13 @@ REST API version `2020-06-30-Preview` or later provides incremental enrichment t
183
183
184
184
Skillsets and data sources can use the generally available version. In addition to the reference documentation, see [Configure caching for incremental enrichment](search-howto-incremental-index.md) for details about order of operations.
185
185
186
-
+[Create or Update Indexer (api-version=2024-05-01-preview)](/rest/api/searchservice/indexers/create-or-update?view=rest-searchservice-2024-05-01-preview&preserve-view=true)
186
+
+[Create or Update Indexer (api-version=2025-05-01-preview)](/rest/api/searchservice/indexers/create-or-update?view=rest-searchservice-2025-05-01-preview&preserve-view=true)
+[Create or Update Skillset (api-version=2024-07-01)](/rest/api/searchservice/skillsets/create-or-update) (New URI parameter on the request)
190
+
+[Create or Update Skillset (api-version=2025-05-01-preview)](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2025-05-01-preview&preserve-view=true) (New URI parameter on the request)
191
191
192
-
+[Create or Update Data Source (api-version=2024-07-01)](/rest/api/searchservice/data-sources/create-or-update), when called with a preview API version, provides a new parameter named "ignoreResetRequirement", which should be set to true when your update action shouldn't invalidate the cache. Use "ignoreResetRequirement" sparingly as it could lead to unintended inconsistency in your data that won't be detected easily.
192
+
+[Create or Update Data Source (api-version=2025-05-01-preview)](/rest/api/searchservice/data-sources/create-or-update?view=rest-searchservice-2025-05-01-preview&preserve-view=true), when called with a preview API version, provides a new parameter named "ignoreResetRequirement", which should be set to true when your update action shouldn't invalidate the cache. Use "ignoreResetRequirement" sparingly as it could lead to unintended inconsistency in your data that won't be detected easily.
@@ -55,15 +55,15 @@ Once content is extracted, the [skillset](cognitive-search-working-with-skillset
55
55
56
56
### Download files
57
57
58
-
Download a zip file of the sample data repository and extract the contents. [Learn how](https://docs.github.com/get-started/start-your-journey/downloading-files-from-github).
58
+
+Download a zip file of the sample data repository and extract the contents. [Learn how](https://docs.github.com/get-started/start-your-journey/downloading-files-from-github).
59
59
60
-
+[Sample data files (mixed media)](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/ai-enrichment-mixed-media)
60
+
+Download the [sample data files (mixed media)](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/ai-enrichment-mixed-media)
61
61
62
62
### Upload sample data to Azure Storage
63
63
64
-
1. In Azure Storage, create a new container and name it *cog-search-demo*.
64
+
1. In Azure Storage, [create a new container](/azure/storage/blobs/storage-quickstart-blobs-portal) and name it *mixed-content-types*.
65
65
66
-
1.[Upload the sample data files](/azure/storage/blobs/storage-quickstart-blobs-portal).
66
+
1. Upload the sample data files.
67
67
68
68
1. Get a storage connection string so that you can formulate a connection in Azure AI Search.
69
69
@@ -87,11 +87,9 @@ For this tutorial, connections to Azure AI Search require an endpoint and an API
87
87
88
88
1. Under **Settings** > **Keys**, copy an admin key. Admin keys are used to add, modify, and delete objects. There are two interchangeable admin keys. Copy either one.
89
89
90
-
:::image type="content" source="media/search-get-started-rest/get-url-key.png" alt-text="Screenshot of the URL and API keys in the Azure portal.":::
91
-
92
90
## Set up your environment
93
91
94
-
Begin by opening Visual Studio and creating a new Console App project that can run on .NET Core.
92
+
Begin by opening Visual Studio and creating a new Console App project.
95
93
96
94
### Install Azure.Search.Documents
97
95
@@ -173,7 +171,7 @@ public static void Main(string[] args)
173
171
> The clients connect to your search service. In order to avoid opening too many connections, you should try to share a single instance in your application if possible. The methods are thread-safe to enable such sharing.
174
172
>
175
173
176
-
### Add function to exit the program during failure
174
+
### Add a function to exit the program during failure
177
175
178
176
This tutorial is meant to help you understand each step of the indexing pipeline. If there's a critical issue that prevents the program from creating the data source, skillset, index, or indexer, the program will output the error message and exit so that the issue can be understood and addressed.
articles/search/cognitive-search-working-with-skillsets.md
+6 -6 (6 additions & 6 deletions)
@@ -6,7 +6,7 @@ author: HeidiSteen
6
6
ms.author: heidist
7
7
ms.service: azure-ai-search
8
8
ms.topic: conceptual
9
-
ms.date: 01/15/2025
9
+
ms.date: 07/11/2025
10
10
---
11
11
12
12
# Skillset concepts in Azure AI Search
@@ -19,9 +19,9 @@ The following diagram illustrates the basic data flow of skillset execution.
19
19
20
20
:::image type="content" source="media/cognitive-search-working-with-skillsets/skillset-process-diagram-1.png" alt-text="Diagram showing skillset data flows, with focus on inputs, outputs, and mappings." border="true":::
21
21
22
-
From the onset of skillset processing to its conclusion, skills read from and write to an [*enriched document*](#enrichment-tree) that exists in memory. Initially, an enriched document is just the raw content extracted from a data source (articulated as the `"/document"` root node). With each skill execution, the enriched document gains structure and substance as each skill writes its output as nodes in the graph.
22
+
From the onset of skillset processing to its conclusion, skills read from and write to an [*enriched document tree*](#enrichment-tree) that exists in memory. Initially, an enriched document is just the raw content extracted from a data source (articulated as the `"/document"` root node). With each skill execution, the enriched document gains structure and substance as each skill writes its output as nodes in the graph.
23
23
24
-
After skillset execution is done, the output of an enriched document finds its way into an index through user-defined *output field mappings*. Any raw content that you want transferred intact, from source to an index, is defined through *field mappings*. In contrast, *output field mappings* transfer in-memory content (nodes) to the index.
24
+
After skillset execution is done, the output of an enriched document is routed to an index through user-defined *output field mappings*. Any raw content that you want transferred intact, from source to an index, is defined through *field mappings*. In contrast, *output field mappings* transfer in-memory content (nodes) to the index.
25
25
26
26
To configure applied AI, specify settings in a skillset and indexer.
27
27
@@ -58,11 +58,11 @@ A context determines:
58
58
59
59
### Skill dependencies
60
60
61
-
Skills can execute independently and in parallel, or sequentially if you feed the output of one skill into another skill. The following example demonstrates two [built-in skills](cognitive-search-predefined-skills.md) that execute in sequence:
61
+
Skills can execute independently and in parallel, or sequentially in a dependent relationship if you feed the output of one skill into another skill. The following example demonstrates two [built-in skills](cognitive-search-predefined-skills.md) that execute in sequence:
62
62
63
63
+ Skill #1 is a [Text Split skill](cognitive-search-skill-textsplit.md) that accepts the contents of the "reviews_text" source field as input, and splits that content into "pages" of 5,000 characters as output. Splitting large text into smaller chunks can produce better outcomes for skills like sentiment detection.
64
64
65
-
+ Skill #2 is a [Sentiment Detection skill](cognitive-search-skill-sentiment.md) accepts "pages" as input, and produces a new field called "Sentiment" as output that contains the results of sentiment analysis.
65
+
+ Skill #2 is a [Sentiment Detection skill](cognitive-search-skill-sentiment.md) that depends on the split skill output. It accepts "pages" as input, and produces a new field called "Sentiment" as output that contains the results of sentiment analysis.
66
66
67
67
Notice how the output of the first skill ("pages") is used in sentiment analysis, where "/document/reviews_text/pages/*" is both the context and input. For more information about path formulation, see [How to reference enrichments](cognitive-search-concept-annotations-syntax.md).
68
68
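The dependency between the two skills can be illustrated with a small Python sketch that derives an execution order from input and output paths. This is not how the service schedules skills. The flattened skill dictionaries, the skill names, the `Sentiment` output path, and the prefix-matching rule are simplifying assumptions; real skill definitions use a context plus named inputs and outputs. The sketch requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical, flattened view of the two skills: full enrichment-tree paths only.
skills = [
    {"name": "sentiment",
     "inputs": ["/document/reviews_text/pages/*"],
     "outputs": ["/document/reviews_text/pages/*/Sentiment"]},
    {"name": "split",
     "inputs": ["/document/reviews_text"],
     "outputs": ["/document/reviews_text/pages"]},
]


def reads_from(input_path, output_path):
    """True if the input is the output node itself or a node beneath it."""
    return input_path == output_path or input_path.startswith(output_path + "/")


def execution_order(skills):
    """Order skills so that each one runs after the skills whose output it reads."""
    graph = {}
    for skill in skills:
        graph[skill["name"]] = {
            other["name"]
            for other in skills
            if other is not skill
            and any(reads_from(i, o)
                    for i in skill["inputs"] for o in other["outputs"])
        }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(skills)
print(order)
```

Skills with no path overlap end up with no edge between them in the graph, which corresponds to the independent, parallel case described above.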
@@ -124,7 +124,7 @@ Notice how the output of the first skill ("pages") is used in sentiment analysis
124
124
125
125
## Enrichment tree
126
126
127
-
An enriched document is a temporary, tree-like data structure created during skillset execution that collects all of the changes introduced through skills. Collectively, enrichments are represented as a hierarchy of addressable nodes. Nodes also include any unenriched fields that are passed in verbatim from the external data source.
127
+
An enriched document is a temporary, tree-like data structure created during skillset execution that collects all of the changes introduced through skills. Collectively, enrichments are represented as a hierarchy of addressable nodes. Nodes also include any unenriched fields that are passed in verbatim from the external data source. The best approach for examining the structure and content of an enrichment tree is through a [debug session](cognitive-search-debug-session.md) in the Azure portal.
128
128
129
129
An enriched document exists for the duration of skillset execution, but can be [cached](cognitive-search-incremental-indexing-conceptual.md) or sent to a [knowledge store](knowledge-store-concept-intro.md).