Commit 80c65bc

Merge pull request #100611 from HeidiSteen/heidist-gh-kstore
Azure Cognitive Search: Caching tech review feedback
2 parents 3da3d53 + 88b2da7 commit 80c65bc

4 files changed

+40
-22
lines changed

articles/search/TOC.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -333,7 +333,7 @@
333333
items:
334334
- name: API version 2019-05-06-Preview
335335
href: search-api-preview.md
336-
- name: ResetSkills API
336+
- name: Reset Skills API
337337
href: preview-api-resetskills.md
338338
- name: Resource Manager template
339339
href: /azure/templates/microsoft.search/searchservices

articles/search/cognitive-search-incremental-indexing-conceptual.md

Lines changed: 17 additions & 10 deletions
@@ -8,7 +8,7 @@ author: Vkurpad
88
ms.author: vikurpad
99
ms.service: cognitive-search
1010
ms.topic: conceptual
11-
ms.date: 01/06/2020
11+
ms.date: 01/09/2020
1212
---
1313

1414
# Introduction to incremental enrichment and caching in Azure Cognitive Search
@@ -23,7 +23,7 @@ Incremental enrichment adds caching and statefulness to an enrichment pipeline,
2323

2424
Incremental enrichment adds a cache to the enrichment pipeline. The indexer caches the results from document cracking plus the outputs of each skill for every document. When a skillset is updated, only the changed, or downstream, skills are rerun. The updated results are written to the cache and the document is updated in the search index or the knowledge store.
2525

26-
Physically, the cache is a blob container in your Azure Storage account. All indexes within a search service may share the same storage account for the indexer cache. Each indexer is assigned a unique and immutable cache identifier to the container it is using.
26+
Physically, the cache is stored in a blob container in your Azure Storage account. All indexes within a search service may share the same storage account for the indexer cache. Each indexer is assigned a unique and immutable cache identifier to the container it is using.
2727

2828
## Cache configuration
2929

@@ -51,15 +51,22 @@ Setting this property on an existing indexer will require you to reset and rerun
5151

5252
The lifecycle of the cache is managed by the indexer. If the `cache` property on the indexer is set to null or the connection string is changed, the existing cache is deleted on the next indexer run. The cache lifecycle is also tied to the indexer lifecycle. If an indexer is deleted, the associated cache is also deleted.
5353

54+
While incremental enrichment is designed to detect and respond to changes with no intervention on your part, there are parameters you can use to override default behaviors:
55+
56+
+ Suspend caching
57+
+ Bypass skillset checks
58+
+ Bypass data source checks
59+
+ Force skillset evaluation
60+
5461
### Suspend caching
5562

5663
You can temporarily suspend incremental enrichment by setting the `enableReprocessing` property in the cache to `false`, and later resume incremental enrichment and drive eventual consistency by setting it to `true`. This control is particularly useful when you want to prioritize indexing new documents over ensuring consistency across your corpus of documents.
5764

58-
### Override or bypass skillset evaluation
65+
### Bypass skillset evaluation
5966

60-
Modifying a skillset and reprocessing of that skillset typically go hand in hand. However, some changes to a skillset should not result in reprocessing. For example, updating a URL endpoint of a custom skill, or redeploying a custom skill with a new access key. These are both peripheral modifications that have no genuine impact on the substance of the skillset itself.
67+
Modifying a skillset and reprocessing of that skillset typically go hand in hand. However, some changes to a skillset should not result in reprocessing (for example, deploying a custom skill to a new location or with a new access key). Most likely, these are peripheral modifications that have no genuine impact on the substance of the skillset itself.
6168

62-
For changes that fall into this category, you should override skillset evaluation by setting the `disableCacheReprocessingChangeDetection` parameter to `true`:
69+
If you know that a change to the skillset is indeed superficial, you should override skillset evaluation by setting the `disableCacheReprocessingChangeDetection` parameter to `true`:
6370

6471
1. Call Update Skillset and modify the skillset definition.
6572
1. Append the `disableCacheReprocessingChangeDetection=true` parameter on the request.
@@ -73,19 +80,19 @@ The following example shows an Update Skillset request with the parameter:
7380
PUT https://customerdemos.search.windows.net/skillsets/callcenter-text-skillset?api-version=2019-05-06-Preview&disableCacheReprocessingChangeDetection=true
7481
```
7582

76-
### Override or bypass data source validation checks
83+
### Bypass data source validation checks
7784

78-
For scenarios when you know that a change to the data source should not invalidate the cache, such as rotating the key on the storage account, the `ignoreResetRequirement` parameter should be set to `true` on the update operation of the specific resource to ensure that the operation is not rejected.
85+
Most changes to a data source definition will invalidate the cache. However, for scenarios where you know that a change should not invalidate the cache (such as changing a connection string or rotating the key on the storage account), append the `ignoreResetRequirement` parameter on the data source update. Setting this parameter to `true` allows the commit to go through without triggering a reset condition that would result in all objects being rebuilt and populated from scratch.
7986

8087
```http
8188
PUT https://customerdemos.search.windows.net/datasources/callcenter-ds?api-version=2019-05-06-Preview&ignoreResetRequirement=true
8289
```
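
As a rough illustration of the two override parameters described above, the following Python sketch builds the same request URLs shown in the HTTP examples. The helper name `update_url` and its signature are hypothetical, chosen only for this example; the service name, resource names, and query parameters are taken from the requests above. It only constructs the URL and does not send anything.

```python
from urllib.parse import urlencode

API_VERSION = "2019-05-06-Preview"


def update_url(service: str, collection: str, name: str, **overrides: bool) -> str:
    """Build the URL for a PUT (update) request, appending any enabled override flags.

    Hypothetical helper: `collection` is "skillsets" or "datasources", and each
    keyword argument is a query parameter that is added as "true" when enabled.
    """
    params = {"api-version": API_VERSION}
    for flag, enabled in overrides.items():
        if enabled:
            params[flag] = "true"
    return f"https://{service}.search.windows.net/{collection}/{name}?{urlencode(params)}"


# Skillset update that skips change detection for a superficial edit.
skillset_url = update_url(
    "customerdemos", "skillsets", "callcenter-text-skillset",
    disableCacheReprocessingChangeDetection=True,
)

# Data source update that should not trigger a reset condition.
datasource_url = update_url(
    "customerdemos", "datasources", "callcenter-ds",
    ignoreResetRequirement=True,
)

print(skillset_url)
print(datasource_url)
```

Leaving a flag out (or passing `False`) produces a plain update URL, so the default change detection behavior applies.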
8390

8491
### Force skillset evaluation
8592

86-
An opposite scenario is one where you want to deploy a new version of a custom skill; nothing within the enrichment pipeline changes, but you need a specific skill invalidated and all affected documents reprocessed to reflect the benefits of an updated model.
93+
The purpose of the cache is to avoid unnecessary processing, but suppose you have made a change to a skill or skillset that the indexer doesn't detect (for example, changes to external components like a custom skillset).
8794

88-
In such instances, call the invalidate skills operation on the skillset. The [Reset Skills](preview-api-resetskills.md) accepts a POST request with a list of skill outputs in the cache that should be invalidated.
95+
In this case, you can use the [Reset Skills](preview-api-resetskills.md) API to force reprocessing of a particular skill, including any downstream skills that have a dependency on that skill's output. This API accepts a POST request with a list of skills that should be invalidated and rerun. After Reset Skills, run the indexer to execute the operation.
8996

9097
## Change detection
9198

@@ -126,7 +133,7 @@ Incremental processing evaluates your skillset definition and determines which s
126133

127134
## API reference content for incremental enrichment
128135

129-
REST `api-version=2019-05-06-Preview` provides the APIs for incremental enrichment, with additions to indexers, skillsets, and data sources. [Official reference documentation](https://docs.microsoft.com/rest/api/searchservice/) is for generally available APIs and does not cover preview features. The following section provides the reference contentfor impacted APIs.
136+
REST `api-version=2019-05-06-Preview` provides the APIs for incremental enrichment, with additions to indexers, skillsets, and data sources. [Official reference documentation](https://docs.microsoft.com/rest/api/searchservice/) is for generally available APIs and does not cover preview features. The following section provides the reference content for impacted APIs.
130137

131138
Usage information and examples can be found in [Configure caching for incremental enrichment](search-howto-incremental-index.md).
132139

articles/search/preview-api-resetskills.md

Lines changed: 4 additions & 4 deletions
@@ -8,15 +8,15 @@ author: HeidiSteen
88
ms.author: heidist
99
ms.service: cognitive-search
1010
ms.topic: conceptual
11-
ms.date: 01/04/2020
11+
ms.date: 01/09/2020
1212
---
1313
# Reset Skills (api-version=2019-05-06-Preview)
1414

1515
> [!IMPORTANT]
1616
> Incremental enrichment is currently in public preview. This preview version is provided without a service level agreement, and it's not recommended for production workloads. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
1717
> The [REST API version 2019-05-06-Preview](search-api-preview.md) provides this feature. There is no portal or .NET SDK support at this time.
1818
19-
The **Reset Skills** request forces reprocessing of a specific skill or the entire skillset. This API is used to override the change detection logic used for incremental enrichment, to force an update of enriched documents in the cache.
19+
The **Reset Skills** request specifies which skills to process on the next indexer run. For indexers that have caching enabled, you can explicitly request processing for skill updates that the indexer cannot detect. For example, if you make external changes, such as revisions to a custom skill, you can use this API to rerun the skill. Outputs, such as a knowledge store or search index, are refreshed using reusable data from the cache and new content per the updated skill.
2020

2121
You can reset an existing [skillset](https://docs.microsoft.com/rest/api/searchservice/create-skillset) using an HTTP PUT, specifying the name of the skillset to update on the request URI. You must use **api-version=2019-05-06-Preview** on the request.
2222

@@ -33,7 +33,7 @@ POST https://[service name].search.windows.net/indexers/resetskills?api-version=
3333
api-key: [admin key]
3434
```
3535

36-
### Request headers
36+
## Request headers
3737

3838
The following table describes the required and optional request headers.
3939

@@ -44,7 +44,7 @@ api-key: [admin key]
4444

4545
You also need the service name to construct the request URL. You can get both the service name and `api-key` from your service Overview page in the Azure portal. See [Create an Azure Cognitive Search service](https://docs.microsoft.com/azure/search/search-create-service-portal) for page navigation help.
4646

47-
### Request body syntax
47+
## Request body syntax
4848

4949
The body of the request is an array of skill names.
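
A minimal sketch of assembling such a request in Python, without sending it. Several details here are assumptions rather than confirmed API facts: the helper name `build_reset_skills_request` is hypothetical, the endpoint URL is passed in by the caller because the full path is not shown here, and the array of skill names is assumed to be wrapped in a `skillNames` property. Check the Reset Skills reference for the exact body shape before relying on it.

```python
import json
from urllib.request import Request


def build_reset_skills_request(endpoint_url: str, skill_names: list, admin_key: str) -> Request:
    """Prepare (but do not send) a POST request that lists the skills to reset.

    Assumption: the body wraps the array of skill names in a "skillNames" property.
    """
    body = json.dumps({"skillNames": list(skill_names)}).encode("utf-8")
    return Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )


# Placeholder endpoint and skill names, for illustration only.
request = build_reset_skills_request(
    "https://example.invalid/resetskills-endpoint",
    ["my-custom-skill", "my-downstream-skill"],
    "PLACEHOLDER-ADMIN-KEY",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

After a request like this succeeds, the indexer still has to be run for the reset skills to be reprocessed.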
5050

articles/search/search-howto-incremental-index.md

Lines changed: 18 additions & 7 deletions
@@ -118,7 +118,7 @@ api-key: [YOUR-ADMIN-KEY]
118118
After the indexer runs, you can find the cache in Azure blob storage. The container name is in the following format: `ms-az-search-indexercache-<YOUR-CACHE-ID>`
119119

120120
> [!NOTE]
121-
> A reset and re-run of the indexer results in a full rebuild so that content can be cached. All cognitive enrichments will be re-run on all documents.
121+
> A reset and rerun of the indexer results in a full rebuild so that content can be cached. All cognitive enrichments will be rerun on all documents.
122122
>
123123
124124
### Step 6: Modify a skillset and confirm incremental enrichment
@@ -149,15 +149,26 @@ To set up incremental enrichment for a new indexer, all you have to do is includ
149149
}
150150
```
151151

152-
## Cache management
152+
## Checking for cached output
153153

154-
When configured, incremental enrichment tracks changes across your indexing pipeline and drives documents to eventual consistency across your index and projections.
154+
The cache is created, used, and managed by the indexer, and its contents are not represented in a human-readable format. The best way to determine whether the cache is used is to run the indexer and compare before-and-after metrics for execution time and document counts.
155155

156-
While incremental enrichment comes with its own change detection logic for knowing when and what to reprocess, there are situations where you will want to override default behaviors:
156+
For example, assume a skillset that starts with image analysis and Optical Character Recognition (OCR) of scanned documents, followed by downstream analysis of the resulting text. If you modify a downstream text skill, the indexer can retrieve all of the previously processed image and OCR content from cache, updating and processing just text-related changes indicated by your edits. You can expect to see fewer documents in the document count (for example 8/8 as opposed to 14/14 in the original run), shorter execution times, and fewer charges on your bill.
157157

158-
+ Force a re-evaluation of skillset. You might make internal changes to a custom skillset that the indexer cannot detect. In this case, you can use the [Reset Skills](preview-api-resetskills.md) API to force reprocessing of a particular skill, including any downstream skills that have a dependency on that skill's output.
158+
## Working with the cache
159159

160-
+ Bypass re-evaluation when making peripheral changes. In some cases, the change detection logic picks up on changes that shouldn't result in reprocessing. For example, changing the endpoint of a custom skill is an update to the `uri` property in a skillset definition, but it shouldn't result in reprocessing of content if the skill itself didn't change. Similarly, updates to a data source connection string, or changing a key, are inline edits that change a data source definition, but shouldn't invalidate the cache. You can set parameters on a request to suppress the change detection logic, while allowing a commit on a definition to go through. For more information about these scenarios, see [Cache management](cognitive-search-incremental-indexing-conceptual.md#cache-management).
160+
Once the cache is operational, indexers check the cache whenever [Run Indexer](https://docs.microsoft.com/rest/api/searchservice/run-indexer) is called, to see which parts of the existing output can be used.
161+
162+
The following table summarizes how various APIs relate to the cache:
163+
164+
| API | Cache impact |
165+
|---------------|------------------|
166+
| [Create Indexer](https://docs.microsoft.com/rest/api/searchservice/create-indexer) | Creates and runs an indexer on first use, including creating a cache if your indexer definition specifies it. |
167+
| [Run Indexer](https://docs.microsoft.com/rest/api/searchservice/run-indexer) | Executes an enrichment pipeline on demand. This API reads from the cache if it exists, or creates a cache if you added caching to an updated indexer definition. When you run an indexer that has caching enabled, the indexer omits steps if cached output can be used. |
168+
| [Reset Indexer](https://docs.microsoft.com/rest/api/searchservice/reset-indexer)| Clears the indexer of any incremental indexing information. The next indexer run (either on-demand or scheduled) is a full reprocessing from scratch, including rerunning all skills and rebuilding the cache. It is functionally equivalent to deleting the indexer and recreating it. |
169+
| [Reset Skills](preview-api-resetskills.md) | Specifies which skills to rerun on the next indexer run, even if you haven't modified any skills. The cache is updated accordingly. Outputs, such as a knowledge store or search index, are refreshed using reusable data from the cache plus new content per the updated skill. |
170+
171+
For more information about controlling what happens to the cache, see [Cache management](cognitive-search-incremental-indexing-conceptual.md#cache-management).
161172

162173
## Next steps
163174

@@ -167,4 +178,4 @@ Additionally, once you enable the cache, you will want to know about the paramet
167178

168179
+ [Skillset concepts and composition](cognitive-search-working-with-skillsets.md)
169180
+ [How to create a skillset](cognitive-search-defining-skillset.md)
170-
+ [Introduction to incremental enrichment and caching](cognitive-search-incremental-indexing-conceptual.md)
181+
+ [Introduction to incremental enrichment and caching](cognitive-search-incremental-indexing-conceptual.md)
