Merge pull request #100611 from HeidiSteen/heidist-gh-kstore

v-albemi · web-flow · commit 80c65bc4dd0b · 2020-01-09T11:44:02.000-08:00
Azure Cognitive Search:  Caching tech review feedback
diff --git a/articles/search/TOC.yml b/articles/search/TOC.yml
@@ -333,7 +333,7 @@
     items:
     - name: API version 2019-05-06-Preview
       href: search-api-preview.md
-    - name: ResetSkills API
+    - name: Reset Skills API
       href: preview-api-resetskills.md
   - name: Resource Manager template
     href: /azure/templates/microsoft.search/searchservices
diff --git a/articles/search/cognitive-search-incremental-indexing-conceptual.md b/articles/search/cognitive-search-incremental-indexing-conceptual.md
@@ -8,7 +8,7 @@ author: Vkurpad
 ms.author: vikurpad
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 01/06/2020
+ms.date: 01/09/2020
 ---
 
 # Introduction to incremental enrichment and caching in Azure Cognitive Search
@@ -23,7 +23,7 @@ Incremental enrichment adds caching and statefulness to an enrichment pipeline,
 
 Incremental enrichment adds a cache to the enrichment pipeline. The indexer caches the results from document cracking plus the outputs of each skill for every document. When a skillset is updated, only the changed, or downstream, skills are rerun. The updated results are written to the cache and the document is updated in the search index or the knowledge store.
 
-Physically, the cache is a blob container in your Azure Storage account. All indexes within a search service may share the same storage account for the indexer cache. Each indexer is assigned a unique and immutable cache identifier to the container it is using.
+Physically, the cache is stored in a blob container in your Azure Storage account. All indexers within a search service may share the same storage account for the indexer cache. Each indexer is assigned a unique and immutable cache identifier for the container it uses.
 
 ## Cache configuration
 
@@ -51,15 +51,22 @@ Setting this property on an existing indexer will require you to reset and rerun
 
 The lifecycle of the cache is managed by the indexer. If the `cache` property on the indexer is set to null or the connection string is changed, the existing cache is deleted on the next indexer run. The cache lifecycle is also tied to the indexer lifecycle. If an indexer is deleted, the associated cache is also deleted.
 
+While incremental enrichment is designed to detect and respond to changes with no intervention on your part, there are parameters you can use to override default behaviors:
+
++ Suspend caching
++ Bypass skillset checks
++ Bypass data source checks
++ Force skillset evaluation
+
 ### Suspend caching
 
 You can temporarily suspend incremental enrichment by setting the `enableReprocessing` property in the cache to `false`, and later resume incremental enrichment and drive eventual consistency by setting it to `true`. This control is particularly useful when you want to prioritize indexing new documents over ensuring consistency across your corpus of documents.
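
As a minimal sketch of the toggle described above, the helper below flips `enableReprocessing` on a copy of an indexer definition before you send it back with an indexer update. The indexer name is hypothetical, and `storageConnectionString` is an assumed property name that this change does not show; only `cache` and `enableReprocessing` come from the text.

```python
import copy


def set_cache_reprocessing(indexer_definition: dict, enabled: bool) -> dict:
    """Return a copy of the indexer definition with cache.enableReprocessing set."""
    updated = copy.deepcopy(indexer_definition)
    cache = updated.get("cache")
    if cache is None:
        # Without a cache there is no incremental enrichment to suspend or resume.
        raise ValueError("indexer definition has no cache configured")
    cache["enableReprocessing"] = enabled
    return updated


indexer = {
    "name": "example-indexer",  # hypothetical indexer name
    "cache": {
        # Assumed property name; substitute whatever your indexer definition uses.
        "storageConnectionString": "<YOUR-STORAGE-CONNECTION-STRING>",
        "enableReprocessing": True,
    },
}

suspended = set_cache_reprocessing(indexer, False)  # prioritize new documents
resumed = set_cache_reprocessing(suspended, True)   # drive eventual consistency later
```

Working on a copy keeps the original definition available if the update has to be rolled back.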
 
-### Override or bypass skillset evaluation
+### Bypass skillset evaluation
 
-Modifying a skillset and reprocessing of that skillset typically go hand in hand. However, some changes to a skillset should not result in reprocessing. For example, updating a URL endpoint of a custom skill, or redeploying a custom skill with a new access key. These are both peripheral modifications that have no genuine impact on the substance of the skillset itself.
+Modifying a skillset and reprocessing of that skillset typically go hand in hand. However, some changes to a skillset should not result in reprocessing (for example, deploying a custom skill to a new location or with a new access key). Most likely, these are peripheral modifications that have no genuine impact on the substance of the skillset itself. 
 
-For changes that fall into this category, you should override skillset evaluation by setting the `disableCacheReprocessingChangeDetection` parameter to `true`:
+If you know that a change to the skillset is indeed superficial, you should override skillset evaluation by setting the `disableCacheReprocessingChangeDetection` parameter to `true`:
 
 1. Call Update Skillset and modify the skillset definition.
 1. Append the `disableCacheReprocessingChangeDetection=true` parameter on the request.
@@ -73,19 +80,19 @@ The following example shows an Update Skillset request with the parameter:
 PUT https://customerdemos.search.windows.net/skillsets/callcenter-text-skillset?api-version=2019-05-06-Preview&disableCacheReprocessingChangeDetection=true
 ```
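
The request above could be assembled in Python roughly as follows. This is a sketch, not a client library: `build_update_skillset_request` is a hypothetical helper, the `Content-Type` header and the placeholder skillset body are assumptions, and nothing is sent until you pass the request to `urllib.request.urlopen`.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_VERSION = "2019-05-06-Preview"


def build_update_skillset_request(service, skillset_name, skillset_definition,
                                  admin_key, skip_change_detection=False):
    """Build (but do not send) a PUT request for Update Skillset."""
    params = {"api-version": API_VERSION}
    if skip_change_detection:
        # Only for changes you know are superficial, as described above.
        params["disableCacheReprocessingChangeDetection"] = "true"
    url = (f"https://{service}.search.windows.net/skillsets/"
           f"{skillset_name}?{urlencode(params)}")
    return urllib.request.Request(
        url,
        data=json.dumps(skillset_definition).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )


request = build_update_skillset_request(
    "customerdemos",
    "callcenter-text-skillset",
    {"name": "callcenter-text-skillset", "skills": []},  # placeholder body
    "<YOUR-ADMIN-KEY>",
    skip_change_detection=True,
)
```

Leaving `skip_change_detection` at its default keeps normal change detection in place, so the override has to be requested explicitly.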
 
-### Override or bypass data source validation checks
+### Bypass data source validation checks
 
-For scenarios when you know that a change to the data source should not invalidate the cache, such as rotating the key on the storage account, the `ignoreResetRequirement` parameter should be set to `true` on the update operation of the specific resource to ensure that the operation is not rejected.
+Most changes to a data source definition will invalidate the cache. However, for scenarios where you know that a change should not invalidate the cache (such as changing a connection string or rotating the key on the storage account), append the `ignoreResetRequirement` parameter on the data source update. Setting this parameter to `true` allows the commit to go through without triggering a reset condition that would result in all objects being rebuilt and populated from scratch.
 
 ```http
 PUT https://customerdemos.search.windows.net/datasources/callcenter-ds?api-version=2019-05-06-Preview&ignoreResetRequirement=true
 ```
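
One way to avoid sending `ignoreResetRequirement=true` by accident is to check which parts of the definition actually changed. The sketch below assumes the credentials live under a top-level `credentials` key with a `connectionString` field; those field names, the helper names, and the sample definitions are illustrative assumptions, not part of this change.

```python
def only_credentials_changed(old_definition: dict, new_definition: dict) -> bool:
    """True when the two definitions differ in the 'credentials' key and nowhere else."""
    keys = set(old_definition) | set(new_definition)
    changed = {k for k in keys if old_definition.get(k) != new_definition.get(k)}
    return changed == {"credentials"}


def data_source_update_query(old_definition: dict, new_definition: dict,
                             api_version: str = "2019-05-06-Preview") -> str:
    """Build the query string, adding ignoreResetRequirement only for credential-only edits."""
    query = f"api-version={api_version}"
    if only_credentials_changed(old_definition, new_definition):
        query += "&ignoreResetRequirement=true"
    return query


old_ds = {
    "name": "callcenter-ds",
    "type": "azureblob",                            # assumed field
    "credentials": {"connectionString": "<OLD>"},   # assumed field
    "container": {"name": "callcenter"},            # assumed field
}
rotated_ds = dict(old_ds, credentials={"connectionString": "<NEW>"})

query = data_source_update_query(old_ds, rotated_ds)
```

The check cannot tell whether a new connection string points at the same data, so it narrows the cases where the override is used but does not replace your own judgment.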
 
 ### Force skillset evaluation
 
-An opposite scenario is one where you want to deploy a new version of a custom skill; nothing within the enrichment pipeline changes, but you need a specific skill invalidated and all affected documents reprocessed to reflect the benefits of an updated model. 
+The purpose of the cache is to avoid unnecessary processing, but suppose you make a change to a skill that the indexer doesn't detect (for example, a change to an external component, such as the code behind a custom skill).
 
-In such instances, call the invalidate skills operation on the skillset. The [Reset Skills](preview-api-resetskills.md) accepts a POST request with a list of skill outputs in the cache that should be invalidated.
+In this case, you can use the [Reset Skills](preview-api-resetskills.md) API to force reprocessing of a particular skill, including any downstream skills that have a dependency on that skill's output. This API accepts a POST request with a list of skills that should be invalidated and rerun. After calling Reset Skills, run the indexer to carry out the reprocessing.
 
 ## Change detection
 
@@ -126,7 +133,7 @@ Incremental processing evaluates your skillset definition and determines which s
 
 ## API reference content for incremental enrichment
 
-REST `api-version=2019-05-06-Preview` provides the APIs for incremental enrichment, with additions to indexers, skillsets, and data sources. [Official reference documentation](https://docs.microsoft.com/rest/api/searchservice/) is for generally available APIs and does not cover preview features. The following section provides the reference contentfor impacted APIs.
+REST `api-version=2019-05-06-Preview` provides the APIs for incremental enrichment, with additions to indexers, skillsets, and data sources. [Official reference documentation](https://docs.microsoft.com/rest/api/searchservice/) is for generally available APIs and does not cover preview features. The following section provides the reference content for impacted APIs.
 
 Usage information and examples can be found in [Configure caching for incremental enrichment](search-howto-incremental-index.md).
 
diff --git a/articles/search/preview-api-resetskills.md b/articles/search/preview-api-resetskills.md
@@ -8,15 +8,15 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 01/04/2020
+ms.date: 01/09/2020
 ---
 # Reset Skills (api-version=2019-05-06-Preview)
 
 > [!IMPORTANT] 
 > Incremental enrichment is currently in public preview. This preview version is provided without a service level agreement, and it's not recommended for production workloads. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). 
 > The [REST API version 2019-05-06-Preview](search-api-preview.md) provides this feature. There is no portal or .NET SDK support at this time.
 
-The **Reset Skills** request forces reprocessing of a specific skill or the entire skillset. This API is used to override the change detection logic used for incremental enrichment, to force an update of enriched documents in the cache. 
+The **Reset Skills** request specifies which skills to process on the next indexer run. For indexers that have caching enabled, you can explicitly request processing for skill updates that the indexer cannot detect. For example, if you make external changes, such as revisions to a custom skill, you can use this API to rerun the skill. Outputs, such as a knowledge store or search index, are refreshed using reusable data from the cache and new content per the updated skill.
 
 You can reset an existing [skillset](https://docs.microsoft.com/rest/api/searchservice/create-skillset) using an HTTP POST, specifying the name of the skillset to update on the request URI. You must use **api-version=2019-05-06-Preview** on the request.
 
@@ -33,7 +33,7 @@ POST https://[service name].search.windows.net/indexers/resetskills?api-version=
 api-key: [admin key]  
 ```  
 
-### Request headers  
+## Request headers  
 
  The following table describes the required and optional request headers.  
 
@@ -44,7 +44,7 @@ api-key: [admin key]
 
 You also need the service name to construct the request URL. You can get both the service name and `api-key` from your service Overview page in the Azure portal. See [Create an Azure Cognitive Search service](https://docs.microsoft.com/azure/search/search-create-service-portal) for page navigation help.  
 
-### Request body syntax  
+## Request body syntax  
 
 The body of the request is an array of skill names.
 
diff --git a/articles/search/search-howto-incremental-index.md b/articles/search/search-howto-incremental-index.md
@@ -118,7 +118,7 @@ api-key: [YOUR-ADMIN-KEY]
 After the indexer runs, you can find the cache in Azure blob storage. The container name is in the following format: `ms-az-search-indexercache-<YOUR-CACHE-ID>`
 
 > [!NOTE]
-> A reset and re-run of the indexer results in a full rebuild so that content can be cached. All cognitive enrichments will be re-run on all documents.
+> A reset and rerun of the indexer results in a full rebuild so that content can be cached. All cognitive enrichments will be rerun on all documents.
 >
 
 ### Step 6: Modify a skillset and confirm incremental enrichment
@@ -149,15 +149,26 @@ To set up incremental enrichment for a new indexer, all you have to do is includ
 }
 ```
 
-## Cache management
+## Checking for cached output
 
-When configured, incremental enrichment tracks changes across your indexing pipeline and drives documents to eventual consistency across your index and projections. 
+The cache is created, used, and managed by the indexer, and its contents are not represented in a human-readable format. The best way to determine whether the cache is used is to run the indexer and compare before-and-after metrics for execution time and document counts.
 
-While incremental enrichment comes with its own change detection logic for knowing when and what to reprocess, there are situations where you will want to override default behaviors:
+For example, assume a skillset that starts with image analysis and Optical Character Recognition (OCR) of scanned documents, followed by downstream analysis of the resulting text. If you modify a downstream text skill, the indexer can retrieve all of the previously processed image and OCR content from the cache, and process only the text-related changes indicated by your edits. You can expect to see fewer documents in the document count (for example, 8/8 as opposed to 14/14 in the original run), shorter execution times, and fewer charges on your bill.
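
The before-and-after comparison can be sketched as a small helper. `RunSummary` and its fields are hypothetical stand-ins for whatever metrics you record from each indexer run; the document counts mirror the 14 and 8 in the example above, while the durations are made-up illustrative values.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    documents_processed: int
    duration_seconds: float


def compare_runs(baseline: RunSummary, rerun: RunSummary) -> dict:
    """Report how a later run differs from the original full run."""
    return {
        "documents_delta": rerun.documents_processed - baseline.documents_processed,
        "duration_delta_seconds": rerun.duration_seconds - baseline.duration_seconds,
        # Heuristic only: fewer documents and less time suggest cached output was reused.
        "looks_incremental": (
            rerun.documents_processed < baseline.documents_processed
            and rerun.duration_seconds < baseline.duration_seconds
        ),
    }


first_run = RunSummary(documents_processed=14, duration_seconds=120.0)
second_run = RunSummary(documents_processed=8, duration_seconds=45.0)
result = compare_runs(first_run, second_run)
```

Treat `looks_incremental` as a hint rather than proof, since run times also vary with service load.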
 
-+ Force a re-evaluation of skillset. You might make internal changes to a custom skillset that the indexer cannot detect. In this case, you can use the [Reset Skills](preview-api-resetskills.md) API to force reprocessing of a particular skill, including any downstream skills that have a dependency on that skill's output.
+## Working with the cache
 
-+ Bypass re-evaluation when making peripheral changes. In some cases, the change detection logic picks up on changes that shouldn't result in reprocessing. For example, changing the endpoint of a custom skill is an update to the `uri` property in a skillset definition, but it shouldn't result in reprocessing of content if the skill itself didn't change. Similarly, updates to a data source connection string, or changing a key, are inline edits that change a data source definition, but shouldn't invalidate the cache. You can set parameters on a request to suppress the change detection logic, while allowing a commit on a definition to go through. For more information about these scenarios, see [Cache management](cognitive-search-incremental-indexing-conceptual.md#cache-management).
+Once the cache is operational, indexers check the cache whenever [Run Indexer](https://docs.microsoft.com/rest/api/searchservice/run-indexer) is called, to see which parts of the existing output can be used. 
+
+The following table summarizes how various APIs relate to the cache:
+
+| API           | Cache impact     |
+|---------------|------------------|
+| [Create Indexer](https://docs.microsoft.com/rest/api/searchservice/create-indexer) | Creates and runs an indexer on first use, including creating a cache if your indexer definition specifies it. |
+| [Run Indexer](https://docs.microsoft.com/rest/api/searchservice/run-indexer) | Executes an enrichment pipeline on demand. This API reads from the cache if it exists, or creates a cache if you added caching to an updated indexer definition. When you run an indexer that has caching enabled, the indexer omits steps if cached output can be used. |
+| [Reset Indexer](https://docs.microsoft.com/rest/api/searchservice/reset-indexer)| Clears the indexer of any incremental indexing information. The next indexer run (either on demand or scheduled) is a full reprocessing from scratch, including rerunning all skills and rebuilding the cache. It is functionally equivalent to deleting the indexer and recreating it. |
+| [Reset Skills](preview-api-resetskills.md) | Specifies which skills to rerun on the next indexer run, even if you haven't modified any skills. The cache is updated accordingly. Outputs, such as a knowledge store or search index, are refreshed using reusable data from the cache plus new content per the updated skill. |
+
+For more information about controlling what happens to the cache, see [Cache management](cognitive-search-incremental-indexing-conceptual.md#cache-management).
 
 ## Next steps
 
@@ -167,4 +178,4 @@ Additionally, once you enable the cache, you will want to know about the paramet
 
 + [Skillset concepts and composition](cognitive-search-working-with-skillsets.md)
 + [How to create a skillset](cognitive-search-defining-skillset.md)
-+ [Introduction to incremental enrichment and caching](cognitive-search-incremental-indexing-conceptual.md)
++ [Introduction to incremental enrichment and caching](cognitive-search-incremental-indexing-conceptual.md)