Commit ae9379a

added incremental howto
1 parent 2af6901 commit ae9379a

1 file changed

+100
-0
lines changed

@@ -0,0 +1,100 @@
---
title: Incrementally index your documents with change tracking across the entire indexing pipeline - Azure Search
description: Learn how to index Azure Blob Storage and extract text from documents with Azure Search.

ms.date: 10/07/2019
author: vkurpad
manager: eladz
ms.author: vikurpad

services: search
ms.service: search
ms.devlang: rest-api
ms.topic: conceptual
ms.custom: seonov2019
---

# Incrementally Indexing Documents with Azure Search

This article shows how to use Azure Search to incrementally index documents from any of the supported data sources.

If you are not familiar with setting up indexers in Azure Search, start with the [indexer overview](search-indexer-overview.md). To learn more about the incremental indexing feature, start with [incremental indexing](cognitive-search-incremental-indexing-conceptual.md).

The article first explains the basics of setting up and configuring incremental indexing, then explores behaviors and scenarios you are likely to encounter.

## Setting up incremental indexing

Incremental indexing can be configured using:

* Azure Search [REST API](https://docs.microsoft.com/rest/api/searchservice/Indexer-operations)

> [!NOTE]
> This feature is not yet available in the portal and must be used programmatically.
>

The following steps show how to add incremental indexing to an existing indexer.

### Step 1: Get the indexer

Using an API client, construct a GET request to retrieve the current configuration of the indexer to which you want to add incremental indexing.

    GET https://[service name].search.windows.net/indexers/[your indexer name]?api-version=2019-05-06-preview
    Content-Type: application/json
    api-key: [admin key]
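
As a rough illustration, the GET request above could be assembled with Python's standard library. This is only a sketch: the function name `build_get_indexer_request` and the placeholder values `my-service`, `my-indexer`, and `<admin key>` are assumptions made for the example, and the request is built but not sent.

```python
import urllib.request

API_VERSION = "2019-05-06-preview"  # preview version used throughout this article


def build_get_indexer_request(service_name, indexer_name, admin_key):
    """Build (but do not send) the GET request for an indexer definition."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_name}?api-version={API_VERSION}"
    )
    headers = {"Content-Type": "application/json", "api-key": admin_key}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values (hypothetical); replace with your own service, indexer, and key.
request = build_get_indexer_request("my-service", "my-indexer", "<admin key>")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; the response body is the JSON indexer definition that the next step edits.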
40+
41+
### Step 2: Update the indexer definition
42+
43+
Edit the response from the GET request to add the `cache` property to the indexer. The cache object requires only a single property, `storageConnectionString` which is the connection string to the storage account.
44+
45+
```json
46+
"cache": {
47+
"storageConnectionString": "[your storage connection string]"
48+
}
49+
```
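
If the GET response has been parsed into a Python dictionary, the edit might look like the sketch below. The helper name `add_cache` and the trimmed-down `indexer` dictionary are illustrative assumptions; a real indexer definition contains more fields than shown here.

```python
import copy


def add_cache(indexer_definition, storage_connection_string):
    """Return a copy of the indexer definition with a `cache` property added."""
    updated = copy.deepcopy(indexer_definition)
    updated["cache"] = {"storageConnectionString": storage_connection_string}
    return updated


# A trimmed-down stand-in for the GET response (hypothetical values).
indexer = {"name": "my-indexer", "dataSourceName": "my-datasource"}
updated = add_cache(indexer, "<your storage connection string>")
print(updated["cache"])
```

Working on a copy leaves the original response untouched, which makes it easy to compare the two definitions before sending the update.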

### Step 3: Reset the indexer

> [!NOTE]
> Resetting the indexer causes all documents in your data source to be processed again. All cognitive enrichments are re-run on all documents.
>

A reset of the indexer is required when setting up incremental indexing for an existing indexer, to ensure that all documents are in a consistent state. Reset the indexer through the portal or by using the REST API.

To reset the indexer with the REST API:

    POST https://[service name].search.windows.net/indexers/[your indexer name]/reset?api-version=2019-05-06-preview
    Content-Type: application/json
    api-key: [admin key]
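
The reset call can be sketched the same way. The function name `build_reset_request` and the placeholder values are assumptions for illustration; the request is constructed here but not sent.

```python
import urllib.request


def build_reset_request(service_name, indexer_name, admin_key,
                        api_version="2019-05-06-preview"):
    """Build (but do not send) the POST request that resets an indexer."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_name}/reset?api-version={api_version}"
    )
    headers = {"Content-Type": "application/json", "api-key": admin_key}
    # The reset request shown above has no body, so pass zero bytes.
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


# Placeholder values (hypothetical).
reset_request = build_reset_request("my-service", "my-indexer", "<admin key>")
print(reset_request.get_method(), reset_request.full_url)
```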

### Step 4: Update the indexer

Update the indexer definition with a PUT request. The body of the request should contain the updated indexer definition.

    PUT https://[service name].search.windows.net/indexers/[your indexer name]?api-version=2019-05-06-preview
    Content-Type: application/json
    api-key: [admin key]

    {
        "name" : "your indexer name",
        ...
        "cache": {
            "storageConnectionString": "[your storage connection string]",
            "enableReprocessing": true
        }
    }

If you now issue another GET request on the indexer, the response from the service will include a `cacheId` property in the cache object. This is the name of the container that holds all the cached results and intermediate state of each document processed by this indexer.
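
A minimal sketch of the update request follows, assuming the edited definition is available as a Python dictionary. The function name `build_update_request` and the sample values are hypothetical, and the definition shown omits the fields a real indexer would carry.

```python
import json
import urllib.request


def build_update_request(service_name, admin_key, indexer_definition,
                         api_version="2019-05-06-preview"):
    """Build (but do not send) the PUT request that updates an indexer."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_definition['name']}?api-version={api_version}"
    )
    body = json.dumps(indexer_definition).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": admin_key}
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Hypothetical, abbreviated definition; a real one includes the other indexer fields.
definition = {
    "name": "my-indexer",
    "cache": {
        "storageConnectionString": "<your storage connection string>",
        "enableReprocessing": True,
    },
}
update_request = build_update_request("my-service", "<admin key>", definition)
print(update_request.get_method(), update_request.full_url)
```

The PUT targets the indexer resource itself, not its `/reset` endpoint, and the indexer name in the URL is taken from the definition so the two cannot drift apart.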

### Set up incremental indexing for a new indexer

To set up incremental indexing for a new indexer, include the `cache` property in the indexer definition payload. Ensure you are using the `2019-05-06-preview` version of the API.

## Overriding incremental indexing

When configured, incremental indexing tracks changes across your indexing pipeline and drives documents to eventual consistency across your index and projections. In some cases, you will need to override this behavior to ensure that the indexer does not perform additional work as a result of an update to the indexing pipeline. For example, updating the data source connection string normally requires an indexer reset and reindexing of all documents, because the data source has changed. But if you are only updating the connection string with a new key, you do not want the change to result in any updates to existing documents.

Conversely, you may want the indexer to invalidate the cache and enrich documents even if no changes to the indexing pipeline are made. For instance, if you redeploy a custom skill with a new model, you may want the skill rerun on all your documents.

### Override reset requirement

When you change the indexing pipeline, any change that invalidates the cache requires an indexer reset. If you are making a change to the indexing pipeline and do not want change tracking to invalidate the cache, set the `ignoreResetRequirement` query string parameter to `true` for operations on the indexer or data source.
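
As a sketch of where the parameter goes, the helper below builds an indexer URL with or without `ignoreResetRequirement`. The function name `indexer_url` and the placeholder service and indexer names are assumptions for the example.

```python
from urllib.parse import urlencode


def indexer_url(service_name, indexer_name, ignore_reset_requirement=False,
                api_version="2019-05-06-preview"):
    """Build the indexer URL, optionally adding ignoreResetRequirement=true."""
    params = {"api-version": api_version}
    if ignore_reset_requirement:
        params["ignoreResetRequirement"] = "true"
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_name}?{urlencode(params)}"
    )


# Placeholder names (hypothetical).
url = indexer_url("my-service", "my-indexer", ignore_reset_requirement=True)
print(url)
```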

### Override change detection

When you make updates to the skillset that would result in documents being flagged as inconsistent, for example updating a custom skill URL when the skill is redeployed, set the `disableCacheReprocessingChangeDetection` query string parameter to `true` on skillset updates.
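
One way to add the parameter to an existing skillset URL is sketched below. The helper `with_query_flag` and the sample skillset URL are assumptions made for illustration; only the parameter name comes from this article.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_flag(url, name, value="true"):
    """Return `url` with the query parameter `name` set to `value`."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query[name] = value
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical skillset URL.
skillset_url = (
    "https://my-service.search.windows.net/skillsets/my-skillset"
    "?api-version=2019-05-06-preview"
)
flagged_url = with_query_flag(skillset_url, "disableCacheReprocessingChangeDetection")
print(flagged_url)
```

The resulting URL would be the target of the PUT request that updates the skillset.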

### Force change detection

In instances when you want the indexing pipeline to recognize a change to an external entity, such as a newly deployed version of a custom skill, update the skillset and "touch" the specific skill by editing the skill definition, specifically the URL, to force change detection and invalidate the cache for that skill.
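
The sketch below shows one possible way to "touch" a skill in a skillset definition held as a Python dictionary. It assumes the custom skill stores its endpoint in a `uri` property; the helper name `replace_skill_uri`, the example endpoints, and the idea of changing a `v` query parameter are all hypothetical choices for this example, not a documented convention.

```python
import copy


def replace_skill_uri(skillset, old_uri, new_uri):
    """Return a copy of the skillset with matching skill URIs replaced."""
    updated = copy.deepcopy(skillset)
    for skill in updated.get("skills", []):
        if skill.get("uri") == old_uri:
            skill["uri"] = new_uri
    return updated


# Hypothetical, abbreviated skillset; real skills also define inputs and outputs.
skillset = {
    "name": "my-skillset",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
            "uri": "https://example.com/api/enrich?v=1",
        }
    ],
}
touched = replace_skill_uri(
    skillset,
    "https://example.com/api/enrich?v=1",
    "https://example.com/api/enrich?v=2",
)
print(touched["skills"][0]["uri"])
```

Sending the modified skillset back to the service with a PUT request is what would trigger change detection for that skill.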
