| 1 | +--- |
| 2 | +title: Incrementally index your documents with change tracking across the entire indexing pipeline - Azure Search |
| 3 | +description: Learn how to index Azure Blob Storage and extract text from documents with Azure Search. |
| 4 | + |
| 5 | +ms.date: 10/07/2019 |
| 6 | +author: vkurpad |
| 7 | +manager: eladz |
| 8 | +ms.author: vikurpad |
| 9 | + |
| 10 | +services: search |
| 11 | +ms.service: search |
| 12 | +ms.devlang: rest-api |
| 13 | +ms.topic: conceptual |
| 14 | +ms.custom: seonov2019 |
| 15 | +--- |
| 16 | + |
| 17 | +# Incrementally Indexing Documents with Azure Search |
| 18 | +This article shows how to use Azure Search to incrementally index documents from any of the supported data sources. |
| 19 | +If you are not familiar with setting up indexers in Azure Search, start with the [indexer overview](search-indexer-overview.md). If you need to learn more about the incremental indexing feature, start with [incremental indexing](cognitive-search-incremental-indexing-conceptual.md).
| 20 | +First, it explains the basics of setting up and configuring incremental indexing. Then, it offers a deeper exploration of behaviors and scenarios you are likely to encounter. |
| 21 | + |
| 22 | +## Setting up incremental indexing |
| 23 | +Incremental indexing can be configured using: |
| 24 | + |
| 25 | +* Azure Search [REST API](https://docs.microsoft.com/rest/api/searchservice/Indexer-operations) |
| 26 | + |
| 27 | +> [!NOTE] |
| 28 | +> This feature is not yet available in the portal and must be used programmatically.
| 29 | +> |
| 30 | +
| 31 | +The following steps demonstrate how to add incremental indexing to an existing indexer.
| 32 | + |
| 33 | +### Step 1: Get the indexer |
| 34 | + |
| 35 | +Using an API client, construct a GET request to retrieve the current configuration of the indexer to which you want to add incremental indexing.
| 36 | + |
| 37 | + GET https://[service name].search.windows.net/indexers/[your indexer name]?api-version=2019-05-06-preview |
| 38 | + Content-Type: application/json |
| 39 | + api-key: [admin key] |
| 40 | + |
| 41 | +### Step 2: Update the indexer definition |
| 42 | + |
| 43 | +Edit the response from the GET request to add the `cache` property to the indexer. The cache object requires only a single property, `storageConnectionString`, which is the connection string to the storage account that will hold the cache.
| 44 | + |
| 45 | +```json |
| 46 | + "cache": { |
| 47 | + "storageConnectionString": "[your storage connection string]" |
| 48 | + } |
| 49 | +``` |
| 50 | + |
| 51 | +### Step 3: Reset the indexer |
| 52 | + |
| 53 | +> [!NOTE] |
| 54 | +> Resetting the indexer causes all documents in your data source to be processed again. All cognitive enrichments are re-run on every document.
| 55 | +> |
| 56 | +
| 57 | +A reset of the indexer is required when setting up incremental indexing for an existing indexer, to ensure that all documents are in a consistent state. Reset the indexer via the portal or by using the REST API:
| 58 | +
| 59 | + |
| 60 | + POST https://[service name].search.windows.net/indexers/[your indexer name]/reset?api-version=2019-05-06-preview |
| 61 | + Content-Type: application/json |
| 62 | + api-key: [admin key] |
| 63 | + |
| 64 | +### Step 4: Update the indexer |
| 65 | + |
| 66 | +Update the indexer definition with a PUT request. The body of the request should contain the updated indexer definition.
| 67 | + |
| 68 | +    PUT https://[service name].search.windows.net/indexers/[your indexer name]?api-version=2019-05-06-preview
| 69 | + Content-Type: application/json |
| 70 | + api-key: [admin key] |
| 71 | + { |
| 72 | + "name" : "your indexer name", |
| 73 | + ... |
| 74 | + "cache": { |
| 75 | + "storageConnectionString": "[your storage connection string]", |
| 76 | + "enableReprocessing": true |
| 77 | + } |
| 78 | + } |
| 79 | + |
| 80 | +If you now issue another GET request on the indexer, the response from the service will include a `cacheId` property in the cache object. This is the name of the container that holds all the cached results and the intermediate state of each document processed by this indexer.
| 81 | + |
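The four steps above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not a definitive client: the helper names (`indexer_url`, `build_request`, `add_cache`) and the placeholder service name, indexer name, and keys are hypothetical, and the requests are only constructed, not sent.

```python
import json
import urllib.request

API_VERSION = "2019-05-06-preview"  # preview version named in this article


def indexer_url(service, indexer, action=""):
    """Build the indexer endpoint; `action` is "" or "/reset"."""
    return (f"https://{service}.search.windows.net/indexers/"
            f"{indexer}{action}?api-version={API_VERSION}")


def build_request(method, url, admin_key, body=None):
    """Create (but do not send) a request with the headers shown in the steps."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json", "api-key": admin_key})


def add_cache(definition, storage_connection_string):
    """Return a copy of the indexer definition with the cache property set."""
    updated = dict(definition)
    updated["cache"] = {
        "storageConnectionString": storage_connection_string,
        "enableReprocessing": True,
    }
    return updated


# Placeholder values (assumptions for illustration only).
service, indexer, key = "my-service", "my-indexer", "ADMIN-KEY"

# Step 1: GET the current indexer definition.
get_req = build_request("GET", indexer_url(service, indexer), key)
# Step 2: add the cache property (a stub stands in for the GET response here).
definition = add_cache({"name": indexer}, "STORAGE-CONNECTION-STRING")
# Step 3: reset the indexer.
reset_req = build_request("POST", indexer_url(service, indexer, "/reset"), key)
# Step 4: PUT the updated definition back to the indexer endpoint.
put_req = build_request("PUT", indexer_url(service, indexer), key, definition)
```

Once real values are filled in, each request could be sent with `urllib.request.urlopen(req)`. Note that the PUT targets the indexer endpoint itself, while only the reset call uses the `/reset` path.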
| 82 | +### Set up incremental indexing for a new indexer |
| 83 | + |
| 84 | +To set up incremental indexing for a new indexer, include the `cache` property in the indexer definition payload. Ensure you are using the `2019-05-06-preview` version of the API.
| 85 | + |
| 86 | +## Overriding incremental indexing |
| 87 | + |
| 88 | +When configured, incremental indexing tracks changes across your indexing pipeline and drives documents to eventual consistency across your index and projections. In some cases you will need to override this behavior to ensure that the indexer does not perform additional work as a result of an update to the indexing pipeline. For example, updating the datasource connection string will require an indexer reset and reindexing of all documents as the datasource has changed. But if you were only updating the connection string with a new key, you do not want the change to result in any updates to existing documents. Conversely, you may want the indexer to invalidate the cache and enrich documents even if no changes to the indexing pipeline are made, for instance, if you redeploy a custom skill with a new model and want the skill rerun on all your documents. |
| 89 | + |
| 90 | +### Override reset requirement |
| 91 | + |
| 92 | +When you change the indexing pipeline, any change that invalidates the cache requires an indexer reset. If you are making a change to the indexing pipeline and do not want change tracking to invalidate the cache, set the `ignoreResetRequirement` query string parameter to `true` for operations on the indexer or datasource.
| 93 | + |
| 94 | +### Override change detection |
| 95 | + |
| 96 | +When you update the skillset in a way that would flag documents as inconsistent, for example by updating a custom skill URL when the skill is redeployed, and you do not want those documents reprocessed, set the `disableCacheReprocessingChangeDetection` query string parameter to `true` on the skillset update.
| 97 | + |
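As an illustration of the two override parameters described above, the following sketch builds the request URLs with the flags appended as query string parameters. The helper `resource_url` and the service, indexer, and skillset names are hypothetical, and the `skillsets` path segment is an assumption based on the service's REST conventions rather than something stated in this article.

```python
from urllib.parse import urlencode

API_VERSION = "2019-05-06-preview"  # preview version named in this article


def resource_url(service, collection, name, **flags):
    """Build a resource URL, appending each enabled flag as `<flag>=true`."""
    params = {"api-version": API_VERSION}
    params.update({flag: "true" for flag, enabled in flags.items() if enabled})
    return (f"https://{service}.search.windows.net/"
            f"{collection}/{name}?{urlencode(params)}")


# Indexer update that should not trigger the reset requirement.
indexer_put_url = resource_url(
    "my-service", "indexers", "my-indexer", ignoreResetRequirement=True)

# Skillset update that should not flag cached documents for reprocessing.
skillset_put_url = resource_url(
    "my-service", "skillsets", "my-skillset",
    disableCacheReprocessingChangeDetection=True)
```

Each URL would then be used as the target of the corresponding PUT request, with the same `Content-Type` and `api-key` headers as the other calls in this article.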
| 98 | +### Force change detection |
| 99 | + |
| 100 | +In instances when you want the indexing pipeline to recognize a change to an external entity, such as a newly deployed version of a custom skill, update the skillset and "touch" the specific skill by editing its definition, specifically the URL. This forces change detection and invalidates the cache for that skill.
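
A minimal sketch of "touching" a skill, assuming the skillset definition is available as a Python dictionary and that the custom skill stores its URL in a `uri` property. The function name `touch_skill_uri`, the skill name, and the example URLs are hypothetical.

```python
import copy


def touch_skill_uri(skillset, skill_name, new_uri):
    """Return a copy of the skillset with the named skill's URI replaced."""
    updated = copy.deepcopy(skillset)
    for skill in updated.get("skills", []):
        if skill.get("name") == skill_name:
            skill["uri"] = new_uri
            return updated
    raise KeyError(f"skill {skill_name!r} not found in skillset")


# Stub skillset definition (an assumption; a real one comes from a GET request).
skillset = {
    "name": "my-skillset",
    "skills": [
        {"name": "my-custom-skill", "uri": "https://example.com/api/v1"},
    ],
}

touched = touch_skill_uri(
    skillset, "my-custom-skill", "https://example.com/api/v2")
```

The modified definition would then be sent back in a skillset update so that the edited URL is picked up by change detection.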