Commit 5044c40

incremental indexing
1 parent da73321 commit 5044c40

4 files changed: 26 additions, 20 deletions

articles/search/search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md

Lines changed: 4 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -21,7 +21,9 @@ This article supplements [**Create an indexer**](search-howto-create-indexers.md
2121

2222
+ An [Azure SQL database](../azure-sql/database/sql-database-paas-overview.md) with data in a single table or view. Use a table if you want the ability to [index incremental updates](#CaptureChangedRows) using SQL's native change detection capabilities.
2323

24-
+ Read permissions. Azure Cognitive Search supports SQL Server authentication, where the user name and password are provided on the connection string. Alternatively, you can [set up a managed identity and use Azure roles](search-howto-managed-identities-sql.md) to omit credentials on the connection.
24+
+ Read permissions. Azure Cognitive Search supports SQL Server authentication, where the user name and password are provided on the connection string. Alternatively, you can [set up a managed identity and use Azure roles](search-howto-managed-identities-sql.md) to omit credentials on the connection.
25+
26+
+ A REST client, such as [Postman](search-get-started-rest.md) or [Visual Studio Code with the extension for Azure Cognitive Search](search-get-started-vs-code.md) to send REST calls that create the data source, index, and indexer.
2527

2628
<!-- Real-time data synchronization must not be an application requirement. An indexer can reindex your table at most every five minutes. If your data changes frequently, and those changes need to be reflected in the index within seconds or single minutes, we recommend using the [REST API](/rest/api/searchservice/AddUpdate-or-Delete-Documents) or [.NET SDK](search-get-started-dotnet.md) to push updated rows directly.
2729
@@ -200,7 +202,7 @@ Execution history contains up to 50 of the most recently completed executions, w
200202

201203
If your SQL database supports [change tracking](/sql/relational-databases/track-changes/about-change-tracking-sql-server), a search indexer can pick up just the new and updated content on subsequent indexer runs. Azure Cognitive Search provides two change detection policies to support incremental indexing.
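
As a rough sketch of what this looks like in practice, a data source definition that opts into SQL integrated change tracking might resemble the following. The bracketed values (data source name, server, database, credentials, table) are hypothetical placeholders, and the sketch assumes change tracking is already enabled on the database and the table.

```http
{
    "name": "[data-source-name]",
    "type": "azuresql",
    "credentials": {
        "connectionString": "Server=tcp:[server-name].database.windows.net,1433;Database=[database-name];User ID=[user-name];Password=[password];Encrypt=True;"
    },
    "container": { "name": "[table-name]" },
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
    }
}
```

Because this policy applies to tables only, the container in the sketch names a table rather than a view.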
202204

203-
Within an indexer definition, you can specify a change detection policies that tells the indexer which change tracking mechanism is used on your table or view. There are two policies to choose from:
205+
Within an indexer definition, you can specify a change detection policy that tells the indexer which change tracking mechanism is used on your table or view. There are two policies to choose from:
204206

205207
+ "SqlIntegratedChangeTrackingPolicy" (applies to tables only)
206208

articles/search/search-howto-index-cosmosdb-gremlin.md

Lines changed: 6 additions & 4 deletions
@@ -240,9 +240,13 @@ Execution history contains up to 50 of the most recently completed executions, w
240240

241241
<a name="DataChangeDetectionPolicy"></a>
242242

243-
## Indexing changed documents
243+
## Indexing new and changed documents
244244

245-
The purpose of a data change detection policy is to efficiently identify changed data items. Currently, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB, which is specified in the data source definition as follows:
245+
Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed documents in your database.
246+
247+
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. For Cosmos DB, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
248+
249+
The following example shows a [data source definition](#define-the-data-source) with a change detection policy:
246250

247251
```http
248252
"dataChangeDetectionPolicy": {
@@ -251,8 +255,6 @@ The purpose of a data change detection policy is to efficiently identify changed
251255
},
252256
```
253257

254-
Using this policy is highly recommended to ensure good indexer performance.
255-
256258
<a name="DataDeletionDetectionPolicy"></a>
257259

258260
## Indexing deleted documents

articles/search/search-howto-index-cosmosdb-mongodb.md

Lines changed: 6 additions & 4 deletions
@@ -223,9 +223,13 @@ Execution history contains up to 50 of the most recently completed executions, w
223223

224224
<a name="DataChangeDetectionPolicy"></a>
225225

226-
## Indexing changed documents
226+
## Indexing new and changed documents
227227

228-
The purpose of a data change detection policy is to efficiently identify changed data items. Currently, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB, which is specified in the data source definition as follows:
228+
Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed documents in your database.
229+
230+
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. For Cosmos DB, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
231+
232+
The following example shows a [data source definition](#define-the-data-source) with a change detection policy:
229233

230234
```http
231235
"dataChangeDetectionPolicy": {
@@ -234,8 +238,6 @@ The purpose of a data change detection policy is to efficiently identify changed
234238
},
235239
```
236240

237-
Using this policy is highly recommended to ensure good indexer performance.
238-
239241
<a name="DataDeletionDetectionPolicy"></a>
240242

241243
## Indexing deleted documents

articles/search/search-howto-index-cosmosdb.md

Lines changed: 10 additions & 10 deletions
@@ -283,9 +283,13 @@ Execution history contains up to 50 of the most recently completed executions, w
283283

284284
<a name="DataChangeDetectionPolicy"></a>
285285

286-
## Indexing changed documents
286+
## Indexing new and changed documents
287287

288-
The purpose of a data change detection policy is to efficiently identify changed data items. Currently, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB, which is specified in the data source definition as follows:
288+
Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed documents in your database.
289+
290+
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. For Cosmos DB, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
291+
292+
The following example shows a [data source definition](#define-the-data-source) with a change detection policy:
289293

290294
```http
291295
"dataChangeDetectionPolicy": {
@@ -294,19 +298,15 @@ The purpose of a data change detection policy is to efficiently identify changed
294298
},
295299
```
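
For context, the policy fragment above sits inside a larger data source definition. A minimal sketch of the whole request body might look like this, assuming a Cosmos DB SQL API collection. The bracketed names (data source, account, key, database, collection) are hypothetical placeholders, not values from this commit.

```http
{
    "name": "[data-source-name]",
    "type": "cosmosdb",
    "credentials": {
        "connectionString": "AccountEndpoint=https://[account-name].documents.azure.com;AccountKey=[account-key];Database=[database-name]"
    },
    "container": { "name": "[collection-name]", "query": null },
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName": "_ts"
    }
}
```

With the high water mark set to `_ts`, each indexer run after the first is expected to pick up only documents whose timestamp is newer than the last value the indexer recorded.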
296300

297-
Using this policy is highly recommended to ensure good indexer performance.
298-
299-
If you're using a custom query, make sure that the `_ts` property is projected by the query.
300-
301301
<a name="IncrementalProgress"></a>
302302

303-
### Incremental progress and custom queries
303+
### Incremental indexing and custom queries
304304

305-
Incremental progress during indexing ensures that if indexer execution is interrupted by transient failures or execution time limit, the indexer can pick up where it left off next time it runs, instead of having to reindex the entire collection from scratch. This is especially important when indexing large collections.
305+
If you're using a [custom query to retrieve documents](#flatten-structures), make sure the query orders the results by the `_ts` column. This enables periodic check-pointing that Azure Cognitive Search uses to provide incremental progress in the presence of failures.
306306

307-
To enable incremental progress when using a custom query, ensure that your query orders the results by the `_ts` column. This enables periodic check-pointing that Azure Cognitive Search uses to provide incremental progress in the presence of failures.
307+
In some cases, even if your query contains an `ORDER BY [collection alias]._ts` clause, Azure Cognitive Search may not infer that the query is ordered by the `_ts`. You can tell Azure Cognitive Search that results are ordered by setting the `assumeOrderByHighWaterMarkColumn` configuration property.
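
A hedged sketch of an indexer definition that sets this property is shown here. The indexer, data source, and index names are hypothetical placeholders. The hint is only meaningful when the custom query in the data source really does order its results by `_ts`.

```http
{
    "name": "[indexer-name]",
    "dataSourceName": "[data-source-name]",
    "targetIndexName": "[index-name]",
    "parameters": {
        "configuration": { "assumeOrderByHighWaterMarkColumn": true }
    }
}
```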
308308

309-
In some cases, even if your query contains an `ORDER BY [collection alias]._ts` clause, Azure Cognitive Search may not infer that the query is ordered by the `_ts`. You can tell Azure Cognitive Search that results are ordered by using the `assumeOrderByHighWaterMarkColumn` configuration property. To specify this hint, create or update your indexer as follows:
309+
To specify this hint, [create or update your indexer definition](#configure-and-run-the-cosmos-db-indexer) as follows:
310310

311311
```http
312312
{
