-| smallmoney, money decimal numeric |Edm.String |Azure Cognitive Search does not support converting decimal types into Edm.Double because this would lose precision |
+| smallmoney, money, decimal, numeric |Edm.String |Azure Cognitive Search does not support converting decimal types into Edm.Double because doing so would lose precision |
 | char, nchar, varchar, nvarchar |Edm.String<br/>Collection(Edm.String) |A SQL string can be used to populate a Collection(Edm.String) field if the string represents a JSON array of strings: `["red", "white", "blue"]` |
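The decimal and string rows in this table can be illustrated with a small Python sketch. It is not how the indexer converts values. `survives_double_conversion` and `as_string_collection` are hypothetical helpers that show why a decimal value can change when stored as a 64-bit double, and what "a JSON array of strings" means for a SQL string column.

```python
import json
from decimal import Decimal
from typing import List, Optional


def survives_double_conversion(sql_decimal: str) -> bool:
    """True if a decimal value is unchanged after a round trip through a 64-bit double."""
    exact = Decimal(sql_decimal)
    # Decimal(float) converts the double exactly, so any rounding done by float() shows up here.
    return exact == Decimal(float(sql_decimal))


def as_string_collection(sql_string: str) -> Optional[List[str]]:
    """Return the strings if sql_string is a JSON array of strings, otherwise None."""
    try:
        parsed = json.loads(sql_string)
    except json.JSONDecodeError:
        return None
    if isinstance(parsed, list) and all(isinstance(item, str) for item in parsed):
        return parsed
    return None


print(survives_double_conversion("0.5"))               # 0.5 is exactly representable in binary
print(survives_double_conversion("19.99"))             # 19.99 is rounded in binary floating point
print(survives_double_conversion("9007199254740993"))  # 2**53 + 1 rounds to 2**53
print(as_string_collection('["red", "white", "blue"]'))
print(as_string_collection("red, white, blue"))        # not JSON, so no collection
```

A money column holding a value like `19.99` is exactly the kind of value that a double cannot store exactly, which is why mapping to Edm.String keeps the original digits.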
@@ -200,9 +200,11 @@ Execution history contains up to 50 of the most recently completed executions, w

 ## Indexing new, changed, and deleted rows

-If your SQL database supports [change tracking](/sql/relational-databases/track-changes/about-change-tracking-sql-server), a search indexer can pick up just the new and updated content on subsequent indexer runs. Azure Cognitive Search provides two change detection policies to support incremental indexing.
+If your SQL database supports [change tracking](/sql/relational-databases/track-changes/about-change-tracking-sql-server), a search indexer can pick up just the new and updated content on subsequent indexer runs.

-Within an indexer definition, you can specify a change detection policy that tells the indexer which change tracking mechanism is used on your table or view. There are two policies to choose from:
+To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. This property tells the indexer which change tracking mechanism is used on your table or view.
+
+For Azure SQL indexers, there are two change detection policies:

 + "SqlIntegratedChangeTrackingPolicy" (applies to tables only)

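The choice of policy determines what goes into the data source's "dataChangeDetectionPolicy" property. The following Python sketch builds that JSON fragment. The `@odata.type` strings are the ones used by the Azure Cognitive Search REST API; the `change_detection_policy` helper and the `LastUpdated` column name are hypothetical.

```python
import json
from typing import Optional

# @odata.type values used by the Azure Cognitive Search REST API for change detection.
SQL_INTEGRATED = "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
HIGH_WATER_MARK = "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy"


def change_detection_policy(high_water_mark_column: Optional[str] = None) -> dict:
    """Build a "dataChangeDetectionPolicy" value (hypothetical helper).

    With no column name, use SQL integrated change tracking (tables only);
    otherwise use a high water mark policy on the named column.
    """
    if high_water_mark_column is None:
        return {"@odata.type": SQL_INTEGRATED}
    return {
        "@odata.type": HIGH_WATER_MARK,
        "highWaterMarkColumnName": high_water_mark_column,
    }


# Fragment to merge into a data source definition; "LastUpdated" is a placeholder column.
fragment = {"dataChangeDetectionPolicy": change_detection_policy("LastUpdated")}
print(json.dumps(fragment, indent=2))
```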
@@ -236,7 +238,7 @@ api-key: admin-key
 }
 ```

-When using SQL integrated change tracking policy, do not specify a separate data deletion detection policy. The SQL integrated change tracking policy has built-in support for identifying deleted rows. However, for the deletes to be detected automatically, the document key in your search index must be the same as the primary key in the SQL table.
+When using the SQL integrated change tracking policy, do not specify a separate data deletion detection policy. The SQL integrated change tracking policy has built-in support for identifying deleted rows. However, for deleted rows to be detected automatically, the document key in your search index must be the same as the primary key in the SQL table.

 > [!NOTE]
 > When using [TRUNCATE TABLE](/sql/t-sql/statements/truncate-table-transact-sql) to remove a large number of rows from a SQL table, the indexer needs to be [reset](/rest/api/searchservice/reset-indexer) to reset the change tracking state to pick up row deletions.
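Why deletes depend on matching keys can be shown with a simplified Python model. The change records mimic the `SYS_CHANGE_OPERATION` values (`I`, `U`, `D`) that SQL Server change tracking reports, and a dictionary stands in for the search index. `apply_changes` and the sample data are hypothetical; this is an illustration under those assumptions, not the indexer's implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change record modeled on SQL Server change tracking output:
# (SYS_CHANGE_OPERATION, primary key value, current row or None for deletes).
Change = Tuple[str, int, Optional[dict]]


def apply_changes(index: Dict[str, dict], changes: List[Change]) -> None:
    """Apply inserts, updates, and deletes to an in-memory stand-in for a search index.

    Documents are looked up by str(primary key), which only works if the
    index's document key holds the table's primary key values.
    """
    for operation, primary_key, row in changes:
        document_key = str(primary_key)
        if operation == "D":
            index.pop(document_key, None)  # no-op if no document has this key
        elif operation in ("I", "U"):
            index[document_key] = row
        else:
            raise ValueError(f"unexpected operation: {operation}")


# Document keys match the primary key: the delete for row 2 removes its document.
matching = {"1": {"name": "a"}, "2": {"name": "b"}}
apply_changes(matching, [("D", 2, None), ("U", 1, {"name": "a2"})])

# Document keys are something else (for example a SKU): the delete finds nothing to remove.
mismatched = {"sku-1": {"name": "a"}, "sku-2": {"name": "b"}}
apply_changes(mismatched, [("D", 2, None)])
```

In the mismatched case the deleted row's document stays in the index, which is the situation the paragraph warns about.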
@@ -282,12 +284,13 @@ api-key: admin-key

 ##### convertHighWaterMarkToRowVersion

-If you're using a [rowversion](/sql/t-sql/data-types/rowversion-transact-sql) data type for the high water mark column, consider using the `convertHighWaterMarkToRowVersion` indexer configuration setting. `convertHighWaterMarkToRowVersion` does two things:
+If you're using a [rowversion](/sql/t-sql/data-types/rowversion-transact-sql) data type for the high water mark column, consider setting the `convertHighWaterMarkToRowVersion` property in the indexer configuration. Setting this property to true results in the following behaviors:
+
+* Uses the rowversion data type for the high water mark column in the indexer SQL query. Using the correct data type improves indexer query performance.

-* Use the rowversion data type for the high water mark column in the indexer sql query. Using the correct data type improves indexer query performance.
-* Subtract 1 from the rowversion value before the indexer query runs. Views with 1 to many joins may have rows with duplicate rowversion values. Subtracting 1 ensures the indexer query doesn't miss these rows.
+* Subtracts one from the rowversion value before the indexer query runs. Views with one-to-many joins may have rows with duplicate rowversion values. Subtracting one ensures the indexer query doesn't miss these rows.

-To enable this feature, create or update the indexer with the following configuration:
+To set this property, create or update the indexer with the following configuration:

 ```http
 {
@@ -301,7 +304,7 @@ To enable this feature, create or update the indexer with the following configur

 ##### queryTimeout

-If you encounter timeout errors, you can use the `queryTimeout` indexer configuration setting to set the query timeout to a value higher than the default 5-minute timeout. For example, to set the timeout to 10 minutes, create or update the indexer with the following configuration:
+If you encounter timeout errors, set the `queryTimeout` indexer configuration setting to a value higher than the default 5-minute timeout. For example, to set the timeout to 10 minutes, create or update the indexer with the following configuration:

 ```http
 {
@@ -315,7 +318,7 @@ If you encounter timeout errors, you can use the `queryTimeout` indexer configur

 ##### disableOrderByHighWaterMarkColumn

-You can also disable the `ORDER BY [High Water Mark Column]` clause. However, this is not recommended because if the indexer execution is interrupted by an error, the indexer has to re-process all rows if it runs later - even if the indexer has already processed almost all the rows by the time it was interrupted. To disable the `ORDER BY` clause, use the `disableOrderByHighWaterMarkColumn` setting in the indexer definition:
+You can also disable the `ORDER BY [High Water Mark Column]` clause. However, this isn't recommended: if an error interrupts the indexer execution, the next run has to reprocess all rows, even if the indexer had already processed almost all of them when it was interrupted. To disable the `ORDER BY` clause, use the `disableOrderByHighWaterMarkColumn` setting in the indexer definition:
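The rowversion behaviors described under `convertHighWaterMarkToRowVersion` can be sketched with a toy model of one indexer run: an ordered read of rows past a saved high water mark. Rowversion is modeled as an integer, and `run_indexer_query`, the view rows, and the key format are hypothetical; the real indexer query and checkpointing are not shown in this diff.

```python
from typing import List, Tuple

# Hypothetical view row: (rowversion modeled as an integer, document key).
Row = Tuple[int, str]


def run_indexer_query(rows: List[Row], high_water_mark: int,
                      subtract_one: bool) -> Tuple[List[str], int]:
    """Model one run of `WHERE rv > @hwm ORDER BY rv` against a rowversion column.

    Returns the document keys read and the high water mark saved for the next run.
    """
    threshold = high_water_mark - 1 if subtract_one else high_water_mark
    selected = sorted((row for row in rows if row[0] > threshold), key=lambda row: row[0])
    new_mark = selected[-1][0] if selected else high_water_mark
    return [key for _, key in selected], new_mark


# A view over a one-to-many join: both lines carry their parent order's rowversion.
view = [(5, "order-1/line-1"), (5, "order-1/line-2")]
first_keys, mark = run_indexer_query(view, 0, subtract_one=False)

# Later, a new line is joined to the same parent row, so it also shows rowversion 5.
view.append((5, "order-1/line-3"))
missed_keys, _ = run_indexer_query(view, mark, subtract_one=False)
caught_keys, _ = run_indexer_query(view, mark, subtract_one=True)
```

In this model, the plain `>` comparison never returns `order-1/line-3`, while subtracting one returns it at the cost of re-reading the rows already at the old high water mark.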
articles/search/search-howto-index-cosmosdb-gremlin.md (5 additions, 3 deletions)
@@ -145,9 +145,9 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source

 1. Create additional fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details.

-### Mapping between JSON Data Types and Azure Cognitive Search Data Types
+### Mapping data types

-| JSON data type |Compatible target index field types |
+| JSON data type |Cognitive Search field types |
 | --- | --- |
 | Bool |Edm.Boolean, Edm.String |
 | Numbers that look like integers |Edm.Int32, Edm.Int64, Edm.String |
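As a rough illustration of the two mapping rows shown here, the following Python sketch classifies a parsed JSON value. The 32-bit and 64-bit range checks are an added assumption about which integers fit `Edm.Int32` and `Edm.Int64`; other JSON types are left unclassified because their rows fall outside this hunk. `compatible_field_types` is a hypothetical helper.

```python
import json
from typing import List, Optional

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def compatible_field_types(json_text: str) -> Optional[List[str]]:
    """Return candidate Cognitive Search field types for a JSON value, or None if not covered."""
    value = json.loads(json_text)
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return ["Edm.Boolean", "Edm.String"]
    if isinstance(value, int):
        types = []
        if INT32_MIN <= value <= INT32_MAX:  # assumption: must fit a signed 32-bit integer
            types.append("Edm.Int32")
        if INT64_MIN <= value <= INT64_MAX:  # assumption: must fit a signed 64-bit integer
            types.append("Edm.Int64")
        types.append("Edm.String")
        return types
    return None  # floats, strings, arrays, objects: rows not shown in this hunk


print(compatible_field_types("true"))
print(compatible_field_types("3000000000"))  # too large for Edm.Int32 under the assumption above
```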
@@ -244,7 +244,9 @@ Execution history contains up to 50 of the most recently completed executions, w

 Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed documents in your database.

-To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. For Cosmos DB, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
+To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. This property tells the indexer which change tracking mechanism is used on your data.
+
+For Cosmos DB indexers, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.

 The following example shows a [data source definition](#define-the-data-source) with a change detection policy:
articles/search/search-howto-index-cosmosdb-mongodb.md (5 additions, 3 deletions)
@@ -128,9 +128,9 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source

 1. Create additional fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details.

-### Mapping between JSON Data Types and Azure Cognitive Search Data Types
+### Mapping data types

-| JSON data type | Compatible target index field types |
+| JSON data type | Cognitive Search field types |
 | --- | --- |
 | Bool |Edm.Boolean, Edm.String |
 | Numbers that look like integers |Edm.Int32, Edm.Int64, Edm.String |
@@ -227,7 +227,9 @@ Execution history contains up to 50 of the most recently completed executions, w

 Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed documents in your database.

-To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. For Cosmos DB, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
+To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. This property tells the indexer which change tracking mechanism is used on your data.
+
+For Cosmos DB indexers, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.

 The following example shows a [data source definition](#define-the-data-source) with a change detection policy:
articles/search/search-howto-index-cosmosdb.md (5 additions, 3 deletions)
@@ -188,9 +188,9 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source

 1. Create additional fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details.

-### Mapping between JSON Data Types and Azure Cognitive Search Data Types
+### Mapping data types

-| JSON data type | Compatible target index field types |
+| JSON data types | Cognitive Search field types |
 | --- | --- |
 | Bool |Edm.Boolean, Edm.String |
 | Numbers that look like integers |Edm.Int32, Edm.Int64, Edm.String |
@@ -287,7 +287,9 @@ Execution history contains up to 50 of the most recently completed executions, w

 Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed documents in your database.

-To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. For Cosmos DB, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
+To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. This property tells the indexer which change tracking mechanism is used on your data.
+
+For Cosmos DB indexers, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.

 The following example shows a [data source definition](#define-the-data-source) with a change detection policy:
articles/search/search-howto-index-mysql.md (30 additions, 28 deletions)
@@ -103,6 +103,25 @@ In a [search index](search-what-is-an-index.md), add search index fields that co

 If the primary key in the source table matches the document key (in this case, "ID"), the indexer will import the primary key as the document key.

+<a name="TypeMapping"></a>
+
+### Mapping data types
+
+The following table maps MySQL data types to their Cognitive Search equivalents. See [Supported data types (Azure Cognitive Search)](/rest/api/searchservice/supported-data-types) for more information.
+
+> [!NOTE]
+> The preview does not support geometry types and blobs.
+
+| MySQL data types | Cognitive Search field types |
Once the index and data source have been created, you're ready to create the indexer. Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors.
@@ -180,13 +199,15 @@ The response includes status and the number of items processed. It should look s

 Execution history contains up to 50 of the most recently completed executions, which are sorted in the reverse chronological order so that the latest execution comes first.

-## Capture new, changed, and deleted rows
+<a name="DataChangeDetectionPolicy"></a>

-If your data source meets the requirements for change and deletion detection, the indexer can incrementally index the changes in your data source since the last indexer job, which means you can avoid having to re-index the entire table or view every time an indexer runs.
+## Indexing new and changed rows

-<a name="DataChangeDetectionPolicy"></a>
+Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed rows in your database.

-### High Water Mark Change Detection policy
+To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. This property tells the indexer which change tracking mechanism is used on your data.
+
+For Azure Database for MySQL indexers, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy).

 An indexer's change detection policy relies on having a "high water mark" column that captures the row version, or the date and time when a row was last updated. It's often a DATE, DATETIME, or TIMESTAMP column at a granularity sufficient for meeting the requirements of a high water mark column.

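The high water mark pattern can be tried locally. This sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL, with a hypothetical `products` table whose integer `updated_at` column plays the role of the high water mark. It runs the same `WHERE [High Water Mark Column] > [Current High Water Mark Value] ORDER BY [High Water Mark Column]` query shape listed in the requirements for this column, and is a sketch under those assumptions, not the indexer's query.

```python
import sqlite3
from typing import List, Tuple

# In-memory SQLite stands in for MySQL; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)")
# An index on the high water mark column keeps the WHERE/ORDER BY query efficient.
conn.execute("CREATE INDEX ix_products_updated_at ON products (updated_at)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "Widget", 100), (2, "Gadget", 200), (3, "Gizmo", 300)])


def rows_changed_since(high_water_mark: int) -> Tuple[List[tuple], int]:
    """Read rows past the high water mark, oldest change first, and return the new mark."""
    rows = conn.execute(
        "SELECT id, title, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (high_water_mark,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else high_water_mark
    return rows, new_mark


initial_rows, mark = rows_changed_since(0)  # first run reads every row

# The high water mark column must increase on every insert or update.
conn.execute("UPDATE products SET title = ?, updated_at = ? WHERE id = ?", ("Widget v2", 400, 1))
changed_rows, mark = rows_changed_since(mark)  # next run reads only the updated row
```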
@@ -197,7 +218,7 @@ In your MySQL database, the high water mark column must meet the following requi
 + The value of this column increases with each insert or update.
 + Queries with the following WHERE and ORDER BY clauses can be executed efficiently: `WHERE [High Water Mark Column] > [Current High Water Mark Value] ORDER BY [High Water Mark Column]`

-To set a high water mark policy in your indexer datasource, create or update your data source like this:
+The following example shows a [data source definition](#define-the-data-source) with a change detection policy:

 ```http
 POST https://[search service name].search.windows.net/datasources?api-version=2020-06-30-Preview
@@ -222,11 +243,11 @@ api-key: [admin key]

 <a name="DataDeletionDetectionPolicy"></a>

-### Soft Delete Column Deletion Detection policy
+## Indexing deleted rows

-When rows are deleted from the source table, you probably want to delete those rows from the search index as well. If the rows are physically removed from the table, Azure Cognitive Search has no way to infer the presence of records that no longer exist. However, you can use the “soft-delete” technique to logically delete rows without removing them from the table. Add a column to your table or view and mark rows as deleted using that column.
+When rows are deleted from the table or view, you normally want to delete those rows from the search index as well. However, if the rows are physically removed from the table, an indexer has no way to infer the presence of records that no longer exist. The solution is to use a "soft-delete" technique to logically delete rows without removing them from the table. You'll do this by adding a column to your table or view and marking rows as deleted using that column.

-When using the soft-delete technique, you can specify the soft delete policy as follows when creating or updating the data source:
+Given a column that provides deletion state, an indexer can be configured to remove any search documents for which deletion state is set to true. The configuration property that supports this behavior is a data deletion detection policy, which is specified in the [data source definition](#define-the-data-source) as follows:

 ```http
 {
@@ -239,26 +260,7 @@ When using the soft-delete technique, you can specify the soft delete policy as
 }
 ```

-The "softDeleteMarkerValue" must be a string – use the string representation of your actual value. For example, if you have an integer column where deleted rows are marked with the value 1, use `"1"`. If you have a BIT column where deleted rows are marked with the Boolean true value, use the string literal `True` or `true`, the case doesn't matter.
-
-<a name="TypeMapping"></a>
-
-## Mapping data types
-
-The following table maps the MySQL database to Cognitive Search equivalents. See [Supported data types (Azure Cognitive Search)](/rest/api/searchservice/supported-data-types) for more information.
-
-> [!NOTE]
-> The preview does not support geometry types and blobs.
The "softDeleteMarkerValue" must be a string. For example, if you have an integer column where deleted rows are marked with the value 1, use `"1"`. If you have a BIT column where deleted rows are marked with the Boolean true value, use the string literal `True` or `true` (the case doesn't matter).
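The string-matching rule for "softDeleteMarkerValue" can be modeled in a few lines of Python. This is a hypothetical sketch of the rule as stated (integers compared by their string form, Boolean values compared without regard to case), not the indexer's code; `is_soft_deleted`, `apply_soft_deletes`, and the sample rows are illustrative, and a BIT column's value is modeled as a Python `bool`.

```python
from typing import Dict, Iterable, Tuple


def is_soft_deleted(column_value: object, marker: str) -> bool:
    """Compare a soft-delete column value with a string marker value.

    Booleans compare case-insensitively ("True" or "true"); other values
    compare by their string representation, so integer 1 matches "1".
    """
    if isinstance(column_value, bool):
        return str(column_value).lower() == marker.lower()
    return str(column_value) == marker


def apply_soft_deletes(index: Dict[str, dict], rows: Iterable[Tuple[str, object]],
                       marker: str) -> None:
    """Remove documents whose (document key, soft-delete column value) rows carry the marker."""
    for key, flag in rows:
        if is_soft_deleted(flag, marker):
            index.pop(key, None)


index = {"1": {"title": "kept"}, "2": {"title": "removed"}}
apply_soft_deletes(index, [("1", 0), ("2", 1)], marker="1")
```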
articles/search/search-howto-large-index.md (1 addition, 1 deletion)
@@ -1,7 +1,7 @@
 ---
 title: Index large data set using built-in indexers
 titleSuffix: Azure Cognitive Search
-description: Strategies for large data indexing or computationally-intensive indexing through batch mode, resourcing, and techniques for scheduled, parallel, and distributed indexing.
+description: Strategies for large data indexing or computationally intensive indexing through batch mode, resourcing, and techniques for scheduled, parallel, and distributed indexing.