Commit 510311e

authored
Merge pull request #190082 from HeidiSteen/heidist-work
more indexer H2 alignment
2 parents 5480dae + 6857b32 commit 510311e

6 files changed

+59
-48
lines changed

articles/search/search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md

Lines changed: 13 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -103,7 +103,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept values fro
103103
| int, smallint, tinyint |Edm.Int32, Edm.Int64, Edm.String | |
104104
| bigint |Edm.Int64, Edm.String | |
105105
| real, float |Edm.Double, Edm.String | |
106-
| smallmoney, money decimal numeric |Edm.String |Azure Cognitive Search does not support converting decimal types into Edm.Double because this would lose precision |
106+
| smallmoney, money decimal numeric |Edm.String |Azure Cognitive Search does not support converting decimal types into Edm.Double because doing so would lose precision |
107107
| char, nchar, varchar, nvarchar |Edm.String<br/>Collection(Edm.String) |A SQL string can be used to populate a Collection(Edm.String) field if the string represents a JSON array of strings: `["red", "white", "blue"]` |
108108
| smalldatetime, datetime, datetime2, date, datetimeoffset |Edm.DateTimeOffset, Edm.String | |
109109
| uniqueidentifier |Edm.String | |
@@ -200,9 +200,11 @@ Execution history contains up to 50 of the most recently completed executions, w
200200

201201
## Indexing new, changed, and deleted rows
202202

203-
If your SQL database supports [change tracking](/sql/relational-databases/track-changes/about-change-tracking-sql-server), a search indexer can pick up just the new and updated content on subsequent indexer runs. Azure Cognitive Search provides two change detection policies to support incremental indexing.
203+
If your SQL database supports [change tracking](/sql/relational-databases/track-changes/about-change-tracking-sql-server), a search indexer can pick up just the new and updated content on subsequent indexer runs.
204204

205-
Within an indexer definition, you can specify a change detection policy that tells the indexer which change tracking mechanism is used on your table or view. There are two policies to choose from:
205+
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. This property tells the indexer which change tracking mechanism is used on your table or view.
206+
207+
For Azure SQL indexers, there are two change detection policies:
206208

207209
+ "SqlIntegratedChangeTrackingPolicy" (applies to tables only)
208210

@@ -236,7 +238,7 @@ api-key: admin-key
236238
}
237239
```
238240

239-
When using SQL integrated change tracking policy, do not specify a separate data deletion detection policy. The SQL integrated change tracking policy has built-in support for identifying deleted rows. However, for the deletes to be detected automatically, the document key in your search index must be the same as the primary key in the SQL table.
241+
When using SQL integrated change tracking policy, do not specify a separate data deletion detection policy. The SQL integrated change tracking policy has built-in support for identifying deleted rows. However, for the deleted rows to be detected automatically, the document key in your search index must be the same as the primary key in the SQL table.
240242

241243
> [!NOTE]
242244
> When using [TRUNCATE TABLE](/sql/t-sql/statements/truncate-table-transact-sql) to remove a large number of rows from a SQL table, the indexer needs to be [reset](/rest/api/searchservice/reset-indexer) to reset the change tracking state to pick up row deletions.
@@ -282,12 +284,13 @@ api-key: admin-key
282284

283285
##### convertHighWaterMarkToRowVersion
284286

285-
If you're using a [rowversion](/sql/t-sql/data-types/rowversion-transact-sql) data type for the high water mark column, consider using the `convertHighWaterMarkToRowVersion` indexer configuration setting. `convertHighWaterMarkToRowVersion` does two things:
287+
If you're using a [rowversion](/sql/t-sql/data-types/rowversion-transact-sql) data type for the high water mark column, consider setting the `convertHighWaterMarkToRowVersion` property in indexer configuration. Setting this property to true results in the following behaviors:
288+
289+
* Uses the rowversion data type for the high water mark column in the indexer SQL query. Using the correct data type improves indexer query performance.
286290

287-
* Use the rowversion data type for the high water mark column in the indexer sql query. Using the correct data type improves indexer query performance.
288-
* Subtract 1 from the rowversion value before the indexer query runs. Views with 1 to many joins may have rows with duplicate rowversion values. Subtracting 1 ensures the indexer query doesn't miss these rows.
291+
* Subtracts one from the rowversion value before the indexer query runs. Views with one-to-many joins may have rows with duplicate rowversion values. Subtracting one ensures the indexer query doesn't miss these rows.
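
The effect of subtracting one can be illustrated with a small simulation. This is only a simplified model, not the service's actual query: the `fetch_changed` helper, the sample rows, and the assumed `WHERE rowversion > @hwm ORDER BY rowversion` shape are all hypothetical and exist purely for illustration.

```python
def fetch_changed(rows, high_water_mark, subtract_one=False):
    """Return the rows an incremental query would select, ordered by rowversion.

    Models a filter of the form `rowversion > threshold`. With subtract_one,
    the threshold is lowered by 1 before filtering (assumed behavior).
    """
    threshold = high_water_mark - 1 if subtract_one else high_water_mark
    selected = (r for r in rows if r["rowversion"] > threshold)
    return sorted(selected, key=lambda r: r["rowversion"])


# Hypothetical view over a one-to-many join: one parent row joined to two
# child rows yields two view rows that share the same rowversion (101).
view_rows = [
    {"id": "a-1", "rowversion": 100},
    {"id": "b-1", "rowversion": 101},
    {"id": "b-2", "rowversion": 101},
]

# Suppose an earlier run was interrupted after "b-1" and stored 101 as the
# high water mark. A strict comparison skips "b-2" on the next run.
missed = fetch_changed(view_rows, 101)

# Lowering the threshold by one picks up both rows that share rowversion 101.
recovered = fetch_changed(view_rows, 101, subtract_one=True)

print([r["id"] for r in missed])
print([r["id"] for r in recovered])
```

In this model the cost of the setting is that a row already indexed (here `b-1`) is read again, which is harmless if re-indexing the same document is idempotent.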
289292

290-
To enable this feature, create or update the indexer with the following configuration:
293+
To enable this property, create or update the indexer with the following configuration:
291294

292295
```http
293296
{
@@ -301,7 +304,7 @@ To enable this feature, create or update the indexer with the following configur
301304

302305
##### queryTimeout
303306

304-
If you encounter timeout errors, you can use the `queryTimeout` indexer configuration setting to set the query timeout to a value higher than the default 5-minute timeout. For example, to set the timeout to 10 minutes, create or update the indexer with the following configuration:
307+
If you encounter timeout errors, set the `queryTimeout` indexer configuration setting to a value higher than the default 5-minute timeout. For example, to set the timeout to 10 minutes, create or update the indexer with the following configuration:
305308

306309
```http
307310
{
@@ -315,7 +318,7 @@ If you encounter timeout errors, you can use the `queryTimeout` indexer configur
315318

316319
##### disableOrderByHighWaterMarkColumn
317320

318-
You can also disable the `ORDER BY [High Water Mark Column]` clause. However, this is not recommended because if the indexer execution is interrupted by an error, the indexer has to re-process all rows if it runs later - even if the indexer has already processed almost all the rows by the time it was interrupted. To disable the `ORDER BY` clause, use the `disableOrderByHighWaterMarkColumn` setting in the indexer definition:
321+
You can also disable the `ORDER BY [High Water Mark Column]` clause. However, this is not recommended because if the indexer execution is interrupted by an error, the indexer has to re-process all rows if it runs later, even if the indexer has already processed almost all the rows at the time it was interrupted. To disable the `ORDER BY` clause, use the `disableOrderByHighWaterMarkColumn` setting in the indexer definition:
319322

320323
```http
321324
{

articles/search/search-howto-index-cosmosdb-gremlin.md

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -145,9 +145,9 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source
145145

146146
1. Create additional fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details.
147147

148-
### Mapping between JSON Data Types and Azure Cognitive Search Data Types
148+
### Mapping data types
149149

150-
| JSON data type | Compatible target index field types |
150+
| JSON data type | Cognitive Search field types |
151151
| --- | --- |
152152
| Bool |Edm.Boolean, Edm.String |
153153
| Numbers that look like integers |Edm.Int32, Edm.Int64, Edm.String |
@@ -244,7 +244,9 @@ Execution history contains up to 50 of the most recently completed executions, w
244244

245245
Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed documents in your database.
246246

247-
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. For Cosmos DB, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
247+
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. This property tells the indexer which change tracking mechanism is used on your data.
248+
249+
For Cosmos DB indexers, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
248250

249251
The following example shows a [data source definition](#define-the-data-source) with a change detection policy:
250252

articles/search/search-howto-index-cosmosdb-mongodb.md

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -128,9 +128,9 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source
128128
129129
1. Create additional fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details.
130130
131-
### Mapping between JSON Data Types and Azure Cognitive Search Data Types
131+
### Mapping data types
132132
133-
| JSON data type | Compatible target index field types |
133+
| JSON data type | Cognitive Search field types |
134134
| --- | --- |
135135
| Bool |Edm.Boolean, Edm.String |
136136
| Numbers that look like integers |Edm.Int32, Edm.Int64, Edm.String |
@@ -227,7 +227,9 @@ Execution history contains up to 50 of the most recently completed executions, w
227227

228228
Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed documents in your database.
229229

230-
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. For Cosmos DB, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
230+
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. This property tells the indexer which change tracking mechanism is used on your data.
231+
232+
For Cosmos DB indexers, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
231233

232234
The following example shows a [data source definition](#define-the-data-source) with a change detection policy:
233235

articles/search/search-howto-index-cosmosdb.md

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -188,9 +188,9 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source
188188
189189
1. Create additional fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details.
190190
191-
### Mapping between JSON Data Types and Azure Cognitive Search Data Types
191+
### Mapping data types
192192
193-
| JSON data type | Compatible target index field types |
193+
| JSON data types | Cognitive Search field types |
194194
| --- | --- |
195195
| Bool |Edm.Boolean, Edm.String |
196196
| Numbers that look like integers |Edm.Int32, Edm.Int64, Edm.String |
@@ -287,7 +287,9 @@ Execution history contains up to 50 of the most recently completed executions, w
287287

288288
Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed documents in your database.
289289

290-
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. For Cosmos DB, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
290+
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. This property tells the indexer which change tracking mechanism is used on your data.
291+
292+
For Cosmos DB indexers, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB.
291293

292294
The following example shows a [data source definition](#define-the-data-source) with a change detection policy:
293295

articles/search/search-howto-index-mysql.md

Lines changed: 30 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -103,6 +103,25 @@ In a [search index](search-what-is-an-index.md), add search index fields that co
103103

104104
If the primary key in the source table matches the document key (in this case, "ID"), the indexer will import the primary key as the document key.
105105

106+
<a name="TypeMapping"></a>
107+
108+
### Mapping data types
109+
110+
The following table maps the MySQL database to Cognitive Search equivalents. See [Supported data types (Azure Cognitive Search)](/rest/api/searchservice/supported-data-types) for more information.
111+
112+
> [!NOTE]
113+
> The preview does not support geometry types and blobs.
114+
115+
| MySQL data types | Cognitive Search field types |
116+
| --------------- | -------------------------------- |
117+
| `bool`, `boolean` | Edm.Boolean, Edm.String |
118+
| `tinyint`, `smallint`, `mediumint`, `int`, `integer`, `year` | Edm.Int32, Edm.Int64, Edm.String |
119+
| `bigint` | Edm.Int64, Edm.String |
120+
| `float`, `double`, `real` | Edm.Double, Edm.String |
121+
| `date`, `datetime`, `timestamp` | Edm.DateTimeOffset, Edm.String |
122+
| `char`, `varchar`, `tinytext`, `mediumtext`, `text`, `longtext`, `enum`, `set`, `time` | Edm.String |
123+
| unsigned numerical data, serial, decimal, dec, bit, blob, binary, geometry | N/A |
124+
106125
## Configure and run the MySQL indexer
107126

108127
Once the index and data source have been created, you're ready to create the indexer. Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors.
@@ -180,13 +199,15 @@ The response includes status and the number of items processed. It should look s
180199

181200
Execution history contains up to 50 of the most recently completed executions, which are sorted in the reverse chronological order so that the latest execution comes first.
182201

183-
## Capture new, changed, and deleted rows
202+
<a name="DataChangeDetectionPolicy"></a>
184203

185-
If your data source meets the requirements for change and deletion detection, the indexer can incrementally index the changes in your data source since the last indexer job, which means you can avoid having to re-index the entire table or view every time an indexer runs.
204+
## Indexing new and changed rows
186205

187-
<a name="DataChangeDetectionPolicy"></a>
206+
Once an indexer has fully populated a search index, you might want subsequent indexer runs to incrementally index just the new and changed rows in your database.
188207

189-
### High Water Mark Change Detection policy
208+
To enable incremental indexing, set the "dataChangeDetectionPolicy" property in your data source definition. This property tells the indexer which change tracking mechanism is used on your data.
209+
210+
For Azure Database for MySQL indexers, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy).
190211

191212
An indexer's change detection policy relies on having a "high water mark" column that captures the row version, or the date and time when a row was last updated. It's often a DATE, DATETIME, or TIMESTAMP column at a granularity sufficient for meeting the requirements of a high water mark column.
192213

@@ -197,7 +218,7 @@ In your MySQL database, the high water mark column must meet the following requi
197218
+ The value of this column increases with each insert or update.
198219
+ Queries with the following WHERE and ORDER BY clauses can be executed efficiently: `WHERE [High Water Mark Column] > [Current High Water Mark Value] ORDER BY [High Water Mark Column]`
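
The query pattern in the last requirement can be sketched as follows. This is a minimal illustration that uses an in-memory SQLite database as a stand-in for MySQL; the `hotels` table, the `last_updated` column, and the `changed_since` helper are hypothetical names, and the real indexer issues its own query.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotels (id INTEGER PRIMARY KEY, name TEXT, last_updated TEXT)"
)
# An index on the high water mark column lets the WHERE and ORDER BY
# clauses run without scanning and sorting the whole table.
conn.execute("CREATE INDEX ix_hotels_last_updated ON hotels (last_updated)")
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?, ?)",
    [
        (1, "Alpine Lodge", "2022-01-01 08:00:00"),
        (2, "Bay View", "2022-01-03 09:00:00"),
        (3, "City Inn", "2022-01-05 12:30:00"),
    ],
)


def changed_since(connection, high_water_mark):
    """Select rows whose high water mark column exceeds the stored value."""
    cursor = connection.execute(
        "SELECT id, last_updated FROM hotels "
        "WHERE last_updated > ? ORDER BY last_updated",
        (high_water_mark,),
    )
    return cursor.fetchall()


rows = changed_since(conn, "2022-01-02 00:00:00")
# Because results are ordered, the last row carries the new high water mark.
new_mark = rows[-1][1] if rows else "2022-01-02 00:00:00"
print(rows, new_mark)
```

Ordering by the high water mark column is what makes the last row's value safe to store as the next starting point.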
199220

200-
To set a high water mark policy in your indexer data source, create or update your data source like this:
221+
The following example shows a [data source definition](#define-the-data-source) with a change detection policy:
201222

202223
```http
203224
POST https://[search service name].search.windows.net/datasources?api-version=2020-06-30-Preview
@@ -222,11 +243,11 @@ api-key: [admin key]
222243
223244
<a name="DataDeletionDetectionPolicy"></a>
224245

225-
### Soft Delete Column Deletion Detection policy
246+
## Indexing deleted rows
226247

227-
When rows are deleted from the source table, you probably want to delete those rows from the search index as well. If the rows are physically removed from the table, Azure Cognitive Search has no way to infer the presence of records that no longer exist. However, you can use the “soft-delete technique to logically delete rows without removing them from the table. Add a column to your table or view and mark rows as deleted using that column.
248+
When rows are deleted from the table or view, you normally want to delete those rows from the search index as well. However, if the rows are physically removed from the table, an indexer has no way to infer the presence of records that no longer exist. The solution is to use a "soft-delete" technique to logically delete rows without removing them from the table. Do this by adding a column to your table or view and marking rows as deleted using that column.
228249

229-
When using the soft-delete technique, you can specify the soft delete policy as follows when creating or updating the data source:
250+
Given a column that provides deletion state, an indexer can be configured to remove any search documents for which deletion state is set to true. The configuration property that supports this behavior is a data deletion detection policy, which is specified in the [data source definition](#define-the-data-source) as follows:
230251

231252
```http
232253
{
@@ -239,26 +260,7 @@ When using the soft-delete technique, you can specify the soft delete policy as
239260
}
240261
```
241262

242-
The "softDeleteMarkerValue" must be a string – use the string representation of your actual value. For example, if you have an integer column where deleted rows are marked with the value 1, use `"1"`. If you have a BIT column where deleted rows are marked with the Boolean true value, use the string literal `True` or `true`, the case doesn't matter.
243-
244-
<a name="TypeMapping"></a>
245-
246-
## Mapping data types
247-
248-
The following table maps the MySQL database to Cognitive Search equivalents. See [Supported data types (Azure Cognitive Search)](/rest/api/searchservice/supported-data-types) for more information.
249-
250-
> [!NOTE]
251-
> The preview does not support geometry types and blobs.
252-
253-
| MySQL data type | Cognitive Search field type |
254-
| --------------- | -------------------------------- |
255-
| `bool`, `boolean` | Edm.Boolean, Edm.String |
256-
| `tinyint`, `smallint`, `mediumint`, `int`, `integer`, `year` | Edm.Int32, Edm.Int64, Edm.String |
257-
| `bigint` | Edm.Int64, Edm.String |
258-
| `float`, `double`, `real` | Edm.Double, Edm.String |
259-
| `date`, `datetime`, `timestamp` | Edm.DateTimeOffset, Edm.String |
260-
| `char`, `varchar`, `tinytext`, `mediumtext`, `text`, `longtext`, `enum`, `set`, `time` | Edm.String |
261-
| unsigned numerical data, serial, decimal, dec, bit, blob, binary, geometry | N/A |
263+
The "softDeleteMarkerValue" must be a string. For example, if you have an integer column where deleted rows are marked with the value 1, use `"1"`. If you have a BIT column where deleted rows are marked with the Boolean true value, use the string literal `True` or `true` (the case doesn't matter).
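
As a rough sketch of the string requirement, the helper below builds the policy fragment and converts the marker to its string form. The `soft_delete_policy` function and the `IsDeleted` column name are hypothetical; only the property names and the policy type are taken from the data source definition.

```python
import json


def soft_delete_policy(column_name, marker_value):
    """Build a soft-delete policy fragment with a string marker value."""
    # Check bool before anything else: in Python, bool is a subclass of int.
    if isinstance(marker_value, bool):
        marker = "true" if marker_value else "false"
    else:
        marker = str(marker_value)
    return {
        "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName": column_name,
        "softDeleteMarkerValue": marker,
    }


# Integer column where deleted rows are marked with 1.
int_policy = soft_delete_policy("IsDeleted", 1)
# BIT column where deleted rows are marked with a Boolean true value.
bit_policy = soft_delete_policy("IsDeleted", True)

print(json.dumps({"dataDeletionDetectionPolicy": int_policy}, indent=2))
```

Converting the value in one place avoids sending a bare number or Boolean where the data source definition expects a string.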
262264

263265
## Next steps
264266

articles/search/search-howto-large-index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
---
22
title: Index large data set using built-in indexers
33
titleSuffix: Azure Cognitive Search
4-
description: Strategies for large data indexing or computationally-intensive indexing through batch mode, resourcing, and techniques for scheduled, parallel, and distributed indexing.
4+
description: Strategies for large data indexing or computationally intensive indexing through batch mode, resourcing, and techniques for scheduled, parallel, and distributed indexing.
55

66
manager: nitinme
77
author: dereklegenzoff
