Commit 4b65fa9

Merge pull request #220759 from HeidiSteen/heidist-work
misc edits
2 parents fec3f3e + 58f26a3 commit 4b65fa9

3 files changed

+20
-16
lines changed

articles/search/search-howto-create-indexers.md

Lines changed: 14 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -19,9 +19,9 @@ An indexer is a named object on a search service that automates an indexing work
1919

2020
Indexers support two workflows:
2121

22-
+ Text-based indexing, extracting strings and metadata for full text search scenarios.
22+
+ Text-based indexing, extracting strings and metadata from textual content for full text search scenarios.
2323

24-
+ Skills-based indexing, using built-in or custom skills to apply integrated machine learning and AI models that analyze content for text and structure. Skill-based indexing enables search over content that isn't otherwise easily searchable, such as images and large undifferentiated text. To learn about skills-based indexing, see [AI enrichment in Cognitive Search](cognitive-search-concept-intro.md).
24+
+ Skills-based indexing, using built-in or custom skills that add integrated machine learning for analysis over images and large undifferentiated content, extracting or inferring text and structure. Skills-based indexing enables search over content that isn't otherwise easily full text searchable. To learn more, see [AI enrichment in Cognitive Search](cognitive-search-concept-intro.md).
2525

2626
This article focuses on the basic steps of creating an indexer. Depending on the data source and your workflow, more configuration might be necessary.
2727

@@ -71,9 +71,7 @@ You can also [specify a schedule](search-howto-schedule-indexers.md) or set an [
7171

7272
### Indexer definition for skills-based indexing and AI enrichment
7373

74-
Indexers also drive [AI enrichment](cognitive-search-concept-intro.md). All of the above properties and parameters apply, but the following extra properties are specific to AI enrichment: **`skillSetName`**, **`outputFieldMappings`**, **`cache`**.
75-
76-
A [skillset](cognitive-search-defining-skillset.md) also has **`cognitiveServices`**, and **`knowledgeStore`**. A few other required and similarly named properties are added for context.
74+
Indexers also drive [AI enrichment](cognitive-search-concept-intro.md). All of the above properties and parameters apply, but the following extra properties are specific to AI enrichment: "skillSetName", "cache", "outputFieldMappings".
7775

7876
```json
7977
{
@@ -91,7 +89,7 @@ A [skillset](cognitive-search-defining-skillset.md) also has **`cognitiveService
9189
}
9290
```
9391

94-
AI enrichment is out of scope for this article. For more information, start with [AI enrichment](cognitive-search-concept-intro.md), [Skillsets in Azure Cognitive Search](cognitive-search-working-with-skillsets.md), [Create a skillset](cognitive-search-defining-skillset.md), [Map enrichment output fields](cognitive-search-output-field-mapping.md), and [Enable caching for AI enrichment](search-howto-incremental-index.md).
92+
AI enrichment is its own subject area and is out of scope for this article. For more information, start with [AI enrichment](cognitive-search-concept-intro.md), [Skillsets in Azure Cognitive Search](cognitive-search-working-with-skillsets.md), [Create a skillset](cognitive-search-defining-skillset.md), [Map enrichment output fields](cognitive-search-output-field-mapping.md), and [Enable caching for AI enrichment](search-howto-incremental-index.md).
9593

9694
## Prerequisites
9795

@@ -134,6 +132,8 @@ Indexers require a data source that specifies the type, container, and connectio
134132
+ [Azure Cosmos DB](search-howto-index-cosmosdb.md)
135133
+ [Azure SQL Database](search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md)
136134

135+
1. If the data source is a database, such as Azure SQL or Cosmos DB, enable change tracking. The above links for the various data sources explain which change tracking methods are supported by indexers.
136+
137137
## Prepare an index
138138

139139
Indexers also require a search index. Recall that indexers pass data off to the search engine for indexing. Just as indexers have properties that determine execution behavior, an index schema has properties that profoundly affect how strings are indexed (only strings are analyzed and tokenized).
@@ -230,9 +230,15 @@ If your data source supports change detection, an indexer can detect underlying
230230

231231
Change detection logic is built into the data platforms. How an indexer supports change detection varies by data source:
232232

233-
+ Azure Storage has built-in change detection, which means an indexer can recognize new and updated documents automatically. Blob Storage, Azure Table Storage, and Azure Data Lake Storage Gen2 stamp each blob or row update with a date and time. An indexer automatically uses this information to determine which documents to update in the index.
233+
+ Azure Storage has built-in change detection, which means an indexer can recognize new and updated documents automatically. Blob Storage, Azure Table Storage, and Azure Data Lake Storage Gen2 stamp each blob or row update with a date and time. An indexer automatically uses this information to determine which documents to update in the index. For more information about deletion detection, see [Delete detection using indexers for Azure Storage in Azure Cognitive Search](search-howto-index-changed-deleted-blobs.md).
234+
235+
+ Cloud database technologies provide optional change detection features in their platforms. For these data sources, change detection isn't automatic. You'll need to specify in the data source definition which change detection policy is used:
234236

235-
+ Azure SQL and Azure Cosmos DB provide optional change detection features in their platforms. For these data sources, change detection isn't automatic. You'll need to specify in the data source definition which change detection policy is used.
237+
+ [Azure SQL (change detection)](search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md#indexing-new-changed-and-deleted-rows)
238+
+ [Azure DB for MySQL (change detection)](search-howto-index-mysql.md#indexing-new-and-changed-rows)
239+
+ [Azure Cosmos DB for NoSQL (change detection)](search-howto-index-cosmosdb.md#indexing-new-and-changed-documents)
240+
+ [Azure Cosmos DB for MongoDB (change detection)](search-howto-index-cosmosdb-mongodb.md#indexing-new-and-changed-documents)
241+
+ [Azure Cosmos DB for Apache Gremlin (change detection)](search-howto-index-cosmosdb-gremlin.md#indexing-new-and-changed-documents)
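
To make "specify in the data source definition which change detection policy is used" more concrete, here is a minimal Python sketch that builds such a definition for Azure Cosmos DB for NoSQL. The `dataChangeDetectionPolicy` shape and the `_ts` column are assumed from the linked Cosmos DB article rather than stated in this one, and the data source name, container name, and connection string are hypothetical placeholders.

```python
import json


def build_cosmos_data_source(name: str, container: str, connection_string: str) -> dict:
    """Return a data source definition that opts in to change detection.

    Assumption: the high water mark policy keyed on Cosmos DB's `_ts`
    property is the policy you want; other databases use other policies.
    """
    return {
        "name": name,
        "type": "cosmosdb",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
        # Without this block, the indexer has no way to find changed documents.
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
    }


if __name__ == "__main__":
    # Placeholder values only; nothing is sent to a service.
    definition = build_cosmos_data_source("my-cosmos-ds", "my-container", "<connection-string>")
    print(json.dumps(definition, indent=2))
```

The resulting JSON would be the body of a create data source request; check the linked article for the policy that applies to your database.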
236242

237243
An indexer tracks the last document it processed from the data source through an internal "high water mark". The marker is never exposed in the API, but the indexer uses it internally to record where it stopped. When indexing resumes, either through a scheduled run or an on-demand invocation, the indexer references the high water mark so that it can pick up where it left off.
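
The high water mark idea can be illustrated with a toy model. This is not the service's implementation, which is internal and not exposed; `SourceRow`, `ToyIndexer`, and the integer `last_modified` stamp are all hypothetical names invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRow:
    key: str
    last_modified: int  # stand-in for a timestamp or row version


@dataclass
class ToyIndexer:
    high_water_mark: int = 0  # internal state, never shown to callers
    index: dict = field(default_factory=dict)

    def run(self, rows) -> int:
        """Process only rows stamped after the stored mark; return the count."""
        changed = sorted(
            (r for r in rows if r.last_modified > self.high_water_mark),
            key=lambda r: r.last_modified,
        )
        for row in changed:
            self.index[row.key] = row.last_modified
            # Advance the mark as each row lands, so an interrupted run
            # can resume from the last row it finished.
            self.high_water_mark = row.last_modified
        return len(changed)


if __name__ == "__main__":
    rows = [SourceRow("a", 1), SourceRow("b", 2)]
    indexer = ToyIndexer()
    print(indexer.run(rows))  # first run processes both rows
    print(indexer.run(rows))  # nothing changed, so nothing is processed
```

A second run over unchanged data processes nothing, which matches the "pick up where it left off" behavior described above.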
238244

articles/search/search-howto-run-reset-indexers.md

Lines changed: 4 additions & 4 deletions
@@ -23,14 +23,14 @@ This article explains how to run indexers on demand, with and without a reset.
2323

2424
## Run without reset
2525

26-
[Run Indexer](/rest/api/searchservice/run-indexer) will detect and process only what it necessary to synchronize the search index with changes in the underlying data source. Incremental indexing starts by locating an internal high-water mark to find the last updated search document, which becomes the starting point for indexer execution over new and updated documents in the data source.
26+
A [Run Indexer](/rest/api/searchservice/run-indexer) operation will detect and process only what is necessary to synchronize the search index with changes in the underlying data source. Incremental indexing starts by locating an internal high-water mark to find the last updated search document, which becomes the starting point for indexer execution over new and updated documents in the data source.
2727

28-
Change detection is essential for determining what's new or updated in the data source. Indexers use the change detection capabilities of the underlying data source to determine what's new or updated in the data source.
28+
[Change detection](search-howto-create-indexers.md#change-detection-and-internal-state) is essential for determining what's new or updated in the data source. Indexers rely on the change detection capabilities of the underlying data source to make that determination.
2929

30-
+ Azure Storage has built-in change detection through its LastModified property
30+
+ Azure Storage has built-in change detection through its LastModified property.
3131
+ Other data sources, such as Azure SQL or Azure Cosmos DB, have to be configured for change detection before the indexer can read new and updated rows.
3232

33-
If the underlying content is unchanged, a run operation has no effect. In this case, indexer execution history will indicate `0\0` documents processed.
33+
If the underlying content is unchanged, a run operation has no effect. In this case, indexer execution history will indicate `0\0` documents processed. You'll need to reset the indexer if you want to reprocess in full.
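
As a rough illustration of the difference between running and resetting, the sketch below only constructs the two REST requests and sends nothing. The URL pattern, the `api-key` header, and the `2020-06-30` API version are assumptions based on the Run Indexer and Reset Indexer REST references; the service name, indexer name, and key are placeholders.

```python
import urllib.request

API_VERSION = "2020-06-30"  # assumption: use the version your service supports


def build_indexer_request(service: str, indexer: str, operation: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an indexer 'run' or 'reset'."""
    if operation not in ("run", "reset"):
        raise ValueError("operation must be 'run' or 'reset'")
    url = (
        f"https://{service}.search.windows.net/indexers/{indexer}/{operation}"
        f"?api-version={API_VERSION}"
    )
    # Both operations are POSTs with an empty body.
    return urllib.request.Request(url, data=b"", method="POST", headers={"api-key": api_key})


if __name__ == "__main__":
    for op in ("reset", "run"):
        req = build_indexer_request("my-service", "my-indexer", op, "<admin-key>")
        print(req.get_method(), req.full_url)
```

Sending only the `run` request processes what changed since the last run; sending `reset` first and then `run` is the sequence for reprocessing in full.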
3434

3535
## Indexer execution
3636

articles/search/search-howto-schedule-indexers.md

Lines changed: 2 additions & 4 deletions
@@ -18,13 +18,11 @@ Indexers can be configured to run on a schedule when you set the "schedule" prop
1818

1919
+ Source data is changing over time, and you want the indexer to automatically process the difference.
2020

21-
+ A search index is populated from multiple data sources, and you want to stagger the indexer jobs to reduce conflicts.
21+
+ An index is populated from multiple data sources and indexers, and you want to stagger the indexer jobs to reduce conflicts.
2222

2323
+ Source data is very large and you want to spread the indexer processing over time.
2424

25-
Indexer jobs are subject to a 2-hour maximum duration. Currently, some indexers have a longer 24-hour maximum execution window, but that behavior isn’t the norm. The longer window only applies if a service or its indexers can’t be internally migrated to the newer runtime behavior.
26-
27-
If indexing can't complete within the maximum interval, you can [schedule the indexer](search-howto-schedule-indexers.md) to run every 2 hours. As long as your data source supports [change detection logic](search-howto-create-indexers.md#change-detection-and-internal-state), indexers can automatically pick up where they left off, based on an internal high water mark that marks where indexing last ended. Running an indexer on a recurring 2-hour schedule allows it to process a very large data set (many millions of documents). For more information about indexing large data volumes, see [How to index large data sets in Azure Cognitive Search](search-howto-large-index.md).
25+
If indexing isn't completing within the processing window, you can [schedule the indexer](search-howto-schedule-indexers.md) to run at specific intervals. Most indexers run within a 2-hour processing window, so scheduling on a 2-hour cadence is recommended for working through a large volume of data (many millions of documents). As long as your data source supports [change detection logic](search-howto-create-indexers.md#change-detection-and-internal-state), indexers can automatically pick up where they left off on each run, based on an internal high water mark that marks where indexing last ended.
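
A 2-hour cadence might be expressed as follows. This is a sketch that assumes the "schedule" property takes an ISO 8601 duration in `interval` and a UTC timestamp in `startTime`; the indexer, data source, and index names are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_schedule(hours: int, start: datetime) -> dict:
    """Return a schedule that repeats every `hours` hours from `start` (timezone-aware)."""
    if hours < 1:
        raise ValueError("hours must be at least 1 in this sketch")
    return {
        "interval": f"PT{hours}H",  # ISO 8601 duration, for example PT2H
        "startTime": start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def build_scheduled_indexer(name: str, data_source: str, index: str, schedule: dict) -> dict:
    """Return an indexer definition with the schedule attached."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "schedule": schedule,
    }


if __name__ == "__main__":
    schedule = build_schedule(2, datetime(2023, 1, 1, tzinfo=timezone.utc))
    indexer = build_scheduled_indexer("my-indexer", "my-data-source", "my-index", schedule)
    print(json.dumps(indexer, indent=2))
```

Each scheduled run then resumes from the high water mark, so a large data set is worked through over successive 2-hour windows.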
2826

2927
## Prerequisites
3028
