articles/search/search-howto-create-indexers.md
Lines changed: 3 additions & 3 deletions
@@ -14,11 +14,11 @@ ms.date: 01/17/2022

 # Creating indexers in Azure Cognitive Search

-A search indexer provides an automated workflow for reading content from an external data source, and ingesting that content into a search index on your search service. Indexers support two workflows:
+A search indexer connects to an external data source, retrieves and processes data, and then passes it to the search engine for indexing. Indexers support two workflows:

-+ Extract text and metadata during indexing for full text search scenarios
++ Extract text and metadata during indexing for full text search scenarios.

-+ Apply integrated machine learning and AI models to analyze content that is *not* intrinsically searchable, such as images and large undifferentiated text. This extended workflow is called [AI enrichment](cognitive-search-concept-intro.md) and it's indexer-driven.
++ Apply integrated machine learning and AI models to analyze content that is not otherwise searchable, such as images and large undifferentiated text. This extended workflow is called [AI enrichment](cognitive-search-concept-intro.md) and it's indexer-driven.

 Using indexers significantly reduces the quantity and complexity of the code you need to write. This article focuses on the basics of creating an indexer. Depending on the data source and your workflow, additional configuration might be necessary.
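To make the two workflows concrete, here is a minimal Python sketch that assembles an indexer definition as a plain dictionary. It assumes the REST property names `name`, `dataSourceName`, `targetIndexName`, and `skillsetName` used by the Create Indexer REST API; the helper `indexer_definition` and the example resource names are hypothetical. Attaching a skillset name is what turns plain text-and-metadata extraction into the AI enrichment workflow.

```python
import json
from typing import Optional


def indexer_definition(
    name: str,
    data_source: str,
    target_index: str,
    skillset: Optional[str] = None,
) -> dict:
    """Build a minimal indexer definition (hypothetical helper).

    The indexer reads from `data_source` and sends documents to `target_index`.
    Passing `skillset` attaches an AI enrichment pipeline; omitting it keeps the
    plain text-and-metadata extraction workflow.
    """
    definition = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
    }
    if skillset is not None:
        definition["skillsetName"] = skillset
    return definition


if __name__ == "__main__":
    # Placeholder names, not real resources.
    plain = indexer_definition("hotels-indexer", "hotels-ds", "hotels-idx")
    print(json.dumps(plain, indent=2))
```

Serialized with `json.dumps`, the dictionary would form the body of a create or update request; how that request is sent (REST or an SDK) is outside the scope of this sketch.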
articles/search/search-howto-schedule-indexers.md
Lines changed: 15 additions & 16 deletions
@@ -8,48 +8,47 @@ manager: nitinme
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/11/2022
+ms.date: 01/21/2022
 ---

-# Schedule indexers in Azure Cognitive Search
+# Schedule an indexer in Azure Cognitive Search

-By default, an indexer normally runs once, immediately after it is created. Afterwards, you can run it again on demand or set up a schedule. Some situations where indexer scheduling is useful include:
+Indexers can be configured to run on a schedule when you set the "schedule" property in the indexer definition. By default, an indexer runs once, immediately after it is created. Afterwards, you can run it again on demand or set up a schedule. Some situations where indexer scheduling is useful include:

 * Source data will change over time, and you want the search indexer to automatically process the difference.

-* A search index will be populated from multiple data sources, and you want the indexers to run at different times to reduce conflicts.
+* A search index will be populated from multiple data sources, and you want to stagger the indexer jobs to reduce conflicts.

-* Source data is very large and you want to spread the indexer processing over time. Indexer jobs are subject to a maximum running time of 24 hours for regular data sources and 2 hours for indexers with skillsets. If indexing cannot complete within the maximum interval, you can configure a schedule that runs every 2 hours. Indexers can automatically pick up where they left off, as evidenced by an internal high water mark that marks where indexing last ended. Running an indexer on a recurring 2-hour schedule allows it to process a very large data set (many millions of documents) beyond the interval allowed for a single job. For more information about indexing large data volumes, see [How to index large data sets in Azure Cognitive Search](search-howto-large-index.md).
+* Source data is very large and you want to spread the indexer processing over time.

-> [!NOTE]
-> The scheduler is a built-in feature of Azure Cognitive Search. There is no support for external schedulers.
+Indexer jobs are subject to a maximum running time of 24 hours for regular data sources and 2 hours for indexers with skillsets. If indexing cannot complete within the maximum interval, you can configure a schedule that runs every 2 hours. Indexers can automatically pick up where they left off, using an internal high-water mark that records where indexing last ended. Running an indexer on a recurring 2-hour schedule allows it to process a very large data set (many millions of documents) beyond the 24-hour interval allowed for a single job. For more information about indexing large data volumes, see [How to index large data sets in Azure Cognitive Search](search-howto-large-index.md).

 ## Schedule property

-A schedule is part of the indexer definition. If the **schedule** property is omitted, the indexer will only run once immediately after it is created. If you add a **schedule**property, you'll specify two parts.
+A schedule is part of the indexer definition. If the "schedule" property is omitted, the indexer will only run on demand. The property has two parts.

 | Property | Description |
 |----------|-------------|
-|**Interval**| (required) The amount of time between the start of two consecutive indexer executions. The smallest interval allowed is 5 minutes, and the longest is 1440 minutes (24 hours). It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an [ISO 8601 duration](https://www.w3.org/TR/xmlschema11-2/#dayTimeDuration) value). The pattern for this is: `P(nD)(T(nH)(nM))`. <br/><br/>Examples: `PT15M` for every 15 minutes, `PT2H` for every 2 hours.|
-|**Start Time (UTC)**| (optional) Indicates when scheduled executions should begin. If omitted, the current UTC time is used. This time can be in the past, in which case the first execution is scheduled as if the indexer has been running continuously since the original **startTime**.<br/><br/>Examples: `2021-01-01T00:00:00Z` starting at midnight on January 1, `2021-01-05T22:28:00Z` starting at 10:28 p.m. on January 5.|
+|`"interval"` | (required) The amount of time between the start of two consecutive indexer executions. The smallest interval allowed is 5 minutes, and the longest is 1440 minutes (24 hours). It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an [ISO 8601 duration](https://www.w3.org/TR/xmlschema11-2/#dayTimeDuration) value). <br/><br/>The pattern for this is: `P(nD)(T(nH)(nM))`. <br/><br/>Examples: `PT15M` for every 15 minutes, `PT2H` for every 2 hours.|
+|`"startTime"`| (optional) Start time is specified in coordinated universal time (UTC). If omitted, the current time is used. This time can be in the past, in which case the first execution is scheduled as if the indexer has been running continuously since the original start time.|
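The interval and start-time rules in the table above can be checked client-side before an indexer is created. The following Python sketch is illustrative only: `interval_minutes`, `make_schedule`, and `first_run_at` are hypothetical helpers, the service performs its own validation, and `first_run_at` models the documented "as if running continuously since the start time" behavior rather than reproducing the actual scheduler.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Restricted XSD dayTimeDuration subset: P(nD)(T(nH)(nM)), e.g. PT15M, PT2H, P1D.
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?$")
MIN_MINUTES = 5     # smallest allowed interval
MAX_MINUTES = 1440  # largest allowed interval (24 hours)


def interval_minutes(value: str) -> int:
    """Convert a duration such as 'PT2H' to a whole number of minutes."""
    match = _DURATION.match(value)
    # Reject empty forms such as 'P', 'PT', or 'P1DT' that the pattern would accept.
    if match is None or value == "P" or value.endswith("T"):
        raise ValueError(f"unsupported duration: {value!r}")
    days, hours, minutes = (int(g) if g else 0 for g in match.groups())
    return days * 1440 + hours * 60 + minutes


def make_schedule(interval: str, start_time: Optional[datetime] = None) -> dict:
    """Build the 'schedule' object of an indexer definition after checking the bounds."""
    total = interval_minutes(interval)
    if not MIN_MINUTES <= total <= MAX_MINUTES:
        raise ValueError(f"interval must be 5 to 1440 minutes, got {total}")
    schedule = {"interval": interval}
    if start_time is not None:
        if start_time.tzinfo is None:
            raise ValueError("start_time must be timezone-aware")
        # startTime is expressed in UTC.
        schedule["startTime"] = start_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return schedule


def first_run_at(start: datetime, interval: str, now: datetime) -> datetime:
    """First occurrence at or after `now`, as if the schedule had run since `start`."""
    if now <= start:
        return start
    step = timedelta(minutes=interval_minutes(interval))
    periods = -(-(now - start) // step)  # ceiling division of two timedeltas
    return start + periods * step
```

For example, a 50-minute schedule starting at midnight on January 1 (the year 2021 is assumed here) serializes to `{"interval": "PT50M", "startTime": "2021-01-01T00:00:00Z"}`. Under this model, an indexer created at 1:00 AM that day would first run at 1:40 AM, the next occurrence on the 00:00, 00:50, 01:40 grid.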

-Visually, a schedule might look like the following: starting on January 1 and running every 50 minutes.
+The following example is a schedule that starts on January 1 at midnight and runs every 50 minutes.
-The scheduler will only kick off one indexer at a time. If you have multiple indexers that are all scheduled to start at 6:00 a.m. every morning, the scheduler will kick off the jobs sequentially. You can only obtain multiple concurrent jobs if you [run indexers on demand](search-howto-run-reset-indexers.md).
+The scheduler can kick off as many indexer jobs as the search service supports, which is based on the number of search units. For example, if the service has three replicas and four partitions (twelve search units), you should be able to have twelve indexer jobs in active execution, whether initiated on demand or on a schedule.

-Only one instance of a given indexer can run at a time. If it's still running when the next scheduled execution is set to start, indexer execution is postponed until the next scheduled occurrence.
+Although multiple indexers can run simultaneously, a given indexer is single-instance: you cannot run two copies of the same indexer concurrently. If an indexer is still running when its next scheduled execution is set to start, the pending execution is postponed until the next scheduled occurrence, allowing the current job to finish.

-Let’s consider an example to make this more concrete. Suppose we configure an indexer schedule with an **Interval** of hourly and a **Start Time** of June 1, 2021 at 8:00:00 AM UTC. Here’s what could happen when an indexer run takes longer than an hour:
+Let's consider an example to make this more concrete. Suppose we configure an indexer schedule with an hourly interval and a start time of June 1, 2021 at 8:00:00 AM UTC. Here's what could happen when an indexer run takes longer than an hour:

 * The first indexer execution starts at or around June 1, 2021 at 8:00 AM UTC. Assume this execution takes 20 minutes (or any time less than 1 hour).
 * The second execution starts at or around June 1, 2021 at 9:00 AM UTC. Suppose that this execution takes 70 minutes (more than an hour) and will not complete until 10:10 AM UTC.
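The postponement rule in this example can be modeled in a few lines. The following Python sketch is an assumption-based simulation, not the service's scheduler: given a start time, an interval, and the duration of each run of one indexer, it returns when each run starts if every occurrence that falls while a run is still active is skipped. The function name `simulate_runs` is hypothetical, and treating an occurrence that coincides exactly with a run's finish time as runnable is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def simulate_runs(start: datetime, interval: timedelta, durations: List[timedelta]) -> List[datetime]:
    """Return the start time of each run of a single indexer (hypothetical model).

    A given indexer is single-instance, so an occurrence that falls while the
    previous run is still active is skipped; the next run waits for the
    following occurrence.
    """
    starts: List[datetime] = []
    occurrence = start
    for duration in durations:
        starts.append(occurrence)
        finished = occurrence + duration
        occurrence += interval
        # Skip every occurrence that starts before the current run has finished.
        # Assumption: an occurrence exactly at the finish time is allowed to run.
        while occurrence < finished:
            occurrence += interval
    return starts


if __name__ == "__main__":
    start = datetime(2021, 6, 1, 8, 0, tzinfo=timezone.utc)
    runs = [timedelta(minutes=m) for m in (20, 70, 30)]
    for when in simulate_runs(start, timedelta(hours=1), runs):
        print(when.strftime("%H:%M UTC"))
```

With the hourly schedule from the example and runs of 20, 70, and 30 minutes, the model prints 08:00, 09:00, and 11:00 UTC: the 10:00 AM occurrence falls while the second run is still active, so it is skipped.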
@@ -89,7 +88,7 @@ Schedules are specified in an indexer definition. To set up a schedule, you can

 Use the [IndexingSchedule](/dotnet/api/azure.search.documents.indexes.models.indexingschedule) class when creating or updating an indexer using the [SearchIndexerClient](/dotnet/api/azure.search.documents.indexes.searchindexerclient).

-The **IndexingSchedule** constructor requires an **Interval** parameter specified using a **TimeSpan** object. Recall that the smallest interval value allowed is 5 minutes, and the largest is 24 hours. The second **StartTime** parameter, specified as a **DateTimeOffset** object, is optional.
+The IndexingSchedule constructor requires an interval parameter specified as a TimeSpan object. Recall that the smallest interval value allowed is 5 minutes, and the largest is 24 hours. The optional StartTime property, specified as a DateTimeOffset object, can be set after the schedule is constructed.

 The following C# example creates an indexer, using a predefined data source and index, and sets its schedule to run once every day, starting now: