Commit bcdf5da

Merge pull request #185709 from HeidiSteen/heidist-fresh
[azure search] schedule corrections
2 parents 321e141 + e0c5c90 commit bcdf5da

2 files changed: +18 -19 lines changed

articles/search/search-howto-create-indexers.md

Lines changed: 3 additions & 3 deletions
@@ -14,11 +14,11 @@ ms.date: 01/17/2022
1414

1515
# Creating indexers in Azure Cognitive Search
1616

17-
A search indexer provides an automated workflow for reading content from an external data source, and ingesting that content into a search index on your search service. Indexers support two workflows:
17+
A search indexer connects to an external data source, retrieves and processes data, and then passes it to the search engine for indexing. Indexers support two workflows:
1818

19-
+ Extract text and metadata during indexing for full text search scenarios
19+
+ Extract text and metadata during indexing for full text search scenarios.
2020

21-
+ Apply integrated machine learning and AI models to analyze content that is *not* intrinsically searchable, such as images and large undifferentiated text. This extended workflow is called [AI enrichment](cognitive-search-concept-intro.md) and it's indexer-driven.
21+
+ Apply integrated machine learning and AI models to analyze content that is not otherwise searchable, such as images and large undifferentiated text. This extended workflow is called [AI enrichment](cognitive-search-concept-intro.md) and it's indexer-driven.
2222

2323
Using indexers significantly reduces the quantity and complexity of the code you need to write. This article focuses on the basics of creating an indexer. Depending on the data source and your workflow, additional configuration might be necessary.
2424

articles/search/search-howto-schedule-indexers.md

Lines changed: 15 additions & 16 deletions
@@ -8,48 +8,47 @@ manager: nitinme
88
ms.author: heidist
99
ms.service: cognitive-search
1010
ms.topic: how-to
11-
ms.date: 01/11/2022
11+
ms.date: 01/21/2022
1212
---
1313

14-
# Schedule indexers in Azure Cognitive Search
14+
# Schedule an indexer in Azure Cognitive Search
1515

16-
By default, an indexer normally runs once, immediately after it is created. Afterwards, you can run it again on demand or set up a schedule. Some situations where indexer scheduling is useful include:
16+
Indexers can be configured to run on a schedule when you set the "schedule" property in the indexer definition. By default, an indexer runs once, immediately after it is created. Afterwards, you can run it again on demand or set up a schedule. Some situations where indexer scheduling is useful include:
1717

1818
* Source data will change over time, and you want the search indexer to automatically process the difference.
1919

20-
* A search index will be populated from multiple data sources, and you want the indexers to run at different times to reduce conflicts.
20+
* A search index will be populated from multiple data sources, and you want to stagger the indexer jobs to reduce conflicts.
2121

22-
* Source data is very large and you want to spread the indexer processing over time. Indexer jobs are subject to a maximum running time of 24 hours for regular data sources and 2 hours for indexers with skillsets. If indexing cannot complete within the maximum interval, you can configure a schedule that runs every 2 hours. Indexers can automatically pick up where they left off, as evidenced by an internal high water mark that marks where indexing last ended. Running an indexer on a recurring 2-hour schedule allows it to process a very large data set (many millions of documents) beyond the interval allowed for a single job. For more information about indexing large data volumes, see [How to index large data sets in Azure Cognitive Search](search-howto-large-index.md).
22+
* Source data is very large and you want to spread the indexer processing over time.
2323

24-
> [!NOTE]
25-
> The scheduler is a built-in feature of Azure Cognitive Search. There is no support for external schedulers.
24+
Indexer jobs are subject to a maximum running time of 24 hours for regular data sources and 2 hours for indexers with skillsets. If indexing cannot complete within the maximum interval, you can configure a schedule that runs every 2 hours. Indexers can automatically pick up where they left off, as evidenced by an internal high water mark that marks where indexing last ended. Running an indexer on a recurring 2-hour schedule allows it to process a very large data set (many millions of documents) beyond the 24-hour interval allowed for a single job. For more information about indexing large data volumes, see [How to index large data sets in Azure Cognitive Search](search-howto-large-index.md).
2625

2726
## Schedule property
2827

29-
A schedule is part of the indexer definition. If the **schedule** property is omitted, the indexer will only run once immediately after it is created. If you add a **schedule** property, you'll specify two parts.
28+
A schedule is part of the indexer definition. If the "schedule" property is omitted, the indexer will only run on demand. The property has two parts.
3029

3130
| Property | Description |
3231
|----------|-------------|
33-
|**Interval** | (required) The amount of time between the start of two consecutive indexer executions. The smallest interval allowed is 5 minutes, and the longest is 1440 minutes (24 hours). It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an [ISO 8601 duration](https://www.w3.org/TR/xmlschema11-2/#dayTimeDuration) value). The pattern for this is: `P(nD)(T(nH)(nM))`. <br/><br/>Examples: `PT15M` for every 15 minutes, `PT2H` for every 2 hours.|
34-
| **Start Time (UTC)** | (optional) Indicates when scheduled executions should begin. If omitted, the current UTC time is used. This time can be in the past, in which case the first execution is scheduled as if the indexer has been running continuously since the original **startTime**.<br/><br/>Examples: `2021-01-01T00:00:00Z` starting at midnight on January 1, `2021-01-05T22:28:00Z` starting at 10:28 p.m. on January 5.|
32+
| `"interval"` (minutes) | (required) The amount of time between the start of two consecutive indexer executions. The smallest interval allowed is 5 minutes, and the longest is 1440 minutes (24 hours). It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an [ISO 8601 duration](https://www.w3.org/TR/xmlschema11-2/#dayTimeDuration) value). <br/><br/>The pattern for this is: `P(nD)(T(nH)(nM))`. <br/><br/>Examples: `PT15M` for every 15 minutes, `PT2H` for every 2 hours.|
33+
| `"startTime"` | (optional) Start time is specified in coordinated universal time (UTC). If omitted, the current time is used. This time can be in the past, in which case the first execution is scheduled as if the indexer has been running continuously since the original start time.|
3534

36-
Visually, a schedule might look like the following: starting on January 1 and running every 50 minutes.
35+
The following example is a schedule that starts on January 1 at midnight and runs every 50 minutes.
3736

3837
```json
3938
{
4039
"dataSourceName" : "hotels-ds",
4140
"targetIndexName" : "hotels-idx",
42-
"schedule" : { "interval" : "PT50M", "startTime" : "2021-01-01T00:00:00Z" }
41+
"schedule" : { "interval" : "PT50M", "startTime" : "2022-01-01T00:00:00Z" }
4342
}
4443
```
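
The interval format and bounds described in the table can be checked client-side before an indexer definition is sent. The following Python sketch is illustrative only: `parse_interval` is a hypothetical helper (not part of any Azure SDK), and the regular expression is an assumed reading of the `P(nD)(T(nH)(nM))` pattern. The service's own validation may differ.

```python
import re
from datetime import timedelta

# Assumed regex for the documented P(nD)(T(nH)(nM)) pattern.
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?$")

# Bounds stated in the table: 5 minutes to 1440 minutes (24 hours).
MIN_INTERVAL = timedelta(minutes=5)
MAX_INTERVAL = timedelta(minutes=1440)


def parse_interval(value: str) -> timedelta:
    """Parse a schedule interval such as 'PT50M' (hypothetical helper)."""
    match = _DURATION.match(value)
    # Reject strings with no components ('P', 'PT') or a dangling 'T' ('P1DT').
    if not match or value.endswith(("P", "T")):
        raise ValueError(f"not a P(nD)(T(nH)(nM)) duration: {value!r}")
    days, hours, minutes = (int(g) if g else 0 for g in match.groups())
    interval = timedelta(days=days, hours=hours, minutes=minutes)
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError(f"interval must be between 5 and 1440 minutes: {value!r}")
    return interval


if __name__ == "__main__":
    print(parse_interval("PT50M"))
```

A value such as `PT1M` raises `ValueError` in this sketch because it falls below the 5-minute minimum.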
4544

4645
## Scheduling behavior
4746

48-
The scheduler will only kick off one indexer at a time. If you have multiple indexers that are all scheduled to start at 6:00 a.m. every morning, the scheduler will kick off the jobs sequentially. You can only obtain multiple concurrent jobs if you [run indexers on demand](search-howto-run-reset-indexers.md).
47+
The scheduler can kick off as many indexer jobs as the search service supports, which is based on the number of search units. For example, if the service has three replicas and four partitions, you should be able to have twelve indexer jobs in active execution, whether initiated on demand or on a schedule.
4948

50-
Only one instance of a given indexer can run at a time. If it's still running when the next scheduled execution is set to start, indexer execution is postponed until the next scheduled occurrence.
49+
Although multiple indexers can run simultaneously, a given indexer is single instance. You cannot run two copies of the same indexer concurrently. If an indexer happens to still be running when its next scheduled execution is set to start, the pending execution is postponed until the next scheduled occurrence, allowing the current job to finish.
5150

52-
Let’s consider an example to make this more concrete. Suppose we configure an indexer schedule with an **Interval** of hourly and a **Start Time** of June 1, 2021 at 8:00:00 AM UTC. Heres what could happen when an indexer run takes longer than an hour:
51+
Let’s consider an example to make this more concrete. Suppose we configure an indexer schedule with an interval of hourly and a start time of June 1, 2021 at 8:00:00 AM UTC. Here's what could happen when an indexer run takes longer than an hour:
5352

5453
* The first indexer execution starts at or around June 1, 2021 at 8:00 AM UTC. Assume this execution takes 20 minutes (or any time less than 1 hour).
5554
* The second execution starts at or around June 1, 2021 9:00 AM UTC. Suppose that this execution takes 70 minutes - more than an hour – and it will not complete until 10:10 AM UTC.
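
The postponement rule described above can be modeled with a short Python sketch. This is a simplified model, not the service's scheduler: `simulate_runs` is a hypothetical function, and it assumes that a run finishing exactly at a scheduled time does not block the execution scheduled for that time.

```python
from datetime import datetime, timedelta, timezone


def simulate_runs(start_time, interval, durations):
    """Return (start, end) pairs for consecutive runs of one indexer.

    Simplified model: a scheduled occurrence is skipped if the previous
    run is still in progress when that occurrence comes due.
    """
    runs = []
    next_slot = start_time
    for duration in durations:
        start = next_slot
        end = start + duration
        runs.append((start, end))
        # Move to the first scheduled occurrence at or after the run's end.
        next_slot = start + interval
        while next_slot < end:
            next_slot += interval
    return runs


if __name__ == "__main__":
    first = datetime(2021, 6, 1, 8, 0, tzinfo=timezone.utc)
    minutes = [20, 70, 10]  # the third duration is an arbitrary assumption
    for start, end in simulate_runs(
        first, timedelta(hours=1), [timedelta(minutes=m) for m in minutes]
    ):
        print(start.isoformat(), "->", end.isoformat())
```

Under these assumptions, the 70-minute run that starts at 9:00 AM is still active at 10:00 AM, so that occurrence is skipped and the next run starts at 11:00 AM.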
@@ -89,7 +88,7 @@ Schedules are specified in an indexer definition. To set up a schedule, you can
8988
9089
Call the [IndexingSchedule](/dotnet/api/azure.search.documents.indexes.models.indexingschedule) class when creating or updating an indexer using the [SearchIndexerClient](/dotnet/api/azure.search.documents.indexes.searchindexerclient).
9190
92-
The **IndexingSchedule** constructor requires an **Interval** parameter specified using a **TimeSpan** object. Recall that the smallest interval value allowed is 5 minutes, and the largest is 24 hours. The second **StartTime** parameter, specified as a **DateTimeOffset** object, is optional.
91+
The IndexingSchedule constructor requires an interval parameter specified using a TimeSpan object. Recall that the smallest interval value allowed is 5 minutes, and the largest is 24 hours. The second StartTime parameter, specified as a DateTimeOffset object, is optional.
9392
9493
The following C# example creates an indexer, using a predefined data source and index, and sets its schedule to run once every day, starting now:
9594
