
Commit b129ffe

Merge pull request #78759 from RobDixon22/master
New "How to schedule indexers" article.
2 parents 32eeb98 + a9e2650 commit b129ffe

8 files changed

+138
-32
lines changed

articles/search/TOC.yml

Lines changed: 18 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -163,25 +163,29 @@
163163
items:
164164
- name: Indexers overview
165165
href: search-indexer-overview.md
166-
- name: Blob storage indexer
167-
href: search-howto-indexing-azure-blob-storage.md
168-
- name: Table storage indexer
166+
- name: Azure Table Storage indexer
169167
href: search-howto-indexing-azure-tables.md
170-
- name: SQL DB indexer
168+
- name: Azure SQL DB indexer
171169
href: search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md
172-
- name: Cosmos DB indexer
170+
- name: Azure Cosmos DB indexer
173171
href: search-howto-index-cosmosdb.md
174-
- name: One-to-many blob indexing
175-
href: search-howto-index-one-to-many-blobs.md
176-
- name: CSV blob indexing
177-
href: search-howto-index-csv-blobs.md
178-
- name: JSON blob indexing
179-
href: search-howto-index-json-blobs.md
172+
- name: Azure Blob Storage indexer
173+
items:
174+
- name: Set up a blob indexer
175+
href: search-howto-indexing-azure-blob-storage.md
176+
- name: Index one-to-many blobs
177+
href: search-howto-index-one-to-many-blobs.md
178+
- name: Index CSV blobs
179+
href: search-howto-index-csv-blobs.md
180+
- name: Index JSON blobs
181+
href: search-howto-index-json-blobs.md
182+
- name: Schedule indexers
183+
href: search-howto-schedule-indexers.md
184+
- name: Map fields
185+
href: search-indexer-field-mappings.md
180186
- name: Connect to SQL Server VMs
181187
href: search-howto-connecting-azure-sql-iaas-to-azure-search-using-indexers.md
182-
- name: Field mappings
183-
href: search-indexer-field-mappings.md
184-
- name: Troubleshooting common issues
188+
- name: Troubleshoot common issues
185189
href: search-indexer-troubleshooting.md
186190
- name: Enrich with AI
187191
items:

articles/search/search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md

Lines changed: 1 addition & 17 deletions
@@ -154,23 +154,7 @@ You can also arrange the indexer to run periodically on a schedule. To do this,
154154
155155
The **interval** parameter is required. The interval refers to the time between the start of two consecutive indexer executions. The smallest allowed interval is 5 minutes; the longest is one day. It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an [ISO 8601 duration](https://www.w3.org/TR/xmlschema11-2/#dayTimeDuration) value). The pattern for this is: `P(nD)(T(nH)(nM))`. Examples: `PT15M` for every 15 minutes, `PT2H` for every 2 hours.
156156
157-
The optional **startTime** indicates when the scheduled executions should commence. If it is omitted, the current UTC time is used. This time can be in the past – in which case the first execution is scheduled as if the indexer has been running continuously since the startTime.
158-
159-
Only one execution of an indexer can run at a time. If an indexer is running when its execution is scheduled, the execution is postponed until the next scheduled time.
160-
161-
Let’s consider an example to make this more concrete. Suppose we have the following hourly schedule configured:
162-
163-
"schedule" : { "interval" : "PT1H", "startTime" : "2015-03-01T00:00:00Z" }
164-
165-
Here’s what happens:
166-
167-
1. The first indexer execution starts at or around March 1, 2015 12:00 a.m. UTC.
168-
2. Assume this execution takes 20 minutes (or any time less than 1 hour).
169-
3. The second execution starts at or around March 1, 2015 1:00 a.m.
170-
4. Now suppose that this execution takes more than an hour – for example, 70 minutes – so that it completes around 2:10 a.m.
171-
5. It’s now 2:00 a.m., time for the third execution to start. However, because the second execution from 1 a.m. is still running, the third execution is skipped. The third execution starts at 3 a.m.
172-
173-
You can add, change, or delete a schedule for an existing indexer by using a **PUT indexer** request.
157+
For more information about defining indexer schedules, see [How to schedule indexers for Azure Search](search-howto-schedule-indexers.md).
174158
175159
<a name="CaptureChangedRows"></a>
176160

articles/search/search-howto-index-cosmosdb.md

Lines changed: 2 additions & 0 deletions
@@ -279,6 +279,8 @@ This indexer runs every two hours (schedule interval is set to "PT2H"). To run a
279279

280280
For more details on the Create Indexer API, check out [Create Indexer](https://docs.microsoft.com/rest/api/searchservice/create-indexer).
281281

282+
For more information about defining indexer schedules, see [How to schedule indexers for Azure Search](search-howto-schedule-indexers.md).
283+
282284
## Use .NET
283285

284286
The generally available .NET SDK has full parity with the generally available REST API. We recommend that you review the previous REST API section to learn concepts, workflow, and requirements. You can then refer to following .NET API reference documentation to implement a JSON indexer in managed code.

articles/search/search-howto-indexing-azure-blob-storage.md

Lines changed: 4 additions & 1 deletion
@@ -116,6 +116,8 @@ This indexer will run every two hours (schedule interval is set to "PT2H"). To r
116116

117117
For more details on the Create Indexer API, check out [Create Indexer](https://docs.microsoft.com/rest/api/searchservice/create-indexer).
118118

119+
For more information about defining indexer schedules, see [How to schedule indexers for Azure Search](search-howto-schedule-indexers.md).
120+
119121
## How Azure Search indexes blobs
120122

121123
Depending on the [indexer configuration](#PartsOfBlobToIndex), the blob indexer can index storage metadata only (useful when you only care about the metadata and don't need to index the content of blobs), storage and content metadata, or both metadata and textual content. By default, the indexer extracts both metadata and content.
@@ -139,7 +141,8 @@ Depending on the [indexer configuration](#PartsOfBlobToIndex), the blob indexer
139141
* **metadata\_storage\_last\_modified** (Edm.DateTimeOffset) - last modified timestamp for the blob. Azure Search uses this timestamp to identify changed blobs, to avoid reindexing everything after the initial indexing.
140142
* **metadata\_storage\_size** (Edm.Int64) - blob size in bytes.
141143
* **metadata\_storage\_content\_md5** (Edm.String) - MD5 hash of the blob content, if available.
142-
* **metadata\_storage\_sas\_token** (Edm.String) - A temporary token that can be used by [custom skills](cognitive-search-custom-skill-interface.md) to get right access to the blob. This sas token is not supposed to be stored for later use as it may expire.
144+
* **metadata\_storage\_sas\_token** (Edm.String) - A temporary SAS token that can be used by [custom skills](cognitive-search-custom-skill-interface.md) to get access to the blob. This token should not be stored for later use as it might expire.
145+
143146
* Metadata properties specific to each document format are extracted into the fields listed [here](#ContentSpecificMetadata).
144147

145148
You don't need to define fields for all of the above properties in your search index - just capture the properties you need for your application.

articles/search/search-howto-indexing-azure-tables.md

Lines changed: 2 additions & 0 deletions
@@ -111,6 +111,8 @@ This indexer runs every two hours. (The schedule interval is set to "PT2H".) To
111111

112112
For more information on the Create Indexer API, see [Create Indexer](https://docs.microsoft.com/rest/api/searchservice/create-indexer).
113113

114+
For more information about defining indexer schedules, see [How to schedule indexers for Azure Search](search-howto-schedule-indexers.md).
115+
114116
## Deal with different field names
115117
Sometimes, the field names in your existing index are different from the property names in your table. You can use field mappings to map the property names from the table to the field names in your search index. To learn more about field mappings, see [Azure Search indexer field mappings bridge the differences between datasources and search indexes](search-indexer-field-mappings.md).
116118

articles/search/search-howto-schedule-indexers.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
1+
---
2+
title: How to schedule indexers - Azure Search
3+
description: Schedule Azure Search indexers to index content periodically or at specific times.
4+
5+
ms.date: 05/31/2019
6+
author: RobDixon22
7+
manager: HeidiSteen
8+
ms.author: v-rodixo
9+
services: search
10+
ms.service: search
11+
ms.devlang: rest-api
12+
ms.topic: conceptual
13+
ms.custom: seodec2018
14+
---
15+
16+
# How to schedule indexers for Azure Search
17+
An indexer normally runs once, immediately after it is created. You can run it again on demand using the portal, the REST API, or the .NET SDK. You can also configure an indexer to run periodically on a schedule.
18+
19+
Some situations where indexer scheduling is useful:
20+
21+
* Source data will change over time, and you want the Azure Search indexers to automatically process the changed data.
22+
* The index will be populated from multiple data sources and you want to make sure the indexers run at different times to reduce conflicts.
23+
* The source data is very large and you want to spread the indexer processing over time. For more information about indexing large volumes of data, see [How to index large data sets in Azure Search](search-howto-large-index.md).
24+
25+
The scheduler is a built-in feature of Azure Search. You can't use an external scheduler to control search indexers.
26+
27+
## Define schedule properties
28+
29+
An indexer schedule has two properties:
30+
* **Interval**, which defines the amount of time in between scheduled indexer executions. The smallest interval allowed is 5 minutes, and the largest is 24 hours.
31+
* **Start Time (UTC)**, which indicates the first time at which the indexer should be run.
32+
33+
You can specify a schedule when first creating the indexer, or by updating the indexer's properties later. Indexer schedules can be set using the [portal](#portal), the [REST API](#restApi), or the [.NET SDK](#dotNetSdk).
34+
35+
Only one execution of an indexer can run at a time. If an indexer is already running when its next execution is scheduled, that execution is postponed until the next scheduled time.
36+
37+
Let’s consider an example to make this more concrete. Suppose we configure an indexer schedule with an **Interval** of one hour and a **Start Time** of June 1, 2019 at 8:00:00 AM UTC. Here’s what could happen when an indexer run takes longer than an hour:
38+
39+
* The first indexer execution starts at or around June 1, 2019 at 8:00 AM UTC. Assume this execution takes 20 minutes (or any time less than 1 hour).
40+
* The second execution starts at or around June 1, 2019 at 9:00 AM UTC. Suppose that this execution takes 70 minutes (more than an hour), so it does not complete until 10:10 AM UTC.
41+
* The third execution is scheduled to start at 10:00 AM UTC, but at that time the previous execution is still running. This scheduled execution is then skipped. The next execution of the indexer will not start until 11:00 AM UTC.
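
The skip behavior in this example can be modeled with a short Python simulation. This is only a sketch of the rule as described above, not the service's actual scheduler: the function name `simulate_runs` and the assumption that a slot is skipped (rather than queued) whenever the previous run is still in progress are illustrative.

```python
from datetime import datetime, timedelta, timezone


def simulate_runs(start, interval, durations):
    """Return the start time of each run, given how long each run takes.

    Assumption (taken from the example above): a scheduled slot that
    arrives while an execution is still running is skipped, not queued.
    """
    starts = []
    slot = start
    for duration in durations:
        starts.append(slot)
        finished = slot + duration
        # Move to the next slot, skipping every slot that falls
        # before the current execution has finished.
        slot += interval
        while slot < finished:
            slot += interval
    return starts


start = datetime(2019, 6, 1, 8, 0, tzinfo=timezone.utc)
runs = simulate_runs(
    start,
    interval=timedelta(hours=1),
    durations=[timedelta(minutes=20), timedelta(minutes=70), timedelta(minutes=20)],
)
for run in runs:
    print(run.strftime("%H:%M"))  # 08:00, 09:00, 11:00
```

The 10:00 AM slot never appears in the output because the 70-minute run that started at 9:00 AM is still in progress at that time.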
42+
43+
<a name="portal"></a>
44+
45+
## Define a schedule in the portal
46+
47+
The Import Data wizard in the portal lets you define the schedule for an indexer at creation time. The default Schedule setting is **Hourly**, which means the indexer runs once after it is created, and runs again every hour afterwards.
48+
49+
You can change the Schedule setting to **Once** if you don't want the indexer to run again automatically, or to **Daily** to run once per day. Set it to **Custom** if you want to specify a different interval or a specific future Start Time.
50+
51+
When you set the schedule to **Custom**, fields appear to let you specify the **Interval** and the **Start Time (UTC)**. The shortest time interval allowed is 5 minutes, and the longest is 1440 minutes (24 hours).
52+
53+
![Setting indexer schedule in Import Data wizard](media/search-howto-schedule-indexers/schedule-import-data.png "Setting indexer schedule in Import Data wizard")
54+
55+
After an indexer has been created, you can change the schedule settings using the indexer's Edit panel. The Schedule fields are the same as in the Import Data wizard.
56+
57+
![Setting the schedule in indexer Edit panel](media/search-howto-schedule-indexers/schedule-edit.png "Setting the schedule in indexer Edit panel")
58+
59+
<a name="restApi"></a>
60+
61+
## Define a schedule using the REST API
62+
63+
You can define the schedule for an indexer using the REST API. To do this, include the **schedule** property when creating or updating the indexer. The example below shows a PUT request to update an existing indexer:
64+
65+
PUT https://myservice.search.windows.net/indexers/myindexer?api-version=2019-05-06
66+
Content-Type: application/json
67+
api-key: admin-key
68+
69+
{
70+
"dataSourceName" : "myazuresqldatasource",
71+
"targetIndexName" : "target index name",
72+
"schedule" : { "interval" : "PT10M", "startTime" : "2015-01-01T00:00:00Z" }
73+
}
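
As a rough illustration, the same request could be assembled from a script. The Python sketch below uses only the standard library and builds the request without sending it; the service name, indexer name, and key are the placeholder values from the example above, not real resources.

```python
import json
from urllib.request import Request

# Placeholder values from the example above; substitute your own.
SERVICE = "myservice"
INDEXER = "myindexer"
API_KEY = "admin-key"

body = {
    "dataSourceName": "myazuresqldatasource",
    "targetIndexName": "target index name",
    "schedule": {"interval": "PT10M", "startTime": "2015-01-01T00:00:00Z"},
}

request = Request(
    url=f"https://{SERVICE}.search.windows.net/indexers/{INDEXER}?api-version=2019-05-06",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json", "api-key": API_KEY},
    method="PUT",
)

# Nothing is sent here. Pass `request` to urllib.request.urlopen to issue the call.
print(request.get_method(), request.full_url)
```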
74+
75+
The **interval** parameter is required. The interval refers to the time between the start of two consecutive indexer executions. The smallest allowed interval is 5 minutes; the longest is one day. It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an [ISO 8601 duration](https://www.w3.org/TR/xmlschema11-2/#dayTimeDuration) value). The pattern for this is: `P(nD)(T(nH)(nM))`. Examples: `PT15M` for every 15 minutes, `PT2H` for every 2 hours.
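
If you generate schedules from code, a small helper can produce the interval string and enforce the documented limits. The following Python sketch is one possible approach; the helper name `to_day_time_duration` is hypothetical, and the sketch handles only whole minutes, which is sufficient for the `P(nD)(T(nH)(nM))` pattern.

```python
from datetime import timedelta

MIN_INTERVAL = timedelta(minutes=5)  # smallest allowed interval
MAX_INTERVAL = timedelta(days=1)     # longest allowed interval


def to_day_time_duration(interval):
    """Format a timedelta as a dayTimeDuration string such as PT15M or PT2H."""
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("interval must be between 5 minutes and 1 day")
    if interval % timedelta(minutes=1):
        raise ValueError("this sketch only supports whole minutes")

    total_minutes = interval // timedelta(minutes=1)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)

    result = "P"
    if days:
        result += f"{days}D"
    if hours or minutes:
        # The time part is introduced by "T" only when it is non-empty.
        result += "T"
        if hours:
            result += f"{hours}H"
        if minutes:
            result += f"{minutes}M"
    return result


print(to_day_time_duration(timedelta(minutes=15)))  # PT15M
print(to_day_time_duration(timedelta(hours=2)))     # PT2H
```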
76+
77+
The optional **startTime** indicates when scheduled executions should begin. If it is omitted, the current UTC time is used. This time can be in the past, in which case the first execution is scheduled as if the indexer has been running continuously since the original **startTime**.
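
One way to read "as if the indexer has been running continuously" is that executions stay aligned to the grid defined by **startTime** and **interval**. The Python sketch below computes the next slot on that grid under this assumption; the function name `next_scheduled_run` is hypothetical, and the service's exact timing may differ.

```python
from datetime import datetime, timedelta, timezone


def next_scheduled_run(start_time, interval, now):
    """Return the first slot at or after `now` on the start_time + n * interval grid."""
    if now <= start_time:
        return start_time
    # Count whole intervals elapsed since start_time and round up
    # to the next slot if `now` falls between two slots.
    periods, remainder = divmod(now - start_time, interval)
    if remainder:
        periods += 1
    return start_time + periods * interval


start_time = datetime(2015, 1, 1, tzinfo=timezone.utc)
now = datetime(2019, 6, 1, 8, 3, tzinfo=timezone.utc)
print(next_scheduled_run(start_time, timedelta(minutes=10), now))
# 2019-06-01 08:10:00+00:00
```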
78+
79+
You can also run an indexer on demand at any time using the Run Indexer call. For more information about running indexers and setting indexer schedules, see [Run Indexer](https://docs.microsoft.com/rest/api/searchservice/run-indexer), [Get Indexer](https://docs.microsoft.com/rest/api/searchservice/get-indexer), and [Update Indexer](https://docs.microsoft.com/rest/api/searchservice/update-indexer) in the REST API Reference.
80+
81+
<a name="dotNetSdk"></a>
82+
83+
## Define a schedule using the .NET SDK
84+
85+
You can define the schedule for an indexer using the Azure Search .NET SDK. To do this, include the **schedule** property when creating or updating an Indexer.
86+
87+
The following C# example creates an indexer, using a predefined data source and index, and sets its schedule to run once every day starting 30 minutes from now:
88+
89+
```
90+
Indexer indexer = new Indexer(
91+
name: "azure-sql-indexer",
92+
dataSourceName: dataSource.Name,
93+
targetIndexName: index.Name,
94+
schedule: new IndexingSchedule(
95+
TimeSpan.FromDays(1),
96+
new DateTimeOffset(DateTime.UtcNow.AddMinutes(30))
97+
)
98+
);
99+
await searchService.Indexers.CreateOrUpdateAsync(indexer);
100+
```
101+
If the **schedule** parameter is omitted, the indexer will only run once immediately after it is created.
102+
103+
The **startTime** parameter can be set to a time in the past. In that case, the first execution is scheduled as if the indexer has been running continuously since the given **startTime**.
104+
105+
The schedule is defined using the [IndexingSchedule](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.models.indexingschedule?view=azure-dotnet) class. The **IndexingSchedule** constructor requires an **interval** parameter specified using a **TimeSpan** object. The smallest interval value allowed is 5 minutes, and the largest is 24 hours. The second parameter, **startTime**, is optional and is specified as a **DateTimeOffset** object.
106+
107+
The .NET SDK lets you control indexer operations using the [SearchServiceClient](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.searchserviceclient) class and its [Indexers](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.searchserviceclient.indexers) property, which implements methods from the **IIndexersOperations** interface.
108+
109+
You can run an indexer on demand at any time using one of the [Run](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.indexersoperationsextensions.run), [RunAsync](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.indexersoperationsextensions.runasync), or [RunWithHttpMessagesAsync](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.iindexersoperations.runwithhttpmessagesasync) methods.
110+
111+
For more information about creating, updating, and running indexers, see [IIndexersOperations](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.iindexersoperations?view=azure-dotnet).
