
Commit d717c13

Merge pull request #282100 from fbsolo-ms1/UPDATE-how-to-schedule-data-import
Freshness update how-to-schedule-data-import.md . . .
2 parents 47a2078 + 5578c93 commit d717c13


1 file changed: +37 -32 lines changed


articles/machine-learning/how-to-schedule-data-import.md

Lines changed: 37 additions & 32 deletions
@@ -9,19 +9,19 @@ ms.topic: how-to
 ms.author: franksolomon
 author: fbsolo-ms1
 ms.reviewer: ambadal
-ms.date: 06/19/2023
+ms.date: 07/28/2024
 ms.custom: data4ml, devx-track-azurecli
 ---

 # Schedule data import jobs (preview)

 [!INCLUDE [dev v2](includes/machine-learning-dev-v2.md)]

-In this article, you'll learn how to programmatically schedule data imports and use the schedule UI to do the same. You can create a schedule based on elapsed time. Time-based schedules can be used to take care of routine tasks, such as importing the data regularly to keep them up-to-date. After learning how to create schedules, you'll learn how to retrieve, update and deactivate them via CLI, SDK, and studio UI.
+In this article, you'll learn how to programmatically schedule data imports, and how to use the schedule UI to do the same. You can create a schedule based on elapsed time. Time-based schedules can handle routine tasks, for example regular data imports that keep the data up-to-date. After you learn how to create schedules, you'll learn how to retrieve, update, and deactivate them with the CLI, the SDK, and the studio UI.

 ## Prerequisites

-- You must have an Azure subscription to use Azure Machine Learning. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/) today.
+- You need an Azure subscription to use Azure Machine Learning. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/) today.

 # [Azure CLI](#tab/cli)

@@ -43,7 +43,7 @@ In this article, you'll learn how to programmatically schedule data imports and

 ## Schedule data import

-To import data on a recurring basis, you must create a schedule. A `Schedule` associates a data import action, and a trigger. The trigger can either be `cron` that use cron expression to describe the wait between runs or `recurrence` that specify using what frequency to trigger job. In each case, you must first define an import data definition. An existing data import, or a data import that is defined inline, works for this. Refer to [Create a data import in CLI, SDK and UI](how-to-import-data-assets.md).
+To import data on a recurring basis, you must create a schedule. A `Schedule` associates a data import action with a trigger. The trigger can be either `cron`, which uses a cron expression to describe the delay between runs, or `recurrence`, which specifies the frequency at which to trigger a job. In each case, you must first build an import data definition. An existing data import, or a data import that is defined inline, works for this. For more information, visit [Create a data import in CLI, SDK and UI](how-to-import-data-assets.md).

 ## Create a schedule

@@ -101,9 +101,9 @@ import_data:

 ```

-`trigger` contains the following properties:
+A `trigger` contains these properties:

-- **(Required)** `type` specifies the schedule type, either `recurrence` or `cron`. See the following section for more details.
+- **(Required)** `type` specifies the schedule type, either `recurrence` or `cron`. The following section has more information.

 Next, run this command in the CLI:

@@ -155,19 +155,19 @@ ml_client.schedules.begin_create_or_update(import_schedule).result()
 ```
 `RecurrenceTrigger` contains these properties:

-- **(Required)** To provide better coding experience, we use `RecurrenceTrigger` for recurrence schedule.
+- **(Required)** For a better coding experience, use `RecurrenceTrigger` for the recurrence schedule.

 # [Studio](#tab/azure-studio)

-When you have a data import with satisfactory performance and outputs, you can set up a schedule to automatically trigger this import.
+When your data import has satisfactory performance and outputs, you can set up a schedule to automatically trigger that import.

 1. Navigate to [Azure Machine Learning studio](https://ml.azure.com)

-1. Under **Assets** in the left navigation, select **Data**. On the **Data import** tab, select the imported data asset to which you want to attach a schedule. The **Import jobs history** page should appear, as shown in this screenshot:
+1. Under **Assets** in the left navigation, select **Data**. At the **Data import** tab, select the imported data asset to which you want to attach a schedule. The **Import jobs history** page should appear, as shown in this screenshot:

 :::image type="content" source="./media/how-to-schedule-data-import/data-import-list.png" lightbox="./media/how-to-schedule-data-import/data-import-list.png" alt-text="Screenshot highlighting the imported data asset name in the Data imports tab.":::

-1. At the **Import jobs history** page, select the latest **Import job name** link, to open the pipelines job details page as shown in this screenshot:
+1. At the **Import jobs history** page, select the latest **Import job name** link to open the pipeline job details page, as shown in this screenshot:

 :::image type="content" source="./media/how-to-schedule-data-import/data-import-history.png" lightbox="./media/how-to-schedule-data-import/data-import-history.png" alt-text="Screenshot highlighting the imported data asset guid in the Import jobs history tab.":::

@@ -181,7 +181,7 @@ When you have a data import with satisfactory performance and outputs, you can s

 - **Name**: the unique identifier of the schedule within the workspace.
 - **Description**: the schedule description.
-- **Trigger**: the recurrence pattern of the schedule, which includes the following properties.
+- **Trigger**: the recurrence pattern of the schedule, which includes these properties:
 - **Time zone**: the trigger time calculation is based on this time zone; Coordinated Universal Time (UTC) by default.
 - **Recurrence** or **Cron expression**: select recurrence to specify the recurring pattern. Under **Recurrence**, you can specify the recurrence frequency - by minutes, hours, days, weeks, or months.
 - **Start**: the schedule first becomes active on this date. By default, the creation date of this schedule.
@@ -194,7 +194,12 @@ When you have a data import with satisfactory performance and outputs, you can s
 > [!NOTE]
 > These properties apply to CLI and SDK:

-- **(Required)** `frequency` specifies the unit of time that describes how often the schedule fires. Can have values of `minute`, `hour`, `day`, `week`, or `month`.
+- **(Required)** `frequency` specifies the unit of time that describes how often the schedule fires. It can have one of these values:
+  - `minute`
+  - `hour`
+  - `day`
+  - `week`
+  - `month`

 - **(Required)** `interval` specifies how often the schedule fires based on the frequency, which is the number of time units to wait until the schedule fires again.

@@ -204,13 +209,13 @@ When you have a data import with satisfactory performance and outputs, you can s
 - `hours` should be an integer or a list, ranging between 0 and 23.
 - `minutes` should be an integer or a list, ranging between 0 and 59.
 - `weekdays` should be a string or a list, ranging from `monday` to `sunday`.
-- If `schedule` is omitted, the job(s) triggers according to the logic of `start_time`, `frequency` and `interval`.
+- If `schedule` is omitted, the jobs are triggered according to the logic of `start_time`, `frequency`, and `interval`.

 - (Optional) `start_time` describes the start date and time, with a timezone. If `start_time` is omitted, `start_time` equals the job creation time. For a start time in the past, the first job runs at the next calculated run time.

 - (Optional) `end_time` describes the end date and time with a timezone. If `end_time` is omitted, the schedule continues to trigger jobs until the schedule is manually disabled.

-- (Optional) `time_zone` specifies the time zone of the recurrence. If omitted, the default timezone is UTC. To learn more about timezone values, see [appendix for timezone values](reference-yaml-schedule.md#appendix).
+- (Optional) `time_zone` specifies the time zone of the recurrence. If omitted, the default timezone is UTC. For more information about timezone values, visit [appendix for timezone values](reference-yaml-schedule.md#appendix).

 ### Create a time-based schedule with cron expression

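The recurrence rules described in this change (`frequency`, `interval`, `start_time`, `end_time`, and the rule that a past `start_time` resumes at the next calculated run time) can be illustrated with a small standard-library sketch. It covers only the case where `schedule` is omitted and assumes UTC-aware datetimes. The helper names `next_fire_times` and `add_months` are hypothetical, and this is not how Azure Machine Learning computes run times internally. For `month`, the sketch assumes that each run keeps the start day, clamped to the length of the target month; the article doesn't confirm that behavior.

```python
import calendar
from datetime import datetime, timedelta, timezone

# Fixed-length units. "month" is handled separately because months vary in length.
FIXED_UNITS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def add_months(start: datetime, months: int) -> datetime:
    """Shift start by whole months, clamping the day to the target month's length (assumption)."""
    total = start.month - 1 + months
    year, month = start.year + total // 12, total % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day)


def next_fire_times(start, frequency, interval, now, count=3, end=None):
    """Return up to `count` run times at or after `now`, never later than `end`."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    if frequency == "month":
        def occurrence(k):
            # Each run is computed from the original start, so clamping never accumulates.
            return add_months(start, k * interval)
        k = 0
    elif frequency in FIXED_UNITS:
        step = interval * FIXED_UNITS[frequency]

        def occurrence(k):
            return start + k * step
        # Jump directly to the first run at or after `now` (ceiling division on timedeltas).
        k = max(0, -((start - now) // step))
    else:
        raise ValueError(f"unsupported frequency: {frequency!r}")

    times = []
    while len(times) < count:
        t = occurrence(k)
        if end is not None and t > end:
            break  # without an end_time, the schedule keeps firing until it's disabled
        if t >= now:
            times.append(t)  # a start_time in the past resumes at the next calculated run
        k += 1
    return times
```

For example, a `day` schedule with `interval: 2` that started on July 1, 2024 at 10:00 UTC, evaluated at noon on July 4, next fires at 10:00 UTC on July 5, July 7, and July 9.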
@@ -259,9 +264,9 @@ import_data:
 connection: azureml:my_snowflake_connection
 ```

-The `trigger` section defines the schedule details and contains following properties:
+The `trigger` section defines the schedule details and contains these properties:

-- **(Required)** `type` specifies the schedule type is `cron`.
+- **(Required)** `type` specifies the `cron` schedule type.

 ```cli
 > az ml schedule create -f <file-name>.yml
@@ -301,23 +306,23 @@ ml_client.schedules.begin_create_or_update(import_schedule).result()

 ```

-The `CronTrigger` section defines the schedule details and contains following properties:
+The `CronTrigger` section defines the schedule details and contains these properties:

-- **(Required)** To provide better coding experience, we use `CronTrigger` for recurrence schedule.
+- **(Required)** For a better coding experience, use `CronTrigger` for the cron schedule.

 The list continues here:

 # [Studio](#tab/azure-studio)

-When you have a data import with satisfactory performance and outputs, you can set up a schedule to automatically trigger this import.
+When your data import has satisfactory performance and outputs, you can set up a schedule to automatically trigger that import.

 1. Navigate to [Azure Machine Learning studio](https://ml.azure.com)

 1. Under **Assets** in the left navigation, select **Data**. On the **Data import** tab, select the imported data asset to which you want to attach a schedule. The **Import jobs history** page should appear, as shown in this screenshot:

 :::image type="content" source="./media/how-to-schedule-data-import/data-import-list.png" lightbox="./media/how-to-schedule-data-import/data-import-list.png" alt-text="Screenshot highlighting the imported data asset name in the Data imports tab.":::

-1. At the **Import jobs history** page, select the latest **Import job name** link, to open the pipelines job details page as shown in this screenshot:
+1. At the **Import jobs history** page, select the latest **Import job name** link to open the pipeline job details page, as shown in this screenshot:

 :::image type="content" source="./media/how-to-schedule-data-import/data-import-history.png" lightbox="./media/how-to-schedule-data-import/data-import-history.png" alt-text="Screenshot highlighting the imported data asset guid in the Import jobs history tab.":::

@@ -331,9 +336,9 @@ When you have a data import with satisfactory performance and outputs, you can s

 - **Name**: the unique identifier of the schedule within the workspace.
 - **Description**: the schedule description.
-- **Trigger**: the recurrence pattern of the schedule, which includes the following properties.
+- **Trigger**: the recurrence pattern of the schedule, which includes these properties:
 - **Time zone**: the trigger time calculation is based on this time zone; Coordinated Universal Time (UTC) by default.
-- **Recurrence** or **Cron expression**: select recurrence to specify the recurring pattern. **Cron expression** allows you to specify more flexible and customized recurrence pattern.
+- **Recurrence** or **Cron expression**: select recurrence to specify the recurring pattern. With **Cron expression**, you can specify a more flexible and customized recurrence pattern.
 - **Start**: the schedule first becomes active on this date. By default, the creation date of this schedule.
 - **End**: the schedule will become inactive after this date. By default, it's NONE, which means that the schedule remains active until you manually disable it.
 - **Tags**: the selected schedule tags.
@@ -348,7 +353,7 @@ When you have a data import with satisfactory performance and outputs, you can s

 - A single wildcard (`*`), which covers all values for the field. A `*`, in days, means all days of a month (which varies with month and year).
 - The `expression: "15 16 * * 1"` in the sample above means 4:15 PM (16:15) every Monday.
-- The next table lists the valid values for each field:
+- This table lists the valid values for each field:

 | Field | Range | Comment |
 |----------------|----------|-----------------------------------------------------------|
@@ -358,7 +363,7 @@ When you have a data import with satisfactory performance and outputs, you can s
 | `MONTHS` | - | Not supported. The value is ignored and treated as `*`. |
 | `DAYS-OF-WEEK` | 0-6 | Zero (0) means Sunday. Names of days also accepted. |

-- To learn more about crontab expressions, see [Crontab Expression wiki on GitHub](https://github.com/atifaziz/NCrontab/wiki/Crontab-Expression).
+- For more information about crontab expressions, visit the [Crontab Expression wiki resource on GitHub](https://github.com/atifaziz/NCrontab/wiki/Crontab-Expression).

 > [!IMPORTANT]
 > `DAYS` and `MONTH` are not supported. If you pass one of these values, it will be ignored and treated as `*`.
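To make the cron field rules concrete, this sketch expands an expression such as `15 16 * * 1` into its next few run times. It assumes the standard crontab ranges for `MINUTES` (0-59) and `HOURS` (0-23), accepts only `*`, single values, comma-separated lists, and hyphenated ranges, and ignores the `DAYS` and `MONTHS` fields by treating them as `*`, as the note above describes. The function names and the three-letter day-name table are hypothetical illustrations, not the NCrontab or Azure Machine Learning implementation, and the sketch does no time zone conversion.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical day-name table; cron numbering uses 0 for Sunday.
DAY_NAMES = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def parse_field(field, low, high, names=None):
    """Expand '*', 'n', 'a,b', or 'a-b' into the set of allowed integers."""
    if field == "*":
        return set(range(low, high + 1))
    values = set()
    for part in field.split(","):
        bounds = part.split("-")
        if len(bounds) > 2:
            raise ValueError(f"bad range: {part!r}")
        # Map day names such as 'mon' to numbers when a name table is supplied.
        nums = [int((names or {}).get(b.lower(), b)) for b in bounds]
        first, last = nums[0], nums[-1]
        if not low <= first <= last <= high:
            raise ValueError(f"value out of range {low}-{high}: {part!r}")
        values.update(range(first, last + 1))
    return values


def parse_cron(expression):
    """Parse 'MINUTES HOURS DAYS MONTHS DAYS-OF-WEEK'; DAYS and MONTHS are ignored."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 space-separated fields")
    minutes, hours, _days, _months, weekdays = fields  # _days/_months act as '*'
    return (
        parse_field(minutes, 0, 59),
        parse_field(hours, 0, 23),
        parse_field(weekdays, 0, 6, DAY_NAMES),
    )


def next_cron_times(expression, after, count=3):
    """Return the next `count` whole-minute times strictly after `after` that match."""
    minutes, hours, weekdays = parse_cron(expression)
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    times = []
    while len(times) < count:
        # Python's weekday() has Monday=0; shift it so that Sunday=0, as in cron.
        if t.minute in minutes and t.hour in hours and (t.weekday() + 1) % 7 in weekdays:
            times.append(t)
        t += timedelta(minutes=1)
    return times
```

With this sketch, `next_cron_times("15 16 * * 1", datetime(2024, 7, 28, tzinfo=timezone.utc))` returns 16:15 UTC on Monday July 29, August 5, and August 12, 2024. An expression like `0 0 1 1 *` fires every midnight here, because its `DAYS` and `MONTHS` values are ignored.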
@@ -367,7 +372,7 @@ When you have a data import with satisfactory performance and outputs, you can s

 - (Optional) `end_time` describes the end date and time, with a timezone. If `end_time` is omitted, the schedule continues to trigger jobs until the schedule is manually disabled.

-- (Optional) `time_zone`specifies the time zone of the expression. If omitted, the timezone is UTC by default. See [appendix for timezone values](reference-yaml-schedule.md#appendix).
+- (Optional) `time_zone` specifies the time zone of the expression. If `time_zone` is omitted, the timezone is UTC by default. For more information about timezone values, visit [appendix for timezone values](reference-yaml-schedule.md#appendix).

 Limitations:

@@ -423,7 +428,7 @@ You can select a schedule name to show the schedule details page. The schedule d

 - **Overview**: basic information for the specified schedule.

-:::image type="content" source="./media/how-to-schedule-data-import/schedule-detail-overview.png" alt-text="Screenshot of the overview tab in the schedule details page." :::
+:::image type="content" source="./media/how-to-schedule-data-import/schedule-detail-overview.png" alt-text="Screenshot of the overview tab in the schedule details page.":::

 - **Job definition**: defines the job that the specified schedule triggers, as shown in this screenshot:

@@ -442,7 +447,7 @@ az ml schedule update -n simple_cron_data_import_schedule --set description="ne
 ```

 > [!NOTE]
-> To update more than just tags/description, it is recommended to use `az ml schedule create --file update_schedule.yml`
+> To update more than just tags/description, we recommend that you use `az ml schedule create --file update_schedule.yml`.

 # [Python SDK](#tab/python)

@@ -467,7 +472,7 @@ To change the import frequency, or to create a new association for the data impo

 1. Navigate to [Azure Machine Learning studio](https://ml.azure.com)

-1. Under **Assets** in the left navigation, select **Data**. On the **Data import** tab, select the imported data asset to which you want to attach a schedule. Then, the **Import jobs history** page opens, as shown in this screenshot:
+1. Under **Assets** in the left navigation, select **Data**. On the **Data import** tab, select the imported data asset to which you want to attach a schedule. The **Import jobs history** page should appear, as shown in this screenshot:

 :::image type="content" source="./media/how-to-schedule-data-import/data-import-list.png" alt-text="Screenshot highlighting the imported data asset name in the Data imports tab.":::

@@ -484,7 +489,7 @@ To change the import frequency, or to create a new association for the data impo
 :::image type="content" source="./media/how-to-schedule-data-import/update-select-schedule.png" alt-text="Screenshot of update select schedule showing the select schedule tab." :::

 > [!IMPORTANT]
-> Make sure to select the correct schedule to update. Once you finish the update, the schedule will trigger different data imports.
+> Make sure you select the correct schedule to update. Once you finish the update, the schedule will trigger different data imports.

 1. You can also modify the source, query and change the destination path, for future data imports that the schedule triggers.

@@ -550,14 +555,14 @@ print(job_schedule)

 # [Studio](#tab/azure-studio)

-On the schedule details page, you can enable the current schedule. You can also enable schedules at the **All schedules** tab.
+At the schedule details page, you can enable the current schedule. You can also enable schedules at the **All schedules** tab.

 ---

 ## Delete a schedule

 > [!IMPORTANT]
-> A schedule must be disabled before deletion. Deletion is an unrecoverable action. After a schedule is deleted, you can never access or recover it.
+> A schedule must be disabled before deletion. Deletion is a permanent, unrecoverable action. After a schedule is deleted, you can never access or recover it.

 # [Azure CLI](#tab/cli)

@@ -587,7 +592,7 @@ You can delete a schedule from the schedule details page or the all schedules ta

 Schedules are generally used for production. To prevent problems, workspace admins may want to restrict schedule creation and management permissions within a workspace.

-There are currently three action rules related to schedules, and you can configure them in Azure portal. See [how to manage access to an Azure Machine Learning workspace.](how-to-assign-roles.md#create-custom-role) to learn more.
+There are currently three action rules related to schedules, and you can configure them in the Azure portal. For more information, visit [how to manage access to an Azure Machine Learning workspace](how-to-assign-roles.md#create-custom-role).

 | Action | Description | Rule |
 |--------|----------------------------------------------------------------------------|---------------------------------------------------------------|
