In this article, you learn how to import data into the Azure Machine Learning platform from external sources. A successful data import automatically creates and registers an Azure Machine Learning data asset with the name provided during that import. An Azure Machine Learning data asset resembles a web browser bookmark (favorites): instead of remembering long storage paths (URIs) that point to your most frequently used data, you can create a data asset, and then access that asset with a friendly name.
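For example, a training job can reference an imported asset by its friendly name instead of a storage URI, using the standard `azureml:<name>@latest` reference form. This is a minimal sketch; the asset name `my_imported_asset` is a placeholder:

```yaml
# Hypothetical job input: reference the imported data asset by its
# friendly name instead of a long storage URI.
inputs:
  training_data:
    type: uri_folder
    path: azureml:my_imported_asset@latest
```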
A data import creates a cache of the source data, along with metadata, for faster and more reliable data access in Azure Machine Learning training jobs. The data cache avoids network and connection constraints. The cached data is versioned to support reproducibility, which provides versioning capabilities even for data imported from SQL Server sources. Additionally, the cached data provides data lineage for auditing tasks. A data import uses Azure Data Factory (ADF) pipelines behind the scenes, which means that users can avoid complex interactions with ADF. Behind the scenes, Azure Machine Learning also handles management of the ADF compute resource pool size, compute resource provisioning, and tear-down, to optimize data transfer by determining the proper parallelization.
The transferred data is partitioned and securely stored as Parquet files in Azure storage, which enables faster processing during training. ADF compute costs cover only the time used for data transfers. Storage costs cover only the time needed to cache the data, because the cached data is a copy of the data imported from an external source; Azure storage hosts that copy.
The caching feature involves upfront compute and storage costs. However, it pays for itself, and can save money, because it reduces recurring training compute costs compared to direct connections to external source data during training. Because data is cached as Parquet files, training jobs run faster and more reliably against connection timeouts for larger data sets. This leads to fewer reruns, and fewer training failures.
1. Navigate to the [Azure Machine Learning studio](https://ml.azure.com).
1. Under **Assets** in the left navigation, select **Data**. Next, select the **Data Import** tab. Then select **Create**, as shown in this screenshot:
:::image type="content" source="media/how-to-import-data-assets/create-new-data-import.png" lightbox="media/how-to-import-data-assets/create-new-data-import.png" alt-text="Screenshot showing creation of a new data import in Azure Machine Learning studio UI.":::
:::image type="content" source="media/how-to-import-data-assets/choose-snowflake-datastore-to-output.png" lightbox="media/how-to-import-data-assets/choose-snowflake-datastore-to-output.png" alt-text="Screenshot that shows details of the data source to output.":::
> [!NOTE]
> To choose your own datastore, select **Other datastores**. In that case, you must select the path for the location of the data cache.
1. You can add a schedule. Select **Add schedule** as shown in this screenshot:
:::image type="content" source="media/how-to-import-data-assets/create-data-import-add-schedule.png" lightbox="media/how-to-import-data-assets/create-data-import-add-schedule.png" alt-text="Screenshot that shows the selection of the Add schedule button.":::
   A new panel opens, where you can define either a **Recurrence** schedule or a **Cron** schedule. This screenshot shows the panel for a **Recurrence** schedule:
:::image type="content" source="media/how-to-import-data-assets/create-data-import-recurrence-schedule.png" lightbox="media/how-to-import-data-assets/create-data-import-recurrence-schedule.png" alt-text="A screenshot that shows selection of the Add recurrence schedule button.":::
|`MONTHS`| - | Not supported. The value is ignored and treated as `*`. |
|`DAYS-OF-WEEK`| 0-6 | Zero (0) means Sunday. Names of days are also accepted. |
For more information about crontab expressions, visit the [Crontab Expression wiki on GitHub](https://github.com/atifaziz/NCrontab/wiki/Crontab-Expression).
> [!IMPORTANT]
> The `DAYS` and `MONTHS` values aren't supported. If you pass one of those values, it's ignored and treated as `*`.
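As a sketch, a cron trigger for a weekly import might look like the following schedule fragment. The `trigger` layout follows the Azure Machine Learning schedule YAML; the specific expression and time are hypothetical:

```yaml
# Hypothetical cron trigger: run the import every Monday at 16:15.
# Expression fields: MINUTES HOURS DAYS MONTHS DAYS-OF-WEEK
# (DAYS and MONTHS are ignored and treated as *).
trigger:
  type: cron
  expression: "15 16 * * 1"
```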
> An Amazon S3 data resource can serve as an external file system resource.
The `connection` that handles the data import action determines the details of the external data source. The connection defines an Amazon S3 bucket as the target. The connection expects a valid `path` value. An asset value imported from an external file system source has a `type` of `uri_folder`.
The next code sample imports data from an Amazon S3 resource.
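As a minimal sketch of such an import definition, the following YAML assumes a workspace connection named `my_s3_connection`; the asset name and bucket path are placeholders:

```yaml
# Hypothetical data import definition for an Amazon S3 source.
# The connection name, asset name, and bucket path are placeholders.
$schema: http://azureml/sdk-2-0/DataImport.json
type: uri_folder
name: my_s3_asset
source:
  type: file_system
  path: my-bucket/my-folder
  connection: azureml:my_s3_connection
path: azureml://datastores/workspaceblobstore/paths/s3/${{name}}
```

You could then submit a definition like this with the Azure CLI `ml` extension, for example `az ml data import -f import.yml`.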
|`MONTHS`| - | Not supported. The value is ignored and treated as `*`. |
|`DAYS-OF-WEEK`| 0-6 | Zero (0) means Sunday. Names of days are also accepted. |
For more information about crontab expressions, visit the [Crontab Expression wiki on GitHub](https://github.com/atifaziz/NCrontab/wiki/Crontab-Expression).
> [!IMPORTANT]
> The `DAYS` and `MONTHS` values aren't supported. If you pass one of those values, it's ignored and treated as `*`.