
Commit 44271b1

Freshness update for how-to-import-data-assets.md . . .
1 parent 873ed1d commit 44271b1

File tree: 1 file changed (+9, -9 lines)


articles/machine-learning/how-to-import-data-assets.md

Lines changed: 9 additions & 9 deletions
```diff
@@ -9,18 +9,18 @@ ms.topic: how-to
 ms.author: ambadal
 author: AmarBadal
 ms.reviewer: franksolomon
-ms.date: 06/19/2023
+ms.date: 04/18/2024
 ms.custom: data4ml
 ---
 
 # Import data assets (preview)
 [!INCLUDE [dev v2](includes/machine-learning-dev-v2.md)]
 
-In this article, you'll learn how to import data into the Azure Machine Learning platform from external sources. A successful import automatically creates and registers an Azure Machine Learning data asset with the name provided during the import. An Azure Machine Learning data asset resembles a web browser bookmark (favorites). You don't need to remember long storage paths (URIs) that point to your most-frequently used data. Instead, you can create a data asset, and then access that asset with a friendly name.
+In this article, you learn how to import data into the Azure Machine Learning platform from external sources. A successful data import automatically creates and registers an Azure Machine Learning data asset with the name provided during that import. An Azure Machine Learning data asset resembles a web browser bookmark (favorites). You don't need to remember long storage paths (URIs) that point to your most-frequently used data. Instead, you can create a data asset, and then access that asset with a friendly name.
 
 A data import creates a cache of the source data, along with metadata, for faster and reliable data access in Azure Machine Learning training jobs. The data cache avoids network and connection constraints. The cached data is versioned to support reproducibility. This provides versioning capabilities for data imported from SQL Server sources. Additionally, the cached data provides data lineage for auditing tasks. A data import uses ADF (Azure Data Factory pipelines) behind the scenes, which means that users can avoid complex interactions with ADF. Behind the scenes, Azure Machine Learning also handles management of ADF compute resource pool size, compute resource provisioning, and tear-down, to optimize data transfer by determining proper parallelization.
 
-The transferred data is partitioned and securely stored in Azure storage, as parquet files. This enables faster processing during training. ADF compute costs only involve the time used for data transfers. Storage costs only involve the time needed to cache the data, because cached data is a copy of the data imported from an external source. Azure storage hosts that external source.
+The transferred data is partitioned and securely stored as parquet files in Azure storage. This enables faster processing during training. ADF compute costs only involve the time used for data transfers. Storage costs only involve the time needed to cache the data, because cached data is a copy of the data imported from an external source. Azure storage hosts that external source.
 
 The caching feature involves upfront compute and storage costs. However, it pays for itself, and can save money, because it reduces recurring training compute costs, compared to direct connections to external source data during training. It caches data as parquet files, which makes job training faster and more reliable against connection timeouts for larger data sets. This leads to fewer reruns, and fewer training failures.
 
```
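The hunk headers below reference the SDK call `ml_client.data.import_data(data_import=data_import)`, which takes a data import definition. As a rough sketch only, using the article's CLI-style YAML, an import definition for a database (for example, Snowflake) source might look like the fragment below. All names here (asset, connection, query, and paths) are hypothetical placeholders, and the exact schema should be checked against the full article:

```yaml
# Hypothetical example; name, connection, query, and path are placeholders.
type: mltable
name: my_snowflake_asset
source:
  type: database
  query: select * from my_sample_table
  connection: azureml:my_snowflake_connection
path: azureml://datastores/workspaceblobstore/paths/snowflake/${{name}}
```

A successful import of this definition registers a data asset named `my_snowflake_asset` that can then be referenced by that friendly name instead of its storage URI.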

```diff
@@ -132,7 +132,7 @@ ml_client.data.import_data(data_import=data_import)
 
 1. Navigate to the [Azure Machine Learning studio](https://ml.azure.com).
 
-1. Under **Assets** in the left navigation, select **Data**. Next, select the **Data Import** tab. Then select Create as shown in this screenshot:
+1. Under **Assets** in the left navigation, select **Data**. Next, select the **Data Import** tab. Then select Create, as shown in this screenshot:
 
    :::image type="content" source="media/how-to-import-data-assets/create-new-data-import.png" lightbox="media/how-to-import-data-assets/create-new-data-import.png" alt-text="Screenshot showing creation of a new data import in Azure Machine Learning studio UI.":::
 
```

```diff
@@ -153,13 +153,13 @@ ml_client.data.import_data(data_import=data_import)
 :::image type="content" source="media/how-to-import-data-assets/choose-snowflake-datastore-to-output.png" lightbox="media/how-to-import-data-assets/choose-snowflake-datastore-to-output.png" alt-text="Screenshot that shows details of the data source to output.":::
 
 > [!NOTE]
-> To choose your own datastore, select **Other datastores**. In this case, you must select the path for the location of the data cache.
+> To choose your own datastore, select **Other datastores**. In that case, you must select the path for the location of the data cache.
 
 1. You can add a schedule. Select **Add schedule** as shown in this screenshot:
 
    :::image type="content" source="media/how-to-import-data-assets/create-data-import-add-schedule.png" lightbox="media/how-to-import-data-assets/create-data-import-add-schedule.png" alt-text="Screenshot that shows the selection of the Add schedule button.":::
 
-   A new panel opens, where you can define a **Recurrence** schedule, or a **Cron** schedule. This screenshot shows the panel for a **Recurrence** schedule:
+   A new panel opens, where you can define either a **Recurrence** schedule, or a **Cron** schedule. This screenshot shows the panel for a **Recurrence** schedule:
 
    :::image type="content" source="media/how-to-import-data-assets/create-data-import-recurrence-schedule.png" lightbox="media/how-to-import-data-assets/create-data-import-recurrence-schedule.png" alt-text="A screenshot that shows selection of the Add recurrence schedule button.":::
 
```

```diff
@@ -205,7 +205,7 @@ ml_client.data.import_data(data_import=data_import)
 | `MONTHS` | - | Not supported. The value is ignored and treated as `*`. |
 | `DAYS-OF-WEEK` | 0-6 | Zero (0) means Sunday. Names of days also accepted. |
 
-- To learn more about crontab expressions, see [Crontab Expression wiki on GitHub](https://github.com/atifaziz/NCrontab/wiki/Crontab-Expression).
+- For more information about crontab expressions, visit the [Crontab Expression wiki on GitHub](https://github.com/atifaziz/NCrontab/wiki/Crontab-Expression).
 
 > [!IMPORTANT]
 > `DAYS` and `MONTH` are not supported. If you pass one of these values, it will be ignored and treated as `*`.
```
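The table in this hunk describes standard five-field crontab expressions, with the extra rule that the `DAYS` and `MONTHS` fields are ignored and treated as `*`. A hypothetical helper (not part of the Azure Machine Learning SDK) that illustrates that documented behavior:

```python
# Maps day names to crontab numbers; per the table, 0 means Sunday and
# day names are also accepted in the DAYS-OF-WEEK field.
DAY_NAMES = {"sunday": "0", "monday": "1", "tuesday": "2", "wednesday": "3",
             "thursday": "4", "friday": "5", "saturday": "6"}

def normalize_cron(expression: str) -> str:
    """Illustrate the documented schedule semantics: the DAYS and MONTHS
    fields are ignored (treated as '*'), and day names are normalized."""
    minutes, hours, days, months, dow = expression.split()
    dow = DAY_NAMES.get(dow.lower(), dow)
    return " ".join([minutes, hours, "*", "*", dow])

print(normalize_cron("15 16 2 11 Monday"))  # -> "15 16 * * 1"
```

The result `15 16 * * 1` fires at 16:15 every Monday; the day-of-month (`2`) and month (`11`) values are dropped, matching the IMPORTANT note above.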
```diff
@@ -228,7 +228,7 @@ ml_client.data.import_data(data_import=data_import)
 > [!NOTE]
 > An Amazon S3 data resource can serve as an external file system resource.
 
-The `connection` that handles the data import action determines the details of the external data source. The connection defines an Amazon S3 bucket as the target. The connection expects a valid `path` value. An asset value imported from an external file system source has a `type` of `uri_folder`.
+The `connection` that handles the data import action determines the aspects of the external data source. The connection defines an Amazon S3 bucket as the target. The connection expects a valid `path` value. An asset value imported from an external file system source has a `type` of `uri_folder`.
 
 The next code sample imports data from an Amazon S3 resource.
 
```
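The S3 code sample itself falls outside the changed lines and isn't shown in this diff. As a hedged sketch consistent with the paragraph above (a file-system source with a `connection`, a `path`, and a resulting asset `type` of `uri_folder`), such a definition might look like the following; every name here is a hypothetical placeholder:

```yaml
# Hypothetical example; connection, bucket path, and output path are placeholders.
type: uri_folder
name: my_s3_asset
source:
  type: file_system
  path: my-s3-bucket/my-folder
  connection: azureml:my_s3_connection
path: azureml://datastores/workspaceblobstore/paths/s3/${{name}}
```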

```diff
@@ -360,7 +360,7 @@ ml_client.data.import_data(data_import=data_import)
 | `MONTHS` | - | Not supported. The value is ignored and treated as `*`. |
 | `DAYS-OF-WEEK` | 0-6 | Zero (0) means Sunday. Names of days also accepted. |
 
-- To learn more about crontab expressions, see [Crontab Expression wiki on GitHub](https://github.com/atifaziz/NCrontab/wiki/Crontab-Expression).
+- For more information about crontab expressions, visit the [Crontab Expression wiki on GitHub](https://github.com/atifaziz/NCrontab/wiki/Crontab-Expression).
 
 > [!IMPORTANT]
 > `DAYS` and `MONTH` are not supported. If you pass one of these values, it will be ignored and treated as `*`.
```
