articles/machine-learning/how-to-connection.md
+14 -12 (14 additions & 12 deletions)
@@ -33,12 +33,12 @@ In this article, learn how to connect to data sources located outside of Azure,
- An Azure Machine Learning workspace.
> [!NOTE]
-> An Azure Machine Learning connection stores the credentials passed during connection creation in the Workspace Azure Key Vault. A connection references the credentials from that location for further use. The YAML can pass the credentials. A CLI command or SDK can override them. We recommend that you **avoid** credential storage in YAML files.
+> An Azure Machine Learning connection securely stores the credentials passed during connection creation in the Workspace Azure Key Vault. A connection references the credentials from the key vault storage location for further use. You won't need to directly deal with the credentials after they are stored in the key vault. You have the option to store the credentials in the YAML file. A CLI command or SDK can override them. We recommend that you **avoid** credential storage in a YAML file, because a security breach could lead to a credential leak.
articles/machine-learning/how-to-import-data-assets.md
+7 -5 (7 additions & 5 deletions)
@@ -21,11 +21,11 @@ ms.custom: data4ml
In this article, learn how to import data into the Azure Machine Learning platform from external sources. A successful import automatically creates and registers an Azure Machine Learning data asset with the name provided during the import. An Azure Machine Learning data asset resembles a web browser bookmark (favorites). You don't need to remember long storage paths (URIs) that point to your most-frequently used data. Instead, you can create a data asset, and then access that asset with a friendly name.
-A data import creates a *cache* of the source data, along with metadata, for faster, reliable data access in Azure Machine Learning training jobs. The data import avoids network and connection constraints. The cached data is versioned to support reproducibility, and to provide data lineage, even for data imported from SQL Server sources. A data import uses ADF (Azure Data Factory pipelines) behind the scenes, and users can avoid ADF interactions as a result. To optimize data transfer parallelization, Azure Machine Learning handles ADF compute resource provisioning and tear-down.
+A data import creates a cache of the source data, along with metadata, for faster and reliable data access in Azure Machine Learning training jobs. The data cache avoids network and connection constraints. The cached data is versioned to support reproducibility (which provides versioning capabilities for data imported from SQL Server sources). Additionally, the cached data provides data lineage for auditability. A data import uses ADF (Azure Data Factory pipelines) behind the scenes, which means that users can avoid complex interactions with ADF. Behind the scenes, Azure Machine Learning also handles management of ADF compute resource pool size, compute resource provisioning, and tear-down to optimize data transfer by determining proper parallelization.
-The transferred data is partitioned and securely stored in Azure storage, in parquet format. ADF compute and storage costs only involve the time that the data cached, because the cache is a copy of the data hosted in Azure storage. ADF compute facilitated the data transfer.
+The transferred data is partitioned and securely stored as parquet files in Azure storage. This enables faster processing during training. ADF compute costs only involve the time used for data transfers. Storage costs only involve the time needed to cache the data, because cached data is a copy of the data imported from an external source. That external source is hosted in Azure storage.
-The cached parquet-format data is readily available for Azure Machine Learning training job consumption, in a fast and efficient manner. This increases training run speeds, and it helps protect against connection timeouts for large data set training. It reduces recurring training compute costs, in comparison to direct connections to external source data while training.
+The caching feature involves upfront compute and storage costs. However, it pays for itself, and can save money, because it reduces recurring training compute costs compared to direct connections to external source data during training. It caches data as parquet files, which makes job training faster and more reliable against connection timeouts for larger data sets. This leads to fewer reruns, and fewer training failures.
You can now import data from Snowflake, Amazon S3 and Azure SQL.
@@ -43,7 +43,7 @@ To create and work with data assets, you need:
## Importing from external database sources / import from external sources to create a mltable data asset
->__NOTE:__ The external databases can have Snowflake, Azure SQL, etc. formats.
+>NOTE: The external databases can have Snowflake, Azure SQL, etc. formats.
The following code samples can import data from external databases. The `connection` that handles the import action determines the external database data source metadata. In this sample, the code imports data from a Snowflake resource. The connection points to a Snowflake source. With a little modification, the connection can instead point to an Azure SQL database source. The imported asset `type` from an external database source is `mltable`.
## Check the import status of external data sources
-The data import action is an asynchronous action. It can take a long time. After submission of an import data action via the CLI or SDK, the Azure Machine Learning service might need several minutes to connect to the external data source. Then the service would start the data import and handle data caching and registration. The time required for a data import also depends on the size of the source data set.
+The data import action is an asynchronous action. It can take a long time. After submission of an import data action via the CLI or SDK, the Azure Machine Learning service might need several minutes to connect to the external data source. Then the service would start the data import and handle data caching and registration. The time needed for a data import also depends on the size of the source data set.
The next example returns the status of the submitted data import activity. The command or method uses the "data asset" name as the input to determine the status of the data materialization.