
Commit 7e26aa1

Bug fixes and requested edits.
1 parent c0e7301 commit 7e26aa1

3 files changed: 29 additions, 17 deletions


articles/machine-learning/how-to-connection.md

Lines changed: 14 additions & 12 deletions
@@ -33,12 +33,12 @@ In this article, learn how to connect to data sources located outside of Azure,
  - An Azure Machine Learning workspace.

  > [!NOTE]
- > An Azure Machine Learning connection stores the credentials passed during connection creation in the Workspace Azure Key Vault. A connection references the credentials from that location for further use. The YAML cna pass the credentials. A CLI command or SDK can override them. We recommend that you **avoid** credential storage in YAML files.
+ > An Azure Machine Learning connection securely stores the credentials passed during connection creation in the Workspace Azure Key Vault. A connection references the credentials from the key vault storage location for further use. You won't need to directly deal with the credentials after they are stored in the key vault. You have the option to store the credentials in the YAML file. A CLI command or SDK can override them. We recommend that you **avoid** credential storage in a YAML file, because a security breach could lead to a credential leak.

  ## Create a Snowflake DB connection

  # [CLI: Username/password](#tab/cli-username-password)
- This YAML script creates a Snowflake DB connection. Be sure to update the appropriate values:
+ This YAML file creates a Snowflake DB connection. Be sure to update the appropriate values:

  ```yaml
  # my_snowflakedb_connection.yaml
@@ -56,7 +56,7 @@ credentials:

  Create the Azure Machine Learning datastore in the CLI:

- ### Option 1: Use the username and password in a YAML script
+ ### Option 1: Use the username and password in YAML file

  ```azurecli
  az ml connection create --file my_snowflakedb_connection.yaml
@@ -70,7 +70,7 @@ az ml connection create --file my_snowflakedb_connection.yaml --set credentials.

  # [Python SDK: username/ password](#tab/sdk-username-password)

- ### Option 1: Load the connection in a YAML script
+ ### Option 1: Load connection from YAML file

  ```python
  from azure.ai.ml import MLClient, load_workspace_connection
@@ -83,7 +83,6 @@ wps_connection.credentials.password="XXXXXXXX"
  ml_client.connections.create_or_update(workspace_connection=wps_connection)

  ```
- ---

  ### Option 2: Use WorkspaceConnection() in a Python script
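
The body of this option is elided between hunks. For reference, a minimal sketch of a direct-construction script, assuming the `WorkspaceConnection` and `UsernamePasswordConfiguration` entities from `azure.ai.ml.entities`; the `type` string, the JDBC-style `target`, and every placeholder value are illustrative assumptions rather than values taken from this commit:

```python
# Hedged sketch (not part of this commit): create the Snowflake connection directly
# in Python so credentials never need to be stored in a YAML file.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import WorkspaceConnection, UsernamePasswordConfiguration
from azure.identity import DefaultAzureCredential

# Assumed workspace coordinates -- replace with your own values.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)

wps_connection = WorkspaceConnection(
    name="my_snowflakedb_connection",
    type="snowflake",  # assumed connection category string
    # Assumed JDBC-style target format for Snowflake.
    target="jdbc:snowflake://<myaccount>.snowflakecomputing.com/?db=<mydb>&warehouse=<mywarehouse>&role=<myrole>",
    credentials=UsernamePasswordConfiguration(username="<username>", password="<password>"),
)
ml_client.connections.create_or_update(workspace_connection=wps_connection)
```

Because the credentials are supplied in code, nothing sensitive has to be written to the YAML file, which is the practice the updated note above recommends.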

@@ -104,6 +103,8 @@ ml_client.connections.create_or_update(workspace_connection=wps_connection)

  ```

+ ---
+
  ## Create an Azure SQL DB connection

  # [CLI: Username/password](#tab/cli-sql-username-password)
@@ -125,23 +126,23 @@ credentials:
  password: <password> # add the sql database password here or leave this blank and type in CLI command line
  ```

- Create the Azure Machine Learning datastore in the CLI:
+ Create the Azure Machine Learning connection in the CLI:

- ### Option 1: Use the username/ password in a YAML script
+ ### Option 1: Use the username / password from YAML file

  ```azurecli
  az ml connection create --file my_sqldb_connection.yaml
  ```

- ### Option 2: Override the username and password in the YAML file
+ ### Option 2: Override the username and password in YAML file

  ```azurecli
  az ml connection create --file my_sqldb_connection.yaml --set credentials.username="XXXXX" credentials.password="XXXXX"
  ```

  # [Python SDK: username/ password](#tab/sdk-sql-username-password)

- ### Option 1: Load the connection in a YAML script
+ ### Option 1: Load connection from YAML file

  ```python
  from azure.ai.ml import MLClient, load_workspace_connection
@@ -154,7 +155,6 @@ wps_connection.credentials.password="XXXXXxXXX"
  ml_client.connections.create_or_update(workspace_connection=wps_connection)

  ```
- ---

  ### Option 2: Using WorkspaceConnection()
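
The Azure SQL variant of this option is likewise elided from the diff. A comparable hedged sketch, with the `type` string and `target` format assumed for illustration:

```python
# Hedged sketch (not part of this commit): Option 2-style script for the Azure SQL
# DB connection. The type string and target format are assumptions for illustration.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import WorkspaceConnection, UsernamePasswordConfiguration
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())  # reads config.json

wps_connection = WorkspaceConnection(
    name="my_sqldb_connection",
    type="azure_sql_db",  # assumed connection category string
    target="Server=tcp:<myservername>,1433;Database=<mydatabase>;",  # assumed target format
    credentials=UsernamePasswordConfiguration(username="<username>", password="<password>"),
)
ml_client.connections.create_or_update(workspace_connection=wps_connection)
```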

@@ -175,6 +175,8 @@ ml_client.connections.create_or_update(workspace_connection=wps_connection)

  ```

+ ---
+
  ## Create Amazon S3 connection

  # [CLI: Access key](#tab/cli-s3-access-key)
@@ -195,15 +197,15 @@ credentials:
  secret_access_key: XxXxXxXXXXXXXxXxXxxXxxXXXXXXXXxXxxXXxXXXXXXXxxxXxXXxXXXXXxXXxXXXxXxXxxxXXxXXxXXXXXxXxxXX # add access key secret
  ```

- Create the Azure Machine Learning datastore in the CLI:
+ Create the Azure Machine Learning connection in the CLI:

  ```azurecli
  az ml connection create --file my_s3_connection.yaml
  ```

  # [Python SDK: Access key](#tab/sdk-s3-access-key)

- ### Option 1: Load the connection in a YAML script
+ ### Option 1: Load connection from YAML file

  ```python
  from azure.ai.ml import MLClient, load_workspace_connection
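
Only the import line of this snippet appears in the diff. A hedged sketch of the full Option 1 flow it belongs to, assuming `load_workspace_connection` accepts a `source` path like the SDK's other `load_*` helpers, and that the credential attribute matches the `secret_access_key` key in the YAML shown above:

```python
# Hedged sketch (not part of this commit) of the Option 1 flow this fragment belongs to:
# load the connection definition from YAML, supply the secret in code, then create it.
from azure.ai.ml import MLClient, load_workspace_connection
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())  # reads config.json

# my_s3_connection.yaml is the file defined earlier in this article.
wps_connection = load_workspace_connection(source="./my_s3_connection.yaml")
# Attribute name assumed from the secret_access_key key in the YAML schema above;
# setting it here keeps the secret out of the YAML file.
wps_connection.credentials.secret_access_key = "<secret-access-key>"
ml_client.connections.create_or_update(workspace_connection=wps_connection)
```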

articles/machine-learning/how-to-import-data-assets.md

Lines changed: 7 additions & 5 deletions
@@ -21,11 +21,11 @@ ms.custom: data4ml

  In this article, learn how to import data into the Azure Machine Learning platform from external sources. A successful import automatically creates and registers an Azure Machine Learning data asset with the name provided during the import. An Azure Machine Learning data asset resembles a web browser bookmark (favorites). You don't need to remember long storage paths (URIs) that point to your most-frequently used data. Instead, you can create a data asset, and then access that asset with a friendly name.

- A data import creates a *cache* of the source data, along with metadata, for faster, reliable data access in Azure Machine Learning training jobs. The data import avoids network and connection constraints. The cached data is versioned to support reproducibility, and to provide data lineage, even for data imported from SQL Server sources. A data import uses ADF (Azure Data Factory pipelines) behind the scenes, and users can avoid ADF interactions as a result. To optimize data transfer parallelization, Azure Machine Learning handles ADF compute resource provisioning and tear-down.
+ A data import creates a cache of the source data, along with metadata, for faster and reliable data access in Azure Machine Learning training jobs. The data cache avoids network and connection constraints. The cached data is versioned to support reproducibility (which provides versioning capabilities for data imported from SQL Server sources). Additionally, the cached data provides data lineage for auditability. A data import uses ADF (Azure Data Factory pipelines) behind the scenes, which means that users can avoid complex interactions with ADF. Behind the scenes, Azure Machine Learning also handles management of ADF compute resource pool size, compute resource provisioning, and tear-down to optimize data transfer by determining proper parallelization.

- The transferred data is partitioned and securely stored in Azure storage, in parquet format. ADF compute and storage costs only involve the time that the data cached, because the cache is a copy of the data hosted in Azure storage. ADF compute facilitated the data transfer.
+ The transferred data is partitioned and securely stored as parquet files in Azure storage. This enables faster processing during training. ADF compute costs only involve the time used for data transfers. Storage costs only involve the time needed to cache the data, because cached data is a copy of the data imported from an external source. That external source is hosted in Azure storage.

- The cached parquet-format data is readily available for Azure Machine Learning training job consumption, in a fast and efficient manner. This increases training run speeds, and it helps protect against connection timeouts for large data set training. It reduces recurring training compute costs, in comparison to direct connections to external source data while training.
+ The caching feature involves upfront compute and storage costs. However, it pays for itself, and can save money, because it reduces recurring training compute costs compared to direct connections to external source data during training. It caches data as parquet files, which makes job training faster and more reliable against connection timeouts for larger data sets. This leads to fewer reruns, and fewer training failures.

  You can now import data from Snowflake, Amazon S3 and Azure SQL.

@@ -43,7 +43,7 @@ To create and work with data assets, you need:

  ## Importing from external database sources / import from external sources to create a meltable data asset

- >__NOTE:__ The external databases can have Snowflake, Azure SQL, etc. formats.
+ >NOTE: The external databases can have Snowflake, Azure SQL, etc. formats.

  The following code samples can import data from external databases. The `connection` that handles the import action determines the external database data source metadata. In this sample, the code imports data from a Snowflake resource. The connection points to a Snowflake source. With a little modification, the connection can point to an Azure SQL database source and an Azure SQL database source. The imported asset `type` from an external database source is `mltable`.
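
The code samples this paragraph refers to aren't included in the diff. A hedged sketch of what a Snowflake import could look like with the Python SDK; the `DataImport` and `Database` entities and their import paths are assumptions about the preview SDK, while `ml_client.data.import_data(data_import=...)` matches the call visible in this file's later hunk headers:

```python
# Hedged sketch (not part of this commit): import a Snowflake query result as an
# mltable data asset. DataImport/Database import paths and parameters are assumptions;
# ml_client.data.import_data(...) matches the call shown later in this file.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import DataImport
from azure.ai.ml.data_transfer import Database
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

data_import = DataImport(
    name="my_snowflake_import",  # name of the data asset that the import registers
    source=Database(
        connection="azureml:my_snowflakedb_connection",  # workspace connection created earlier
        query="select * from MY_TABLE",
    ),
    # Assumed cache location; ${{name}} is a placeholder resolved at import time.
    path="azureml://datastores/workspaceblobstore/paths/snowflake/${{name}}",
)
ml_client.data.import_data(data_import=data_import)
```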

@@ -160,7 +160,7 @@ ml_client.data.import_data(data_import=data_import)

  ## Check the import status of external data sources

- The data import action is an asynchronous action. It can take a long time. After submission of an import data action via the CLI or SDK, the Azure Machine Learning service might need several minutes to connect to the external data source. Then the service would start the data import and handle data caching and registration. The time required for a data import also depends on the size of the source data set.
+ The data import action is an asynchronous action. It can take a long time. After submission of an import data action via the CLI or SDK, the Azure Machine Learning service might need several minutes to connect to the external data source. Then the service would start the data import and handle data caching and registration. The time needed for a data import also depends on the size of the source data set.

  The next example returns the status of the submitted data import activity. The command or method uses the "data asset" name as the input to determine the status of the data materialization.
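
`ml_client.data.show_materialization_status(name="<name>")` appears in the context of the next hunk. A short hedged usage sketch; the exact shape of the returned status object isn't shown in this diff:

```python
# Hedged sketch (not part of this commit): check the materialization status of a
# submitted import by data asset name. The return value's shape isn't shown in the
# diff, so the sketch just prints it for inspection.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

status = ml_client.data.show_materialization_status(name="my_snowflake_import")
print(status)
```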

@@ -183,6 +183,8 @@ ml_client.data.show_materialization_status(name="<name>")

  ```

+ ---
+
  ## Next steps

  - [Read data in a job](how-to-read-write-data-v2.md#read-data-in-a-job)

articles/machine-learning/toc.yml

Lines changed: 8 additions & 0 deletions
@@ -630,6 +630,14 @@
  - name: Data administration and authentication
    displayName: Data administration and authentication
    href: how-to-administrate-data-authentication.md
+ - name: Access data
+   items:
+   - name: Use Connections
+     displayName: Use Connections
+     href: how-to-connection.md
+   - name: Import Data
+     displayName: Import Data
+     href: how-to-import-data-assets.md
  # v1
  - name: Access data
    items:
