#Customer intent: As an experienced Python developer, I need to read in my data to make it available to a remote compute to train my machine learning models.
Learn how to read and write data for your jobs with the Azure Machine Learning Python SDK v2 and the Azure Machine Learning CLI extension v2.
## Prerequisites
- An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/).
## Supported paths
When you provide a data input/output to a job, you must specify a `path` parameter that points to the data location. This table shows the different data locations that Azure Machine Learning supports, along with examples for the `path` parameter:
|Location | Examples |
|---------|---------|
|A path on your local computer |`./home/username/data/my_data`|
|A path on a public http(s) server |`https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv`|
|A path on Azure Storage |`https://<account_name>.blob.core.windows.net/<container_name>/<path>` <br> `abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>`|
|A path on a Datastore |`azureml://datastores/<data_store_name>/paths/<path>`|
|A path to a Data Asset |`azureml:<my_data>:<version>`|
## Supported modes
When you run a job with data inputs/outputs, you can specify the *mode*: for example, whether the data should be read-only mounted or downloaded to the compute target. This table shows the possible modes for different type/mode/input/output combinations:
> `eval_download` and `eval_mount` are unique to `mltable`. `ro_mount` is the default mode for MLTable. In some scenarios, however, an MLTable can yield files that aren't necessarily co-located with the MLTable file in storage. Alternatively, an `mltable` can subset or shuffle the data located in the storage resource. That view becomes visible only if the engine actually evaluates the MLTable file. These modes provide that view of the files.
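
For illustration, here's a minimal sketch of a job input that sets the mode explicitly in CLI v2 YAML; the input name, datastore path, and the choice of `eval_mount` are placeholder assumptions, not a complete sample:

```yaml
# Sketch: an mltable input with an explicit mode (placeholder name and path).
inputs:
  my_table:
    type: mltable
    path: azureml://datastores/workspaceblobstore/paths/titanic-table/
    mode: eval_mount  # evaluate the MLTable file, then mount the files it yields
```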
## Read data in a job
# [Azure CLI](#tab/cli)
Create a job specification YAML file (`<file-name>.yml`). In the `inputs` section of the job, specify:
1. The `type`: whether the data is a specific file (`uri_file`), a folder location (`uri_folder`), or an `mltable`.
1. The `path` of your data location; any of the paths outlined in the [Supported paths](#supported-paths) section will work.
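
For example, a minimal `<file-name>.yml` for a job that reads the Titanic CSV shown in the paths table might look like this sketch; the environment and compute names (`my-environment:1`, `cpu-cluster`) are placeholders you'd replace with your own:

```yaml
# Sketch of <file-name>.yml: a command job with one uri_file input.
# The environment and compute names are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  head ${{inputs.my_csv}}
inputs:
  my_csv:
    type: uri_file
    path: https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv
    mode: ro_mount
environment: azureml:my-environment:1
compute: azureml:cpu-cluster
```

You would then submit the job with `az ml job create --file <file-name>.yml`.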
In your job, you can write data to your cloud-based storage with *outputs*. The [Supported modes](#supported-modes) section showed that only job *outputs* can write data, because the mode can be either `rw_mount` or `upload`.
# [Azure CLI](#tab/cli)
Create a job specification YAML file (`<file-name>.yml`), with the `outputs` section populated with the type and path where you'd like to write your data.
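
Here's a minimal sketch of such a file; the datastore output path, environment, and compute names are placeholders:

```yaml
# Sketch of <file-name>.yml: a command job that writes a file into a uri_folder output.
# The datastore path, environment, and compute names are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  echo "hello world" > ${{outputs.my_output}}/hello.txt
outputs:
  my_output:
    type: uri_folder
    path: azureml://datastores/workspaceblobstore/paths/my-output-data/
    mode: rw_mount
environment: azureml:my-environment:1
compute: azureml:cpu-cluster
```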
If you work with Azure Machine Learning pipelines, you can read data into and move data between pipeline components with the Azure Machine Learning CLI v2 extension or the Python SDK v2.
### Azure Machine Learning CLI v2
This YAML file shows how to use the output data from one component as the input for another component of the pipeline, with the Azure Machine Learning CLI v2 extension.
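
A minimal sketch of that pattern follows; the step commands, environment (`my-environment:1`), and compute (`cpu-cluster`) are placeholders rather than a complete sample:

```yaml
# Sketch of a pipeline job: prep_job's output folder becomes train_job's input.
# Commands, environment, and compute names are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
jobs:
  prep_job:
    type: command
    command: |
      cp -r ${{inputs.raw_data}}/* ${{outputs.prepped_data}}
    inputs:
      raw_data:
        type: uri_folder
        path: ./data
    outputs:
      prepped_data:
        type: uri_folder
    environment: azureml:my-environment:1
    compute: azureml:cpu-cluster
  train_job:
    type: command
    command: |
      ls ${{inputs.training_data}}
    inputs:
      training_data: ${{parent.jobs.prep_job.outputs.prepped_data}}
    environment: azureml:my-environment:1
    compute: azureml:cpu-cluster
```

The `${{parent.jobs.<step>.outputs.<name>}}` binding is what wires one step's output to the next step's input.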