You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Azure Machine Learning supports a Table type (`mltable`). This allows for the creation of a *blueprint* that defines how to load data files into memory as a Pandas or Spark data frame. In this article you learn:
25
25
26
26
> [!div class="checklist"]
27
-
> - When to use Tables instead of Files or Folders.
28
-
> - How to install the MLTable SDK.
29
-
> - How to define a data loading blueprint using an `MLTable` file.
30
-
> - Examples that show use of Tables in Azure ML.
31
-
> - How to use Tables during interactive development (for example, in a notebook).
27
+
> - When to use Azure ML Tables instead of Files or Folders.
28
+
> - How to install the `mltable` SDK.
29
+
> - How to define a data loading blueprint using an `mltable` file.
30
+
> - Examples that show how `mltable` is used in Azure ML.
31
+
> - How to use the `mltable` during interactive development (for example, in a notebook).
> Use `--depth 1` to clone only the latest commit to the repository. This reduces the time needed to complete the operation.
57
57
58
-
The examples relevant to Azure Machine Learning Tables can be found in the following folder of the clone repo:
58
+
The examples relevant to Azure Machine Learning Tables can be found in the following folder of the cloned repo:
59
59
60
60
```bash
61
61
cd azureml-examples/sdk/python/using-mltable
@@ -65,7 +65,7 @@ cd azureml-examples/sdk/python/using-mltable
65
65
66
66
Azure Machine Learning Tables (`mltable`) allow you to define how you want to *load* your data files into memory, as a Pandas and/or Spark data frame. Tables have two key features:
67
67
68
-
1.**An `MLTable` file.** A YAML-based file that defines the data loading *blueprint*. In the MLTable file, you can specify:
68
+
1.**An MLTable file.** A YAML-based file that defines the data loading *blueprint*. In the MLTable file, you can specify:
69
69
- The storage location(s) of the data - local, in the cloud, or on a public http(s) server.
70
70
-*Globbing* patterns over cloud storage. These locations can specify sets of filenames, with wildcard characters (`*`).
71
71
-*read transformation* - for example, the file format type (delimited text, Parquet, Delta, json), delimiters, headers, etc.
@@ -86,9 +86,9 @@ Azure Machine Learning Tables are useful in the following scenarios:
86
86
- You want to train ML models using Azure Machine Learning AutoML.
87
87
88
88
> [!TIP]
89
-
> Azure ML *doesn't* require use of Azure ML Tables (`mltable`) for your tabular data. You can use Azure ML File (`uri_file`) and Folder (`uri_folder`) types, and your own parsing logic loads the data into a Pandas or Spark data frame.
89
+
> Azure ML *doesn't require* use of Azure ML Tables (`mltable`) for your tabular data. You can use Azure ML File (`uri_file`) and Folder (`uri_folder`) types, and your own parsing logic loads the data into a Pandas or Spark data frame.
90
90
>
91
-
> If you have a simple CSV file or Parquet folder, it's **easier** to use Azure ML Files/Folders instead of than Tables.
91
+
> If you have a simple CSV file or Parquet folder, it's **easier** to use Azure ML Files/Folders instead of Tables.
92
92
93
93
## Azure Machine Learning Tables Quickstart
94
94
@@ -126,8 +126,8 @@ With this data, you want to load into a Pandas data frame:
126
126
127
127
Pandas code handles this. However, achieving *reproducibility* would become difficult because you must either:
128
128
129
-
-share code, which means that if the schema changes (for example, a column name change) then all users must update their code, or
130
-
-write an ETL pipeline, which has heavy overhead.
129
+
-Share code, which means that if the schema changes (for example, a column name change) then all users must update their code, or
130
+
-Write an ETL pipeline, which has heavy overhead.
131
131
132
132
Azure Machine Learning Tables provide a light-weight mechanism to serialize (save) the data loading steps in an `MLTable` file, so that you and members of your team can *reproduce* the Pandas data frame. If the schema changes, you only update the `MLTable` file, instead of updates in many places that involve Python data loading code.
The `mltable` flexibility allows for data loading into a single dataframe from a combination of:
510
510
511
-
-local and cloud storage
512
-
-different cloud storage locations (for example: different blob containers)
513
-
-files, folder and glob patterns.
511
+
-Local and cloud storage
512
+
-Different cloud storage locations (for example: different blob containers)
513
+
-Files, folder and glob patterns.
514
514
515
515
For example:
516
516
@@ -699,7 +699,7 @@ You can create a table containing the paths on cloud storage. This example has s
699
699
1.jpeg
700
700
```
701
701
702
-
MLTable can construct a table that contains the storage paths of these images and their folder names (labels), which can be used to stream the images. The following code shows how to create the MLTable:
702
+
The `mltable` can construct a table that contains the storage paths of these images and their folder names (labels), which can be used to stream the images. The following code shows how to create the MLTable:
703
703
704
704
```python
705
705
import mltable
@@ -722,7 +722,7 @@ print(df.head())
722
722
tbl.save("./pets")
723
723
```
724
724
725
-
The following code shows how to open the storage location in the Pandas data frame, and plot the images in a grid:
725
+
The following code shows how to open the storage location in the Pandas data frame, and plot the images:
726
726
727
727
```python
728
728
# plot images on a grid. Note this takes ~1min to execute.
@@ -742,7 +742,7 @@ for i in range(1, columns*rows +1):
742
742
743
743
#### Create a data asset to aid sharing and reproducibility
744
744
745
-
You have your MLTable file currently saved on disk, which makes it hard to share with Team members. When you create a data asset in Azure Machine Learning, your MLTable is uploaded to cloud storage and "bookmarked", which allows your Team members to access the MLTable using a friendly name. Also, the data asset is versioned.
745
+
You have your `mltable` file currently saved on disk, which makes it hard to share with Team members. When you create a data asset in Azure Machine Learning, the `mltable` is uploaded to cloud storage and "bookmarked", which allows your Team members to access the `mltable` using a friendly name. Also, the data asset is versioned.
746
746
747
747
```python
748
748
import time
@@ -771,7 +771,7 @@ my_data = Data(
771
771
ml_client.data.create_or_update(my_data)
772
772
```
773
773
774
-
Now that you have your MLTable stored in the cloud, you and Team members can access it with a friendly name in an interactive session (for example, a notebook):
774
+
Now that the `mltable` is stored in the cloud, you and your Team members can access it with a friendly name in an interactive session (for example, a notebook):
0 commit comments