articles/machine-learning/how-to-create-data-assets.md
39 additions & 6 deletions
@@ -27,7 +27,7 @@ In this article, you learn how to create a data asset in Azure Machine Learning.

 The benefits of creating data assets are:

-* You can **share and reuse data** with other members of the team such that they do not need to remember file locations.
+* You can **share and reuse data** with other members of the team such that they don't need to remember file locations.

 * You can **seamlessly access data** during model training (on any supported compute type) without worrying about connection strings or data paths.

@@ -63,13 +63,13 @@ When you create a data asset in Azure Machine Learning, you'll need to specify a


 ## Data asset types
--[**URIs**](#Create a `uri_folder` data asset) - A **U**niform **R**esource **I**dentifier that is a reference to a storage location on your local computer or in the cloud that makes it very easy to access data in your jobs. Azure Machine Learning distinguishes two types of URIs:`uri_file` and `uri_folder`.
+-[**URIs**](#Create a `uri_folder` data asset) - A **U**niform **R**esource **I**dentifier that is a reference to a storage location on your local computer or in the cloud that makes it easy to access data in your jobs. Azure Machine Learning distinguishes two types of URIs: `uri_file` and `uri_folder`.

--[**MLTable**](#Create a `mltable` data asset) - `MLTable` helps you to abstract the schema definition for tabular data so it is more suitable for complex/changing schema or to be leveraged in automl. If you just want to create an data asset for a job or you want to write your own parsing logic in python you could use `uri_file`, `uri_folder`.
+-[**MLTable**](#Create a `mltable` data asset) - `MLTable` helps you to abstract the schema definition for tabular data so it is more suitable for complex/changing schema or to be used in AutoML. If you just want to create a data asset for a job or you want to write your own parsing logic in Python, you could use `uri_file` or `uri_folder`.

 The ideal scenarios to use `mltable` are:
 - The schema of your data is complex and/or changes frequently.
-- You only need a subset of data (for example: a sample of rows or files, specific columns, etc).
+- You only need a subset of data (for example: a sample of rows or files, specific columns, etc.).
 - AutoML jobs requiring tabular data.

 If your scenario does not fit the above then it is likely that URIs are a more suitable type.
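
The guidance in this hunk on choosing between `mltable` and the URI types can be read as a small decision rule. The sketch below is only an illustration of that rule: the function `suggest_data_asset_type` and its parameter names are hypothetical and are not part of the Azure Machine Learning SDK or CLI.

```python
def suggest_data_asset_type(
    is_single_file: bool,
    complex_or_changing_schema: bool = False,
    needs_subset: bool = False,
    automl_tabular: bool = False,
) -> str:
    """Return 'mltable', 'uri_file', or 'uri_folder' (hypothetical helper)."""
    # Any one of the three "ideal scenarios" listed in the article points to mltable.
    if complex_or_changing_schema or needs_subset or automl_tabular:
        return "mltable"
    # Otherwise a plain URI is likely the more suitable type:
    # uri_file for a single file, uri_folder for a folder.
    return "uri_file" if is_single_file else "uri_folder"


print(suggest_data_asset_type(is_single_file=True))                      # uri_file
print(suggest_data_asset_type(is_single_file=False, needs_subset=True))  # mltable
```

The rule is deliberately coarse; real projects may weigh other factors that the article does not list.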
@@ -223,7 +223,7 @@ To create a File data asset in the Azure Machine Learning studio, use the follow
 - JSON Lines
 - Delta Lake

-Please find more details about what are the abilities we provide via `mltable` in [reference-yaml-mltable](reference-yaml-mltable.md).
+For more details about the capabilities available through `mltable`, see [reference-yaml-mltable](reference-yaml-mltable.md).

 In this section, we show you how to create a data asset when the type is an `mltable`.

@@ -234,7 +234,7 @@ The MLTable file is a file that provides the specification of the data's schema
 > [!NOTE]
 > This file needs to be named exactly as `MLTable`.

-An *example* MLTable file is provided below:
+An *example* MLTable file for delimited files is provided below:

 ```yml
 type: mltable
@@ -247,6 +247,24 @@ transformations:
       encoding: ascii
       header: all_files_same_headers
 ```
+
+An *example* MLTable file for Delta Lake is provided below:
+```yml
+type: mltable
+
+paths:
+  - abfss://my_delta_files
+
+transformations:
+  - read_delta_lake:
+      timestamp_as_of: '2022-08-26T00:00:00Z'
+      #timestamp_as_of: Timestamp to be specified for time-travel on the specific Delta Lake data.
+      #version_as_of: Version to be specified for time-travel on the specific Delta Lake data.
+```
+
+For more transformations available in `mltable`, see [reference-yaml-mltable](reference-yaml-mltable.md).
+
+
 > [!IMPORTANT]
 > We recommend co-locating the MLTable file with the underlying data in storage. For example:
 >
@@ -261,6 +279,21 @@ transformations:
 > ```
 > Co-locating the MLTable with the data ensures a **self-contained *artifact*** where all that is needed is stored in that one folder (`my_data`); regardless of whether that folder is stored on your local drive or in your cloud store or on a public http server. You should **not** specify *absolute paths* in the MLTable file.

+
+### Create an MLTable artifact via Python SDK: `from_*`
+To create an MLTable object in memory with the Python SDK, you can use the `from_*` methods.
+The `from_*` methods do not materialize the data, but rather store it as a transformation in the MLTable definition.
+
+For example, you can use `from_delta_lake()` to create an in-memory MLTable artifact to read Delta Lake data from the path `delta_table_path`.
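
To make the "does not materialize the data" point concrete, here is a minimal standard-library sketch, assuming the definition has the same shape as the Delta Lake YAML example earlier in this diff. It is not the SDK's `from_delta_lake()`: the helper `delta_lake_definition` is hypothetical and only builds a plain dictionary that records the read as a transformation, without touching storage.

```python
from typing import Any, Dict, Optional


def delta_lake_definition(
    delta_table_path: str,
    timestamp_as_of: Optional[str] = None,
    version_as_of: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an MLTable-style definition as a dict (hypothetical helper).

    Nothing is read from delta_table_path; the read is only described.
    """
    # Record only the time-travel options that were actually provided.
    options: Dict[str, Any] = {}
    if timestamp_as_of is not None:
        options["timestamp_as_of"] = timestamp_as_of
    if version_as_of is not None:
        options["version_as_of"] = version_as_of
    # Same keys as the YAML example: type, paths, transformations.
    return {
        "type": "mltable",
        "paths": [delta_table_path],
        "transformations": [{"read_delta_lake": options}],
    }


definition = delta_lake_definition(
    "abfss://my_delta_files", timestamp_as_of="2022-08-26T00:00:00Z"
)
print(definition["transformations"])
```

Because the function only assembles a description, it runs without any access to the storage account; loading the data would be a separate, later step.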