Commit 3aea1d0

Merge pull request #217521 from SturgeonMi/patch-17
Update how-to-create-data-assets.md
2 parents 7e4adc0 + 206a2c5 commit 3aea1d0

1 file changed: +39 -6 lines changed

articles/machine-learning/how-to-create-data-assets.md

Lines changed: 39 additions & 6 deletions
@@ -27,7 +27,7 @@ In this article, you learn how to create a data asset in Azure Machine Learning.
2727
2828
The benefits of creating data assets are:
2929

30-
* You can **share and reuse data** with other members of the team such that they do not need to remember file locations.
30+
* You can **share and reuse data** with other members of the team such that they don't need to remember file locations.
3131

3232
* You can **seamlessly access data** during model training (on any supported compute type) without worrying about connection strings or data paths.
3333

@@ -63,13 +63,13 @@ When you create a data asset in Azure Machine Learning, you'll need to specify a
6363
6464

6565
## Data asset types
66-
- [**URIs**](#Create a `uri_folder` data asset) - A **U**niform **R**esource **I**dentifier that is a reference to a storage location on your local computer or in the cloud that makes it very easy to access data in your jobs. Azure Machine Learning distinguishes two types of URIs:`uri_file` and `uri_folder`.
66+
- [**URIs**](#Create a `uri_folder` data asset) - A **U**niform **R**esource **I**dentifier that is a reference to a storage location on your local computer or in the cloud that makes it easy to access data in your jobs. Azure Machine Learning distinguishes two types of URIs: `uri_file` and `uri_folder`.
6767

68-
- [**MLTable**](#Create a `mltable` data asset) - `MLTable` helps you to abstract the schema definition for tabular data so it is more suitable for complex/changing schema or to be leveraged in automl. If you just want to create an data asset for a job or you want to write your own parsing logic in python you could use `uri_file`, `uri_folder`.
68+
- [**MLTable**](#Create a `mltable` data asset) - `MLTable` helps you to abstract the schema definition for tabular data so it is more suitable for complex/changing schema or to be used in AutoML. If you just want to create a data asset for a job, or you want to write your own parsing logic in Python, you could use `uri_file` or `uri_folder`.
6969

7070
The ideal scenarios to use `mltable` are:
7171
- The schema of your data is complex and/or changes frequently.
72-
- You only need a subset of data (for example: a sample of rows or files, specific columns, etc).
72+
- You only need a subset of data (for example: a sample of rows or files, specific columns, etc.)
7373
- AutoML jobs requiring tabular data.
7474

7575
If your scenario does not fit the above then it is likely that URIs are a more suitable type.
@@ -223,7 +223,7 @@ To create a File data asset in the Azure Machine Learning studio, use the follow
223223
- JSON Lines
224224
- Delta Lake
225225

226-
Please find more details about what are the abilities we provide via `mltable` in [reference-yaml-mltable](reference-yaml-mltable.md).
226+
For more details about the capabilities available through `mltable`, see [reference-yaml-mltable](reference-yaml-mltable.md).
227227

228228
In this section, we show you how to create a data asset when the type is an `mltable`.
229229

@@ -234,7 +234,7 @@ The MLTable file is a file that provides the specification of the data's schema
234234
> [!NOTE]
235235
> This file needs to be named exactly as `MLTable`.
236236
237-
An *example* MLTable file is provided below:
237+
An *example* MLTable file for delimited files is provided below:
238238

239239
```yml
240240
type: mltable
@@ -247,6 +247,24 @@ transformations:
247247
encoding: ascii
248248
header: all_files_same_headers
249249
```
250+
251+
An *example* MLTable file for Delta Lake is provided below:
252+
```yml
253+
type: mltable
254+
255+
paths:
256+
- abfss://my_delta_files
257+
258+
transformations:
259+
- read_delta_lake:
260+
    timestamp_as_of: '2022-08-26T00:00:00Z'
261+
    #timestamp_as_of: Timestamp to be specified for time-travel on the specific Delta Lake data.
262+
    #version_as_of: Version to be specified for time-travel on the specific Delta Lake data.
263+
```
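The Delta Lake definition above can also be generated from code. The following is a minimal sketch using only the Python standard library. The helper names `render_delta_mltable` and `write_mltable` are illustrative and not part of any SDK, and the rule that exactly one of `timestamp_as_of` or `version_as_of` is supplied is an assumption made for this sketch.

```python
from pathlib import Path
from typing import Optional


def render_delta_mltable(
    path: str,
    timestamp_as_of: Optional[str] = None,
    version_as_of: Optional[int] = None,
) -> str:
    """Return MLTable file text that reads a Delta Lake source (illustrative helper)."""
    # Assumption: exactly one time-travel selector should be supplied.
    if (timestamp_as_of is None) == (version_as_of is None):
        raise ValueError("specify exactly one of timestamp_as_of or version_as_of")
    if timestamp_as_of is not None:
        selector = f"timestamp_as_of: '{timestamp_as_of}'"
    else:
        selector = f"version_as_of: {version_as_of}"
    lines = [
        "type: mltable",
        "",
        "paths:",
        f"- {path}",  # mirrors the path entry used in the example above
        "",
        "transformations:",
        "- read_delta_lake:",
        f"    {selector}",
    ]
    return "\n".join(lines) + "\n"


def write_mltable(folder: Path, text: str) -> Path:
    """Write the definition to a file named exactly `MLTable` inside `folder`."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "MLTable"
    target.write_text(text, encoding="utf-8")
    return target


print(render_delta_mltable("abfss://my_delta_files", timestamp_as_of="2022-08-26T00:00:00Z"))
```

Generating the text from a function keeps the time-travel selector in one place, which can help when the same table is read at several versions.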
264+
265+
For more transformations available in `mltable`, see [reference-yaml-mltable](reference-yaml-mltable.md).
266+
267+
250268
> [!IMPORTANT]
251269
> We recommend co-locating the MLTable file with the underlying data in storage. For example:
252270
>
@@ -261,6 +279,21 @@ transformations:
261279
> ```
262280
> Co-locating the MLTable with the data ensures a **self-contained *artifact*** where all that is needed is stored in that one folder (`my_data`); regardless of whether that folder is stored on your local drive or in your cloud store or on a public http server. You should **not** specify *absolute paths* in the MLTable file.
263281
282+
283+
### Create an MLTable artifact via Python SDK: from_*
284+
To create an MLTable object in memory with the Python SDK, use the `from_*` methods.
285+
The `from_*` methods do not materialize the data; instead, they store it as a transformation in the MLTable definition.
286+
287+
For example, you can use `from_delta_lake()` to create an in-memory MLTable artifact that reads Delta Lake data from the path `delta_table_path`.
288+
```python
289+
import mltable as mlt
290+
mltable = mlt.from_delta_lake(delta_table_path, timestamp_as_of="2021-01-01T00:00:00Z")
291+
df = mltable.to_pandas_dataframe()
292+
print(df.to_string())
293+
```
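A slightly fuller sketch of the same call follows. It assumes the `mltable` package is installed and that `delta_table_path` points at a readable Delta Lake table. The helper names `to_rfc3339_utc` and `load_delta_snapshot` are illustrative and not part of the SDK, and treating naive datetimes as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone


def to_rfc3339_utc(moment: datetime) -> str:
    """Format a datetime as a UTC timestamp string such as '2021-01-01T00:00:00Z'."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def load_delta_snapshot(delta_table_path: str, moment: datetime):
    """Read a Delta Lake table as of `moment` into a Pandas dataframe."""
    # Imported lazily so the timestamp helper works without the package installed.
    import mltable as mlt

    table = mlt.from_delta_lake(
        delta_table_path,
        timestamp_as_of=to_rfc3339_utc(moment),
    )
    # Nothing is read until the table is materialized here.
    return table.to_pandas_dataframe()
```

Calling `load_delta_snapshot(delta_table_path, datetime(2021, 1, 1))` would request the same snapshot as the example above.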
294+
For more details, see the [MLTable Python functions reference](/python/api/mltable/mltable).
295+
296+
264297
In your Python code, you materialize the MLTable artifact into a Pandas dataframe using:
265298

266299
```python
