Commit d1a8822

Update reference-yaml-mltable.md
Included note on primitive types for reading Parquet files
1 parent 4822612 commit d1a8822

1 file changed: +2 -2 lines changed

articles/machine-learning/reference-yaml-mltable.md

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ This article contains information relating to the `MLTable` YAML schema only. Fo
|Read Transformation | Description | Parameters |
|---------|---------|---------|
|`read_delimited` | Adds a transformation step to read delimited text file(s) provided in `paths`. | `infer_column_types`: Boolean to infer column data types. Defaults to True. Type inference requires that the current compute can access the data source. Currently, type inference will only pull the first 200 rows.<br><br>`encoding`: Specify the file encoding. Supported encodings: `utf8`, `iso88591`, `latin1`, `ascii`, `utf16`, `utf32`, `utf8bom` and `windows1252`. Default encoding: `utf8`.<br><br>`header`: user can choose one of the following options: `no_header`, `from_first_file`, `all_files_different_headers`, `all_files_same_headers`. Defaults to `all_files_same_headers`.<br><br>`delimiter`: The separator used to split columns.<br><br>`empty_as_string`: Specify if empty field values should load as empty strings. The default (False) will read empty field values as nulls. Passing this setting as *True* will read empty field values as empty strings. If the values are converted to numeric or datetime, then this setting has no effect, as empty values will be converted to nulls.<br><Br>`include_path_column`: Boolean to keep path information as column in the table. Defaults to False. This setting is useful when reading multiple files, and you want to know from which file a specific record originated. Additionally, you can keep useful information in the file path.<br><br>`support_multi_line`: By default (`support_multi_line=False`), all line breaks, including line breaks in quoted field values, will be interpreted as a record break. This approach to data reading increases speed, and it offers optimization for parallel execution on multiple CPU cores. However, it may result in silent production of more records with misaligned field values. Set this value to True when the delimited files are known to contain quoted line breaks. |
-| `read_parquet` | Adds a transformation step to read Parquet formatted file(s) provided in `paths`. | `include_path_column`: Boolean to keep path information as a table column. Defaults to False. This setting helps when you read multiple files, and you want to know from which file a specific record originated. Additionally, you can keep useful information in the file path. |
+| `read_parquet` | Adds a transformation step to read Parquet formatted file(s) provided in `paths`. | `include_path_column`: Boolean to keep path information as a table column. Defaults to False. This setting helps when you read multiple files, and you want to know from which file a specific record originated. Additionally, you can keep useful information in the file path.<br><br>**NOTE:** MLTable only supports reading parquet files that have columns consisting of primitive types. Columns containing arrays are **not** supported. |
| `read_delta_lake` | Adds a transformation step to read a Delta Lake folder provided in `paths`. You can read the data at a particular timestamp or version. | `timestamp_as_of`: String. Timestamp to be specified for time-travel on the specific Delta Lake data. To read data at a specific point in time, the datetime string should have a [RFC-3339/ISO-8601 format](https://wikipedia.org/wiki/ISO_8601). (for example: "2022-10-01T00:00:00Z", "2022-10-01T00:00:00+08:00", "2022-10-01T01:30:00-08:00")<br><br>`version_as_of`: Integer. Version to be specified for time-travel on the specific Delta Lake data.<br><br>**One value of `timestamp_as_of` or `version_as_of` must be provided.**
| `read_json_lines` | Adds a transformation step to read the json file(s) provided in `paths`. | `include_path_column`: Boolean to keep path information as column in the MLTable. Defaults to False. This setting becomes useful to read multiple files, and you want to know from which file a particular record originated. Additionally, you can keep useful information in file path.<br><br>`invalid_lines`: How to handle lines that have invalid JSON. Supported values: `error` and `drop`. Defaults to `error`.<br><br>`encoding`: Specify the file encoding. Supported encodings are `utf8`, `iso88591`, `latin1`, `ascii`, `utf16`, `utf32`, `utf8bom` and `windows1252`. Default is `utf8`.

@@ -245,4 +245,4 @@ transformations:
## Next steps

- [Install and use the CLI (v2)](how-to-configure-cli.md)
-- [Working with tables in Azure Machine Learning](how-to-mltable.md)
+- [Working with tables in Azure Machine Learning](how-to-mltable.md)
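
For context, the `read_parquet` row touched by this commit describes a transformation that would sit in an `MLTable` file roughly as sketched below. This is a minimal, illustrative fragment rather than text from the commit: the `./data/*.parquet` pattern is a hypothetical location, and `include_path_column` is set to `true` only to show the parameter in use (the table gives its default as False).

```yaml
# Illustrative MLTable file; the path pattern below is a hypothetical example.
type: mltable

paths:
  - pattern: ./data/*.parquet   # assumed location of the Parquet files

transformations:
  - read_parquet:
      # Keep the source file path as an extra column, useful when
      # several files are read and each record's origin matters.
      include_path_column: true
```

Per the note added in this commit, a fragment like this is expected to work only when every column in the Parquet files is a primitive type. Files with array columns are not supported.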
