Commit 7839756

Update how-to-create-register-datasets.md
added timeseries trait
1 parent ad3522b commit 7839756

1 file changed

+21
-1
lines changed

articles/machine-learning/service/how-to-create-register-datasets.md

Lines changed: 21 additions & 1 deletion
@@ -42,7 +42,7 @@ To create and work with datasets, you need:
4242
## Dataset Types
4343

4444
Datasets are categorized into various types based on how users consume them in training. List of Dataset types:
45-
* [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a pandas DataFrame. A `TabularDataset` object can be created from csv, tsv, parquet files, SQL query results etc. For a complete list, please visit our [documentation](https://aka.ms/tabulardataset-api-reference).
45+
* [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a pandas DataFrame. A `TabularDataset` object can be created from csv, tsv, parquet files, SQL query results etc. For a complete list, please visit our [documentation](https://aka.ms/tabulardataset-api-reference). A timestamp can be specified from a column in the data, or from the path pattern in which the data is stored, to enable a timeseries trait, which allows for easy and efficient filtering by time.
4646
* [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public urls. This provides you with the ability to download or mount the files to your compute. The files can be of any format, which enables a wider range of machine learning scenarios including deep learning.
4747

4848
To find out more about upcoming API changes, see [here](https://aka.ms/tabular-dataset).
@@ -104,6 +104,26 @@ titanic_ds.take(3).to_pandas_dataframe()
104104
1|2|1|1|Cumings, Mrs. John Bradley (Florence Briggs Th...|female|38.0|1|0|PC 17599|71.2833|C85|C
105105
2|3|1|3|Heikkinen, Miss. Laina|female|26.0|0|0|STON/O2. 3101282|7.9250||S
106106

107+
108+
Use the `with_timestamp_columns()` method on the `TabularDataset` class to enable easy and efficient filtering by time. More examples and details can be found [here](http://aka.ms/azureml-tsd-notebook).
109+
110+
```Python
111+
from datetime import datetime, timedelta

# create a TabularDataset with timeseries trait
112+
datastore_paths = [(datastore, 'weather/*/*/*/data.parquet')]
113+
114+
# get a coarse timestamp column from the path pattern
115+
dataset = Dataset.Tabular.from_parquet_files(path=datastore_paths, partition_format='weather/{coarse_time:yyyy/MM/dd}/data.parquet')
116+
117+
# set coarse timestamp to the virtual column created, and fine grain timestamp from a column in the data
118+
dataset = dataset.with_timestamp_columns(fine_grain_timestamp='datetime', coarse_grain_timestamp='coarse_time')
119+
120+
# filter with timeseries trait specific methods
121+
data_slice = dataset.time_before(datetime(2019, 1, 1))
122+
data_slice = dataset.time_after(datetime(2019, 1, 1))
123+
data_slice = dataset.time_between(datetime(2019, 1, 1), datetime(2019, 2, 1))
124+
data_slice = dataset.time_recent(timedelta(weeks=1, days=1))
125+
```
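
To make the idea of a coarse timestamp taken from the path pattern more concrete, here is a minimal standard-library sketch. It does not use or reproduce the Azure ML SDK. The sample paths, the helper names, and the inclusive boundaries are all assumptions made for illustration only.

```python
from datetime import datetime
import re

# Hypothetical sample paths following the 'weather/yyyy/MM/dd/data.parquet' layout.
PATHS = [
    "weather/2018/12/31/data.parquet",
    "weather/2019/01/01/data.parquet",
    "weather/2019/01/15/data.parquet",
    "weather/2019/02/01/data.parquet",
]

# Regex equivalent of the path pattern; an illustration, not the SDK's parser.
_PATTERN = re.compile(r"^weather/(\d{4})/(\d{2})/(\d{2})/data\.parquet$")


def coarse_time(path):
    """Parse the date embedded in the path (the 'coarse' timestamp)."""
    match = _PATTERN.match(path)
    if match is None:
        raise ValueError(f"path does not match the expected layout: {path}")
    year, month, day = (int(part) for part in match.groups())
    return datetime(year, month, day)


def time_before(paths, end):
    """Keep paths dated at or before `end` (inclusive boundary is an assumption)."""
    return [p for p in paths if coarse_time(p) <= end]


def time_after(paths, start):
    """Keep paths dated at or after `start` (inclusive boundary is an assumption)."""
    return [p for p in paths if coarse_time(p) >= start]


def time_between(paths, start, end):
    """Keep paths dated within [start, end], both ends included by assumption."""
    return [p for p in paths if start <= coarse_time(p) <= end]


january = time_between(PATHS, datetime(2019, 1, 1), datetime(2019, 2, 1))
```

Because the date is read from the path alone, whole files can be ruled out without opening them. That is the intuition behind pairing a coarse timestamp from the path with a fine-grained timestamp from a column.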
126+
107127
#### Workspace landing page
108128

109129
Sign in to the [workspace landing page](https://ml.azure.com) to create a dataset via the web experience. Currently, the workspace landing page only supports the creation of TabularDatasets.
