Commit 1fa3257

Merge pull request #88168 from nibaccam/dataset-ui
Data | create dataset via UI
2 parents 1a90895 + 4007583 commit 1fa3257

2 files changed

+20
-1
lines changed

articles/machine-learning/service/how-to-create-register-datasets.md

Lines changed: 20 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -72,8 +72,13 @@ workspace = Workspace.from_config()
7272
# retrieve an existing datastore in the workspace by name
7373
datastore = Datastore.get(workspace, datastore_name)
7474
```
75+
7576
### Create TabularDatasets
7677

78+
TabularDatasets can be created via the SDK or by using the workspace landing page (preview).
79+
80+
#### SDK
81+
7782
Use the `from_delimited_files()` method on the `TabularDatasetFactory` class to read files in CSV or TSV format and create an unregistered TabularDataset. If you read from multiple files, the results are aggregated into one tabular representation.
7883

7984
```Python
@@ -99,7 +104,18 @@ titanic_ds.take(3).to_pandas_dataframe()
99104
1|2|1|1|Cumings, Mrs. John Bradley (Florence Briggs Th...|female|38.0|1|0|PC 17599|71.2833|C85|C
100105
2|3|1|3|Heikkinen, Miss. Laina|female|26.0|0|0|STON/O2. 3101282|7.9250||S
101106

107+
#### Workspace landing page
108+
109+
Sign in to the [workspace landing page](https://ml.azure.com) to create a dataset via the web experience. Currently, the workspace landing page only supports the creation of TabularDatasets.
110+
111+
The following animation shows how to create a dataset in the workspace landing page.
112+
113+
First, select **Datasets** in the **Assets** section of the left pane. Then select **+ Create Dataset** to choose the source of your dataset: local files, a datastore, or public web URLs. The **Settings and preview** and **Schema** forms are populated automatically based on file type. Select **Next** to review them or to further configure your dataset before creation. Select **Done** to complete dataset creation.
114+
115+
![Create a dataset with the UI](media/how-to-create-register-datasets/create-dataset-ui.gif)
116+
102117
### Create FileDatasets
118+
103119
Use the `from_files()` method on the `FileDatasetFactory` class to load files in any format and create an unregistered FileDataset.
104120

105121
```Python
@@ -130,14 +146,17 @@ titanic_ds = titanic_ds.register(workspace = workspace,
130146
description = 'titanic training data')
131147
```
132148

149+
>[!Note]
150+
> Datasets created via the workspace landing page are automatically registered to the workspace.
151+
133152
## Version datasets
134153

135154
You can register a new dataset under the same name by creating a new version. A dataset version is a way to bookmark the state of your data, so that you can apply a specific version of the dataset for experimentation or future reproduction. Typical scenarios in which to consider versioning:
136155
* When new data is available for retraining.
137156
* When you are applying different data preparation or feature engineering approaches.
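
The versioning flow described above can be sketched as a small wrapper. This is a minimal illustration, not text from the article: `register_new_version` is a hypothetical helper, and it assumes the dataset object exposes the azureml-core style `register()` method with a `create_new_version` flag.

```python
def register_new_version(dataset, workspace, name, description=None):
    """Register `dataset` under an existing `name` as a new version.

    Hypothetical helper: assumes `dataset` has a
    `register(workspace, name, description, create_new_version)` method,
    as azureml-core dataset objects do.
    """
    return dataset.register(
        workspace=workspace,
        name=name,
        description=description,
        # Add a new version under the same name instead of failing
        # because the name is already registered.
        create_new_version=True,
    )


# Hypothetical usage with the objects used in the article:
# titanic_ds = register_new_version(
#     titanic_ds, workspace, 'titanic_ds', 'new titanic training data')
```

Keeping the flag inside one helper makes it harder to register retraining data under a new name by accident when a new version was intended.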
138157

139158
```Python
140-
# create a TabularDataset from new Titanic training data
159+
# create a TabularDataset from Titanic training data
141160
web_paths = [
142161
'https://dprepdata.blob.core.windows.net/demo/Titanic.csv',
143162
'https://dprepdata.blob.core.windows.net/demo/Titanic2.csv'