TabularDatasets can be created via the SDK or by using the workspace landing page (preview).

#### SDK

Use the `from_delimited_files()` method on the `TabularDatasetFactory` class to read files in CSV or TSV format and create an unregistered TabularDataset. If you read from multiple files, the results are aggregated into one tabular representation.
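As a minimal sketch of the call described above, the following assumes `azureml-core` is installed and that you already have a datastore object. `Dataset.Tabular` is how the SDK exposes `TabularDatasetFactory`; the helper names `datastore_paths` and `create_tabular_dataset` and the file names in the usage comment are hypothetical, chosen only for illustration.

```Python
def datastore_paths(datastore, relative_paths):
    """Pair a datastore with each relative path, giving (datastore, path) tuples."""
    return [(datastore, path) for path in relative_paths]


def create_tabular_dataset(datastore, relative_paths):
    """Create an unregistered TabularDataset from one or more delimited files."""
    # Imported here so the helper above can be used without azureml-core installed.
    from azureml.core import Dataset

    # When several files are passed, they are aggregated into one tabular representation.
    return Dataset.Tabular.from_delimited_files(
        path=datastore_paths(datastore, relative_paths)
    )


# Example usage (hypothetical file names; requires a configured workspace):
# from azureml.core import Workspace
# workspace = Workspace.from_config()
# datastore = workspace.get_default_datastore()
# dataset = create_tabular_dataset(datastore, ["data/part1.csv", "data/part2.csv"])
```

The dataset returned here is unregistered, so it exists only in the current session until you register it.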

Sign in to the [workspace landing page](https://ml.azure.com) to create a dataset via the web experience. Currently, the workspace landing page only supports the creation of TabularDatasets.

The following animation shows how to create a dataset in the workspace landing page.

First, select **Datasets** in the **Assets** section of the left pane. Then, select **+ Create Dataset** to choose the source of your dataset: local files, a datastore, or public web URLs. The **Settings and preview** and **Schema** forms are populated automatically based on file type. Select **Next** to review them or to further configure your dataset before creation. Select **Done** to complete dataset creation.

### Create FileDatasets
Use the `from_files()` method on the `FileDatasetFactory` class to load files in any format and create an unregistered FileDataset.
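The sketch below illustrates the same pattern for files of any format, again assuming `azureml-core` is installed. `Dataset.File` is how the SDK exposes `FileDatasetFactory`; the helper names `datastore_folders` and `create_file_dataset` and the folder name in the usage comment are hypothetical.

```Python
def datastore_folders(datastore, folders):
    """Pair a datastore with each folder or file path, giving (datastore, path) tuples."""
    return [(datastore, folder) for folder in folders]


def create_file_dataset(datastore, folders):
    """Create an unregistered FileDataset that references files of any format."""
    # Imported here so the helper above can be used without azureml-core installed.
    from azureml.core import Dataset

    return Dataset.File.from_files(path=datastore_folders(datastore, folders))


# Example usage (hypothetical folder name; requires a configured workspace):
# from azureml.core import Workspace
# workspace = Workspace.from_config()
# datastore = workspace.get_default_datastore()
# images = create_file_dataset(datastore, ["images/training"])
```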

> Datasets created via the workspace landing page are automatically registered to the workspace.

## Version datasets

You can register a new dataset under the same name by creating a new version. A dataset version bookmarks the state of your data, so you can apply a specific version of the dataset for experimentation or future reproduction. Typical scenarios for versioning:

* When new data is available for retraining.
* When you are applying different data preparation or feature engineering approaches.

```Python
# create a TabularDataset from Titanic training data