Commit 7c2438c

Update how-to-create-register-datasets.md
update links
1 parent 499a9e3 commit 7c2438c

1 file changed: +3 -2 lines changed

articles/machine-learning/service/how-to-create-register-datasets.md

Lines changed: 3 additions & 2 deletions
@@ -43,7 +43,7 @@ To create and work with datasets, you need:

 Datasets are categorized into two types based on how users consume them in training.

-* [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a pandas DataFrame. A `TabularDataset` object can be created from csv, tsv, parquet files, SQL query results etc. For a complete list, please visit our [documentation](https://aka.ms/tabulardataset-api-reference).
+* [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a pandas or spark DataFrame. A `TabularDataset` object can be created from csv, tsv, parquet files, SQL query results etc. For a complete list, please visit our [documentation](https://aka.ms/tabulardataset-api-reference).

 * [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public urls. This provides you with the ability to download or mount the files to your compute. The files can be of any format, which enables a wider range of machine learning scenarios including deep learning.

@@ -193,7 +193,7 @@ titanic_ds = titanic_ds.register(workspace = workspace,
 ```


-## Access your data during training
+## Access datasets in your script

 Registered datasets are accessible locally and remotely on compute clusters like the Azure Machine Learning compute. To access your registered Dataset across experiments, use the following code to get your workspace and registered dataset by name. The [`get_by_name()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#get-by-name-workspace--name--version--latest--) method on the `Dataset` class by default returns the latest version of the dataset registered with the workspace.


@@ -216,5 +216,6 @@ df = titanic_ds.to_pandas_dataframe()

 ## Next steps

+* Learn [how to train with datasets](how-to-train-with-datasets.md)
 * Use automated machine learning to [train with TabularDatasets](https://aka.ms/automl-dataset).
 * For more examples of training with datasets, see the [sample notebooks](https://aka.ms/dataset-tutorial).
