Commit 9bfee9b

Merge pull request #89308 from MayMSFT/patch-19
Update how-to-create-register-datasets.md
2 parents 057ad5b + 7c2438c commit 9bfee9b

1 file changed: +3 −2 lines changed

articles/machine-learning/service/how-to-create-register-datasets.md

Lines changed: 3 additions & 2 deletions
@@ -43,7 +43,7 @@ To create and work with datasets, you need:
 
 Datasets are categorized into two types based on how users consume them in training.
 
-* [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a pandas DataFrame. A `TabularDataset` object can be created from csv, tsv, parquet files, SQL query results etc. For a complete list, please visit our [documentation](https://aka.ms/tabulardataset-api-reference).
+* [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a pandas or spark DataFrame. A `TabularDataset` object can be created from csv, tsv, parquet files, SQL query results etc. For a complete list, please visit our [documentation](https://aka.ms/tabulardataset-api-reference).
 
 * [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public urls. This provides you with the ability to download or mount the files to your compute. The files can be of any format, which enables a wider range of machine learning scenarios including deep learning.
 
@@ -203,7 +203,7 @@ titanic_ds = titanic_ds.register(workspace = workspace,
 ```
 
 
-## Access your data during training
+## Access datasets in your script
 
 Registered datasets are accessible locally and remotely on compute clusters like the Azure Machine Learning compute. To access your registered Dataset across experiments, use the following code to get your workspace and registered dataset by name. The [`get_by_name()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#get-by-name-workspace--name--version--latest--) method on the `Dataset` class by default returns the latest version of the dataset registered with the workspace.
 
@@ -226,5 +226,6 @@ df = titanic_ds.to_pandas_dataframe()
 
 ## Next steps
 
+* Learn [how to train with datasets](how-to-train-with-datasets.md)
 * Use automated machine learning to [train with TabularDatasets](https://aka.ms/automl-dataset).
 * For more examples of training with datasets, see the [sample notebooks](https://aka.ms/dataset-tutorial).
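
For reference, the access pattern described under the renamed "Access datasets in your script" heading can be sketched roughly as follows. This is a minimal sketch and not the article's own code block, which the diff elides. It assumes the `azureml-core` package is installed and a workspace configuration is available. The helper name `load_registered_dataframe`, its `dataset_cls` parameter, and the registered name `"titanic_ds"` are hypothetical and used only for illustration.

```python
def load_registered_dataframe(workspace, name, dataset_cls=None):
    """Fetch a registered dataset by name and materialize it as a pandas DataFrame."""
    if dataset_cls is None:
        # Imported lazily so the helper can be exercised without the Azure ML SDK installed.
        from azureml.core import Dataset as dataset_cls
    # get_by_name() returns the latest registered version unless a version is given.
    dataset = dataset_cls.get_by_name(workspace, name=name)
    # Applies to a TabularDataset; a FileDataset would be downloaded or mounted instead.
    return dataset.to_pandas_dataframe()


# Hypothetical usage; requires a real workspace, so it is left commented out:
# from azureml.core import Workspace
# workspace = Workspace.from_config()
# df = load_registered_dataframe(workspace, "titanic_ds")
```

The optional `dataset_cls` argument is purely a convenience of this sketch: it lets a stand-in class be passed when trying the helper outside Azure.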
