Commit 8020dcf

Train rework
1 parent 6de313a commit 8020dcf

1 file changed

+7
-7
lines changed

articles/machine-learning/how-to-train-with-datasets.md

Lines changed: 7 additions & 7 deletions
@@ -19,7 +19,7 @@ ms.date: 04/20/2020
 # Train with datasets in Azure Machine Learning
 [!INCLUDE [applies-to-skus](../../includes/aml-applies-to-basic-enterprise-sku.md)]
 
-In this article, you learn how to consume [Azure Machine Learning datasets](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset%28class%29?view=azure-ml-py) in your training experiments. Use them in your local or remote compute target without worrying about connection strings or data paths.
+In this article, you learn how to consume [Azure Machine Learning datasets](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset%28class%29?view=azure-ml-py) in your training experiments. You can use datasets in your local or remote compute target without worrying about connection strings or data paths.
 
 Azure Machine Learning datasets provide a seamless integration with Azure Machine Learning training products like [ScriptRun](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrun?view=azure-ml-py), [Estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator?view=azure-ml-py), [HyperDrive](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py) and [Azure Machine Learning pipelines](how-to-create-your-first-pipeline.md).
 
@@ -44,7 +44,7 @@ In this example, you create a [TabularDataset](https://docs.microsoft.com/python
 
 ### Create a TabularDataset
 
-TabularDataset objects provide the ability to load the data into a pandas or spark DataFrame so that you can work with familiar data preparation and training libraries without having to leave your notebook. To leverage this capability, see [how to access input datasets](#access-input-datasets).
+
 
 The following code creates an unregistered TabularDataset from a web url. You can also create datasets from local files or paths in datastores. Learn more about [how to create datasets](https://aka.ms/azureml/howto/createdatasets).
 
@@ -54,6 +54,7 @@ from azureml.core.dataset import Dataset
 web_path ='https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
 titanic_ds = Dataset.Tabular.from_delimited_files(path=web_path)
 ```
+TabularDataset objects provide the ability to load the data into a pandas or spark DataFrame so that you can work with familiar data preparation and training libraries without having to leave your notebook. To leverage this capability, see [how to access input datasets](#access-input-datasets).
 
 ### Configure the estimator
 
@@ -82,10 +83,9 @@ experiment_run.wait_for_completion(show_output=True)
 
 ### Access input dataset
 
-If you want to get the dataset used in your training run
+You can access and explore existing datasets across experiments within your workspace.
 
-
-The following code uses the [`get_context()`]() method in the [`Run`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py) class to access the input TabularDataset, `titanic`, in the training script. Then uses the [`to_pandas_dataframe()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset#to-pandas-dataframe-on-error--null---out-of-range-datetime--null--) method to load that dataset into a pandas dataframe.
+The following code uses the [`get_context()`]() method in the [`Run`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py) class to access the input TabularDataset, `titanic`, in the training script. Then uses the [`to_pandas_dataframe()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset#to-pandas-dataframe-on-error--null---out-of-range-datetime--null--) method to load that dataset into a pandas dataframe for further data exploration and preparation.
 
 ```Python
 %%writefile $script_folder/train_titanic.py
@@ -101,7 +101,7 @@ df = dataset.to_pandas_dataframe()
 ```
 ## Mount files to remote compute targets
 
-If you have unstructured data, create a [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) and either mount or download your data files to make them available to your remote compute target for training. Learn about when to use [mount vs. download](#mount-vs.-download) for your training experiments.
+If you have unstructured data, create a [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) and either mount or download your data files to make them available to your remote compute target for training. Learn about when to use [mount vs. download](#mount-vs.-download) for your remote training experiments.
 
 The following example creates a FileDataset and mounts the dataset to the compute target by passing it as an argument in the estimator for training.
 
@@ -146,8 +146,8 @@ est = SKLearn(source_directory=script_folder,
 run = experiment.submit(est)
 run.wait_for_completion(show_output=True)
 ```
+
 ### Retrieve the data in your training script
-If .............................
 
 The following code shows how to retrieve the data in your script.
 