In this article, you learn how to work with[Azure Machine Learning datasets](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset%28class%29?view=azure-ml-py) in your training experiments. You can use datasets in your local or remote compute target without worrying about connection strings or data paths.
Azure Machine Learning datasets provide a seamless integration with Azure Machine Learning training products like [ScriptRun](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrun?view=azure-ml-py), [Estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator?view=azure-ml-py), [HyperDrive](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py) and [Azure Machine Learning pipelines](how-to-create-your-first-pipeline.md).
> [!Note]
> Some Dataset classes have dependencies on the [azureml-dataprep](https://docs.microsoft.com/python/api/azureml-dataprep/?view=azure-ml-py) package. For Linux users, these classes are supported only on the following distributions: Red Hat Enterprise Linux, Ubuntu, Fedora, and CentOS.
## Work with datasets
You can access existing datasets across experiments within your workspace, and load them into a pandas dataframe for further exploration on your local environment.
The following code uses the `get_context()` method of the [`Run`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py) class to access the existing input TabularDataset, `titanic`, in the training script. It then uses the [`to_pandas_dataframe()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset#to-pandas-dataframe-on-error--null---out-of-range-datetime--null--) method to load that dataset into a pandas dataframe for further data exploration and preparation prior to training.
```Python
%%writefile $script_folder/train_titanic.py

from azureml.core import Dataset, Run

run = Run.get_context()
# get the input dataset by name
dataset = run.input_datasets['titanic']

# load the TabularDataset to pandas DataFrame
df = dataset.to_pandas_dataframe()
```
If you need to load the prepared data into a new dataset from an in-memory pandas dataframe, write the data to a local file, such as a parquet file, and create a new dataset from that file.
## Use datasets directly in training scripts
If you have structured data, create a TabularDataset and use it directly in your training script for your local or remote experiment.
### Create a TabularDataset
The following code creates an unregistered TabularDataset from a web url. You can also create datasets from local files or paths in datastores. Learn more about [how to create datasets](https://aka.ms/azureml/howto/createdatasets).
```Python
from azureml.core.dataset import Dataset

# example web URL; substitute the path to your own delimited file
web_path = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
titanic_ds = Dataset.Tabular.from_delimited_files(path=web_path)
```
TabularDataset objects provide the ability to load the data into a pandas or Spark DataFrame, so that you can work with familiar data preparation and training libraries without leaving your notebook. To leverage this capability, see [work with datasets](#work-with-datasets).
## Mount files to remote compute targets
If you have unstructured data, create a [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) and either mount or download your data files to make them available to your remote compute target for training. Learn about when to use [mount vs. download](#mount-vs-download) for your remote training experiments.
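As an illustrative sketch (the dataset name `mnist_files`, compute target `gpu-cluster`, and script paths are assumptions, not values from this article, and a configured workspace is required to run it), mounting a FileDataset into an estimator-based run might look like:

```Python
from azureml.core import Dataset, Workspace
from azureml.train.estimator import Estimator

ws = Workspace.from_config()

# retrieve a registered FileDataset by name
mnist_ds = Dataset.get_by_name(ws, name='mnist_files')

est = Estimator(
    source_directory='train_src',
    entry_script='train.py',
    compute_target='gpu-cluster',
    # mount the files on the remote compute; use as_download() instead
    # to copy the files onto the compute before training starts
    inputs=[mnist_ds.as_named_input('mnist').as_mount()],
)
```

Inside the training script, the mounted path is then available through `run.input_datasets['mnist']`.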