articles/machine-learning/how-to-create-register-datasets.md
Lines changed: 27 additions & 36 deletions
@@ -117,6 +117,8 @@ Use the [`from_files()`](/python/api/azureml-core/azureml.data.dataset_factory.f
 If your storage is behind a virtual network or firewall, set the parameter `validate=False` in your `from_files()` method. This bypasses the initial validation step, and ensures that you can create your dataset from these secure files. Learn more about how to [use datastores and datasets in a virtual network](how-to-secure-workspace-vnet.md#datastores-and-datasets).

 ```Python
+from azureml.core import Workspace, Datastore, Dataset
+
 # create a FileDataset pointing to files in 'animals' folder and its subfolders recursively
 To reuse and share datasets across experiments in your workspace, [register your dataset](#register-datasets).
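The `validate=False` option described above can be sketched as follows. This is a minimal sketch, not the article's own sample: the helper names `from_files_kwargs` and `create_file_dataset` and the `behind_vnet` flag are hypothetical, and only `Datastore.get()` and `Dataset.File.from_files()` are taken from the SDK.

```python
def from_files_kwargs(datastore, folder, behind_vnet=False):
    """Build keyword arguments for Dataset.File.from_files() (hypothetical helper)."""
    # A (datastore, folder) tuple points at the folder and its subfolders.
    path = [(datastore, folder)]
    # Skip the initial validation step when the storage is behind a
    # virtual network or firewall, as the article recommends.
    return {"path": path, "validate": not behind_vnet}


def create_file_dataset(workspace, datastore_name, folder, behind_vnet=False):
    """Create a FileDataset from a folder on a registered datastore."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from azureml.core import Datastore, Dataset

    datastore = Datastore.get(workspace, datastore_name)
    return Dataset.File.from_files(**from_files_kwargs(datastore, folder, behind_vnet))
```

Calling `create_file_dataset(ws, '<name of your datastore>', 'animals', behind_vnet=True)` would pass `validate=False` to `from_files()`. Keeping the argument construction in a separate function is only a design choice that makes the flag easy to check without a workspace.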

-> [!TIP]
-> Upload files from a local directory and create a FileDataset in a single method with the public preview method, [upload_directory()](/python/api/azureml-core/azureml.data.dataset_factory.filedatasetfactory#upload-directory-src-dir--target--pattern-none--overwrite-false--show-progress-true-). This method is an [experimental](/python/api/overview/azure/ml/#stable-vs-experimental) preview feature, and may change at any time.
->
-> This method uploads data to your underlying storage, and as a result incur storage costs.
+If you want to upload all the files from a local directory, create a FileDataset in a single method with [upload_directory()](/python/api/azureml-core/azureml.data.dataset_factory.filedatasetfactory#upload-directory-src-dir--target--pattern-none--overwrite-false--show-progress-true-). This method uploads data to your underlying storage, and as a result incurs storage costs.
+
+```Python
+from azureml.core import Workspace, Datastore, Dataset
+from azureml.data.datapath import DataPath
+
+ws = Workspace.from_config()
+datastore = Datastore.get(ws, '<name of your datastore>')
+ds = Dataset.File.upload_directory(src_dir='<path to your data>',
+                                   target=DataPath(datastore, '<path on the datastore>'),
+                                   show_progress=True)
+
+```
+
 To reuse and share datasets across experiments in your workspace, [register your dataset](#register-datasets).
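Registration, as mentioned in the line above, might look like the sketch below. It assumes a dataset object that exposes the SDK's `register()` method, such as the `ds` returned by `upload_directory()`. The wrapper name `register_dataset` is hypothetical, and always passing `create_new_version=True` is an illustrative choice rather than something the article prescribes.

```python
def register_dataset(dataset, workspace, name, description=None):
    """Register `dataset` in `workspace` under `name` (hypothetical wrapper)."""
    # create_new_version=True adds a new version when the name is already
    # registered instead of raising an error.
    return dataset.register(
        workspace=workspace,
        name=name,
        description=description,
        create_new_version=True,
    )
```

For example, `register_dataset(ds, ws, 'animals')` would make the uploaded FileDataset available to other experiments in the same workspace.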
-To create a TabularDataset from an in memory pandas dataframe, write the data to a local file, like a csv, and create your dataset from that file. The following code demonstrates this workflow.
+To create a TabularDataset from an in-memory pandas dataframe,
+use the [`register_pandas_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#register-pandas-dataframe-dataframe--target--name--description-none--tags-none--show-progress-true-) method. This method registers the TabularDataset to the workspace and uploads data to your underlying storage, which incurs storage costs.

 ```python
-# azureml-core of version 1.0.72 or higher is required
-# azureml-dataprep[pandas] of version 1.1.34 or higher is required
-
-from azureml.core import Workspace, Dataset
-local_path ='data/prepared.csv'
-dataframe.to_csv(local_path)
-
-# upload the local file to a datastore on the cloud
-> Create and register a TabularDataset from an in memory spark or pandas dataframe with a single method with public preview methods, [`register_spark_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#methods) and [`register_pandas_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#methods). These register methods are [experimental](/python/api/overview/azure/ml/#stable-vs-experimental) preview features, and may change at any time.
+> Create and register a TabularDataset from an in-memory spark dataframe or a dask dataframe with the public preview methods, [`register_spark_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#register-spark-dataframe-dataframe--target--name--description-none--tags-none--show-progress-true-) and [`register_dask_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#register-dask-dataframe-dataframe--target--name--description-none--tags-none--show-progress-true-). These methods are [experimental](/python/api/overview/azure/ml/#stable-vs-experimental) preview features, and may change at any time.
 >
 > These methods upload data to your underlying storage, and as a result incur storage costs.
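The pandas registration path described in this hunk could be sketched as below. This is an illustration under stated assumptions, not the code from the pull request: the helper names, the `folder` default of `'registered-data'`, and the choice to pass the target as a `(datastore, path)` tuple are hypothetical, while the parameter names `dataframe`, `target`, `name`, and `show_progress` follow the method signature linked above.

```python
def pandas_registration_kwargs(dataframe, datastore, name, folder="registered-data"):
    """Build keyword arguments for register_pandas_dataframe() (hypothetical helper)."""
    return {
        "dataframe": dataframe,
        # Assumed target form: a (datastore, relative path) tuple.
        "target": (datastore, folder),
        "name": name,
        "show_progress": True,
    }


def register_dataframe(workspace, datastore_name, dataframe, name):
    """Upload an in-memory pandas dataframe and register it as a TabularDataset."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from azureml.core import Datastore, Dataset

    datastore = Datastore.get(workspace, datastore_name)
    return Dataset.Tabular.register_pandas_dataframe(
        **pandas_registration_kwargs(dataframe, datastore, name)
    )
```

Because the call uploads data to the underlying storage, it incurs storage costs in the same way as the methods named in the tip.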
 There are many templates at [https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.machinelearningservices](https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.machinelearningservices) that can be used to create datasets.

 For information on using these templates, see [Use an Azure Resource Manager template to create a workspace for Azure Machine Learning](how-to-create-workspace-template.md).
-
-
-## Create datasets from Azure Open Datasets
-
-[Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/) are curated public datasets that you can use to add scenario-specific features to machine learning solutions for more accurate models. Datasets include public-domain data for weather, census, holidays, public safety, and location that help you train machine learning models and enrich predictive solutions. Open Datasets are in the cloud on Microsoft Azure and are included in both the SDK and the studio.
-
-Learn how to create [Azure Machine Learning Datasets from Azure Open Datasets](../open-datasets/how-to-create-azure-machine-learning-dataset-from-open-dataset.md).