articles/machine-learning/how-to-create-register-datasets.md
Use the `from_files()` method on the `FileDatasetFactory` class to load files in any format and to create an unregistered FileDataset.

If your storage is behind a virtual network or firewall, set the parameter `validate=False` in your `from_files()` method. This bypasses the initial validation step, and ensures that you can create your dataset from these secure files. Learn more about how to [use datastores and datasets in a virtual network](how-to-secure-workspace-vnet.md#datastores-and-datasets).
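As a sketch of this pattern (not taken from the article), the following hypothetical helper `build_from_files_args` just assembles the keyword arguments for `from_files()`, adding `validate=False` only when the storage sits behind a virtual network or firewall; the Azure call itself is left out so the logic stands alone:

```python
def build_from_files_args(datastore, path, behind_vnet=True):
    """Assemble kwargs for Dataset.File.from_files() (hypothetical helper).

    When the storage is behind a virtual network or firewall, validate=False
    skips the initial validation step, which would otherwise fail.
    """
    args = {'path': [(datastore, path)]}
    if behind_vnet:
        # bypass the access check that cannot reach secured storage
        args['validate'] = False
    return args

# usage: Dataset.File.from_files(**build_from_files_args(datastore, 'animals'))
```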

```Python
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, '<name of your datastore>')

# create a FileDataset pointing to files in 'animals' folder and its subfolders recursively
dataset = Dataset.File.from_files(path=[(datastore, 'animals')])
```

If you want to upload all the files from a local directory, create a FileDataset in a single method with [upload_directory()](/python/api/azureml-core/azureml.data.dataset_factory.filedatasetfactory#upload-directory-src-dir--target--pattern-none--overwrite-false--show-progress-true-). This method uploads data to your underlying storage, and as a result incurs storage costs.

```Python
from azureml.core import Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath

ws = Workspace.from_config()
datastore = Datastore.get(ws, '<name of your datastore>')
ds = Dataset.File.upload_directory(src_dir='<path to your data>',
                                   target=DataPath(datastore, '<path on the datastore>'),
                                   show_progress=True)
```

You can create and register TabularDatasets from a pandas or spark dataframe.

To create a TabularDataset from an in-memory pandas dataframe, use the [`register_pandas_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#register-pandas-dataframe-dataframe--target--name--description-none--tags-none--show-progress-true-) method. This method registers the TabularDataset to the workspace and uploads data to your underlying storage, which incurs storage costs.

```python
from azureml.core import Workspace, Datastore, Dataset
import pandas as pd

pandas_df = pd.read_csv('<path to your csv file>')
ws = Workspace.from_config()
datastore = Datastore.get(ws, '<name of your datastore>')

# register the dataframe as a TabularDataset in the workspace
dataset = Dataset.Tabular.register_pandas_dataframe(pandas_df, datastore, '<name of dataset>', show_progress=True)
```

You can also create a TabularDataset from a readily available spark dataframe with the [`register_spark_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#register-spark-dataframe-dataframe--target--name--description-none--tags-none--show-progress-true-) method. This method registers the TabularDataset to the workspace and uploads data to your underlying storage, which incurs storage costs.

```python
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, '<name of your datastore>')

# spark_df is an existing spark dataframe in your session
dataset = Dataset.Tabular.register_spark_dataframe(spark_df, datastore, '<name of dataset>', show_progress=True)
```