Commit cca6826

update spark+dask puP
1 parent 3cf0be7 commit cca6826

1 file changed: 5 additions, 15 deletions

articles/machine-learning/how-to-create-register-datasets.md

Lines changed: 5 additions & 15 deletions
@@ -328,9 +328,7 @@ titanic_ds.take(3).to_pandas_dataframe()
 1|2|True|1|Cumings, Mrs. John Bradley (Florence Briggs Th...|female|38.0|1|0|PC 17599|71.2833|C85|C
 2|3|True|3|Heikkinen, Miss. Laina|female|26.0|0|0|STON/O2. 3101282|7.9250||S

-## Create a dataset from a dataframe
-
-You can create and register TabularDatasets from a pandas or spark dataframe.
+## Create a dataset from pandas dataframe

 To create a TabularDataset from an in memory pandas dataframe
 use the [`register_pandas_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#register-pandas-dataframe-dataframe--target--name--description-none--tags-none--show-progress-true-) method. This method registers the TabularDataset to the workspace and uploads data to your underlying storage, which incurs storage costs.
@@ -345,18 +343,10 @@ datastore = Datastore.get(ws, '<name of your datastore>')
 dataset = Dataset.Tabular.register_pandas_dataframe(pandas_df, datastore, "dataset_from_pandas_df", show_progress=True)

 ```
-
-You can also create a TabularDataset from a readily available spark dataframe with the
-[`register_spark_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#register-spark-dataframe-dataframe--target--name--description-none--tags-none--show-progress-true-) method. This method registers the TabularDataset to the workspace and uploads data to your underlying storage, which incurs storage costs.
-
-```python
-from azureml.core import Workspace, Datastore, Dataset
-
-ws = Workspace.from_config()
-datastore = Datastore.get(ws, '<name of your datastore>')
-dataset = Dataset.Tabular.register_spark_dataframe(spark_df, datastore, "dataset_from_spark_df", show_progress=True)
-
-```
+> [!TIP]
+> Create and register a TabularDataset from an in memory spark dataframe or a dask dataframe with the public preview methods, [`register_spark_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory##register-spark-dataframe-dataframe--target--name--description-none--tags-none--show-progress-true-) and [`register_dask_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#register-dask-dataframe-dataframe--target--name--description-none--tags-none--show-progress-true-). These methods are [experimental](/python/api/overview/azure/ml/#stable-vs-experimental) preview features, and may change at any time.
+>
+> These methods upload data to your underlying storage, and as a result incur storage costs.

 ## Register datasets
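
The tip added in this commit names `register_dask_dataframe()` but shows no call. Below is a minimal sketch. It assumes the argument order given in the linked reference anchor (dataframe, target, name, then `show_progress`) and mirrors the pandas call in the diff context. The helper `register_dask_df` and its `tabular_factory` parameter are hypothetical, introduced only so the call can be exercised without a live workspace. The underlying method is an experimental preview feature and may change.

```python
def register_dask_df(dask_df, datastore, name, tabular_factory=None):
    """Register a dask dataframe as a TabularDataset (experimental preview API).

    `tabular_factory` is a hypothetical injection point for this sketch;
    when omitted it resolves to azureml.core.Dataset.Tabular.
    """
    if tabular_factory is None:
        # Imported lazily so the helper can be defined without azureml-core installed.
        from azureml.core import Dataset
        tabular_factory = Dataset.Tabular
    # Same positional order as the pandas call: dataframe, target datastore, name.
    # Like the pandas variant, this uploads data and so incurs storage costs.
    return tabular_factory.register_dask_dataframe(
        dask_df, datastore, name, show_progress=True
    )


# Usage, assuming a workspace config file and an existing dask dataframe `dask_df`:
#   from azureml.core import Workspace, Datastore
#   ws = Workspace.from_config()
#   datastore = Datastore.get(ws, '<name of your datastore>')
#   dataset = register_dask_df(dask_df, datastore, "dataset_from_dask_df")
```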
