Commit fac85d0

Merge pull request #108633 from MayMSFT/patch-35
Update concept-data.md
2 parents dce77af + 830f30d commit fac85d0

1 file changed: +9 -13 lines changed

articles/machine-learning/concept-data.md

Lines changed: 9 additions & 13 deletions
@@ -65,33 +65,29 @@ Supported cloud-based storage services in Azure that can be registered as datast
 
 Azure Machine Learning datasets are references that point to the data in your storage service. They aren't copies of your data, so no extra storage cost is incurred. To interact with your data in storage, [create a dataset](how-to-create-register-datasets.md) to package your data into a consumable object for machine learning tasks. Register the dataset to your workspace to share and reuse it across different experiments without data ingestion complexities.
 
-Datasets can be created from local files, public urls, [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/), or specific file(s) in your datastores. To create a dataset from an in memory pandas dataframe, write the data to a local file, like a csv, and create your dataset from that file.
+Datasets can be created from local files, public urls, [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/), or Azure storage services via datastores. To create a dataset from an in memory pandas dataframe, write the data to a local file, like a parquet, and create your dataset from that file.
 
-The following diagram shows that if you don't have an Azure storage service, you can create a dataset directly from local files, public urls, or an Azure Open Dataset. Doing so connects your dataset to the default datastore that was automatically created with your experiment's [Azure Machine Learning workspace](concept-workspace.md).
+We support 2 types of datasets:
++ A [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. You can load a TabularDataset into a Pandas or Spark DataFrame for further manipulation and cleansing. For a complete list of data formats you can create TabularDatasets from, see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
 
-![Data-concept-diagram](./media/concept-data/dataset-workflow.svg)
++ A [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. You can [download or mount files](how-to-train-with-datasets.md#option-2--mount-files-to-a-remote-compute-target) referenced by filedatasets to your compute target.
 
 Additional datasets capabilities can be found in the following documentation:
 
 + [Version and track](how-to-version-track-datasets.md) dataset lineage.
-+ [Monitor your dataset](how-to-monitor-datasets.md) to help with data drift detection.
-+ See the following for documentation on the two types of datasets:
-+ A [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. Which lets you materialize the data into a Pandas or Spark DataFrame for further manipulation and cleansing. For a complete list of files you can create TabularDatasets from, see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
-
-+ A [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. By this method, you can [download or mount files](how-to-train-with-datasets.md#option-2--mount-files-to-a-remote-compute-target) of your choosing to your compute target as a FileDataset object.
++ [Monitor your dataset](how-to-monitor-datasets.md) to help with data drift detection.
 
 ## Work with your data
 
 With datasets, you can accomplish a number of machine learning tasks through seamless integration with Azure Machine Learning features.
 
 + Create a [data labeling project](#label).
-+ Create a dataset from an [Azure Open Dataset](how-to-create-register-datasets.md#create-datasets-with-azure-open-datasets).
-+ [Train machine learning models](how-to-train-with-datasets.md).
-+ Consume datasets in
++ Train machine learning models:
 + [automated ML experiments](how-to-use-automated-ml-for-ml-models.md)
-+ the [designer](tutorial-designer-automobile-price-train-score.md#import-data)
++ the [designer](tutorial-designer-automobile-price-train-score.md#import-data)
++ [notebooks](how-to-train-with-datasets.md)
 + [Azure Machine Learning pipelines](how-to-create-your-first-pipeline.md)
-+ Access datasets for scoring with batch inference in [machine learning pipelines](how-to-create-your-first-pipeline.md).
++ Access datasets for scoring with [batch inference](how-to-use-parallel-run-step.md) in [machine learning pipelines](how-to-create-your-first-pipeline.md).
 + Set up a dataset monitor for [data drift](#drift) detection.
 
 <a name="label"></a>
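
The changed paragraph in this diff says that a dataset can be created from an in-memory pandas DataFrame by first writing the data to a local file. The following is a minimal sketch of that first step only, assuming pandas is installed. The helper name `dataframe_to_local_file`, the sample columns, and the file name `data.csv` are hypothetical and do not come from the article. CSV is used because pandas can write it without an extra engine; the new wording mentions parquet, which pandas writes with `DataFrame.to_parquet` when a parquet engine such as pyarrow is available. Creating and registering the dataset from the resulting file needs a workspace and is covered by the linked how-to, so it is not shown.

```python
import tempfile
from pathlib import Path

import pandas as pd


def dataframe_to_local_file(df: pd.DataFrame, directory: str, name: str = "data.csv") -> Path:
    """Write an in-memory DataFrame to a local CSV file and return its path.

    Hypothetical helper for illustration; not part of the Azure ML SDK.
    """
    path = Path(directory) / name
    # index=False keeps the DataFrame's row index out of the file,
    # so the file contains only the data columns.
    df.to_csv(path, index=False)
    return path


# Sample data (made up for this sketch).
df = pd.DataFrame({"id": [1, 2, 3], "label": ["a", "b", "c"]})

# Write to a temporary directory; in practice this would be any local folder.
out_dir = tempfile.mkdtemp()
local_path = dataframe_to_local_file(df, out_dir)

# Read the file back to confirm the data survived the round trip.
round_trip = pd.read_csv(local_path)
```

The path returned here is the local file that the article's paragraph refers to when it says to create your dataset from that file.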
