Commit b0eb80f

diagram and intro work
1 parent 045ef50 commit b0eb80f

2 files changed: 8 additions and 18 deletions
articles/machine-learning/service/concept-data.md

Lines changed: 8 additions & 18 deletions
@@ -15,30 +15,20 @@ ms.date: 11/25/2019

 # Data in Azure Machine Learning

-In this article, learn what Azure Machine learning offers for data storage and how across your machine learning experiments.
+In this article, learn about Azure Machine Learning's data integration solutions from data access to data drift.

-Azure Machine Learning supports popular data file formats like, excel, parquet, etc. Keep your data in an Azure storage service, use a datastore to store the connection information and then create a dataset for training your machine learning models.
+The following diagram demonstrates the recommended data workflow for Azure Machine Learning. This article and workflow assumes you've already created an [ Azure storage account](https://docs.microsoft.comazure/storage/common/storage-quickstart-create-account?tabs=azure-portal) and [service](https://docs.microsoft.com/azure/storage/common/storage-introduction).

-## Where to store data

-When you save your data in [Azure storage services](https://docs.microsoft.com/azure/storage/common/storage-introduction), you are storing your data in a scalable and secure cloud storage location.
-
-Azure Storage includes these data services:
-
-+ [Azure Blobs](https://docs.microsoft.com/azure/storage/blobs/storage-blobs-introduction): A massively scalable object store for text and binary data.
-+ [Azure Files](https://docs.microsoft.com/azure/storage/files/storage-files-introduction): Managed file shares for cloud or on-premises deployments.
-+ [Azure Queues](): A messaging store for reliable messaging between application components.
-+ [Azure Tables](https://docs.microsoft.com/azure/storage/tables/table-storage-overview): A NoSQL store for schemaless storage of structured data.
-
-Each service is accessed through a storage account. To get started, see [Create a storage account](https://docs.microsoft.comazure/storage/common/storage-quickstart-create-account?tabs=azure-portal).
+![Data-concept-diagram](media/concept-data/data-concept-diagram.png)

 ## Access data in storage

 To access your data in storage, Azure Machine Learning offers datastores and datasets. These solutions allow you to access and reference your data without compromising security and ease of reuse.

 ### Datastores

-An Azure datastore is a storage abstraction over an Azure Machine Learning storage account. Datastores allow you to easily access your data in Azure storage services by storing connection information, like your subscription ID and token authorization. This way you don't have to hard code that information in your scripts.
+An Azure Machine Learning datastore is a storage abstraction over an Azure storage services account. Datastores allow you to easily access your data in Azure storage services by storing connection information, like your subscription ID and token authorization. This way you don't have to hard code that information in your scripts.

 + [Register and create datastores](how-to-access-data.md)

@@ -49,15 +39,15 @@ To interact with data in your datastores or to package your data into a consumab
 Create an unregistered dataset in memory for your local experiments, or register it to your workspace to share and reuse it across different machine learning experiments without worrying about data ingestion complexities.

 + [Create and register datasets](how-to-create-register-datasets.md)
-+ [Version and track](how-to-track-version-datasets.md) dataset lineage.
++ [Version and track](how-to-version-track-datasets.md) dataset lineage.

 #### Types of datasets

 You can create a dataset from paths in datastores, pubic web urls, Azure Open Datasets and local files. Datasets provide you with the capability to do sampling, exploratory data analysis, and access data for machine learning experiments.

 There are two different types of datasets

-+ [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a Pandas or Spark DataFrame for further manipulation and cleansing.For a complete list of files you can create TabularDatasets from see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
++ [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a Pandas or Spark DataFrame for further manipulation and cleansing. For a complete list of files you can create TabularDatasets from see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).

 + [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. By this method, you can download or mount files of your choosing to your compute as a FileDataset object.

@@ -66,8 +56,8 @@ There are two different types of datasets
 With datasets, you can accomplish a number of machine learning tasks through seamless integration with Azure Machine Learning features.

 + Create a [data labeling project](#label)
-+ [Train machine learning models with datasets](how-to-train-with-datasets.md).
-+ Consume datasets in [automated ML experiments](how-to-create-portal-experiments.md), [ML pipelines](how-to-create-your-first-pipeline.md) and the [designer](tutorial-designer-automobile-price-train-score.md#import-data)
++ [Mount or download your dataset for machine learning model training](how-to-train-with-datasets.md).
++ Consume datasets in your [automated ML experiments](how-to-create-portal-experiments.md), [ML pipelines](how-to-create-your-first-pipeline.md) or the [designer](tutorial-designer-automobile-price-train-score.md#import-data)
 + Set up a dataset monitor for [data drift](#drift) detection.

 <a name="open"></a>
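
Illustrative note on the Datastores paragraph changed in this commit: the idea that a datastore keeps connection information in one place, so scripts refer to it by name instead of hard-coding it, can be sketched in plain Python. This is a conceptual sketch only and does not use the Azure Machine Learning SDK. The names `DatastoreInfo`, `DatastoreRegistry`, `resolve_path`, the account `examplestorage`, and the container `datasets` are hypothetical and are not part of the article or of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatastoreInfo:
    """Connection details kept in one place (hypothetical stand-in for a datastore)."""
    account_name: str
    container: str
    credential: str  # placeholder; a real credential belongs in a secret store


class DatastoreRegistry:
    """Maps a short datastore name to its stored connection details."""

    def __init__(self):
        self._stores = {}

    def register(self, name, info):
        self._stores[name] = info

    def get(self, name):
        return self._stores[name]


def resolve_path(registry, datastore_name, relative_path):
    """Build a blob URL from a datastore name and a path inside its container."""
    info = registry.get(datastore_name)
    return (
        f"https://{info.account_name}.blob.core.windows.net/"
        f"{info.container}/{relative_path}"
    )


# Registration happens once; the account details appear only here.
registry = DatastoreRegistry()
registry.register(
    "training_data",
    DatastoreInfo("examplestorage", "datasets", "<secret>"),
)

# A training script then needs only the datastore name and a relative path.
url = resolve_path(registry, "training_data", "weather/2019.csv")
print(url)
```

In this sketch a dataset would correspond to the pair of datastore name and relative path: it references the data without copying it and without exposing the connection details to the script that consumes it.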