Skip to content

Commit 14a5f38

Browse files
committed
diagram updates
1 parent 7d14d2b commit 14a5f38

File tree

4 files changed

+682
-23
lines changed

4 files changed

+682
-23
lines changed

articles/machine-learning/service/concept-data.md

Lines changed: 18 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -9,62 +9,57 @@ ms.topic: conceptual
99
ms.reviewer: nibaccam
1010
author: nibaccam
1111
ms.author: nibaccam
12-
ms.date: 11/27/2019
12+
ms.date: 12/09/2019
1313

1414
---
1515

1616
# Data access in Azure Machine Learning
1717

18-
In this article, learn about Azure Machine Learning's data management and integration solutions for your machine learning tasks. This article describes a data access workflow that assumes you've already created an [Azure storage account](https://docs.microsoft.comazure/storage/common/storage-quickstart-create-account?tabs=azure-portal) and [Azure storage service](https://docs.microsoft.com/azure/storage/common/storage-introduction).
19-
18+
In this article, learn about Azure Machine Learning's data management and integration solutions for your machine learning tasks. This article assumes you've already created an [Azure storage account](https://docs.microsoft.comazure/storage/common/storage-quickstart-create-account?tabs=azure-portal) and [Azure storage service](https://docs.microsoft.com/azure/storage/common/storage-introduction).
2019

2120
When you're ready to use the data in your storage, we recommend you
2221

2322
1. Create an Azure Machine Learning datastore.
2423
2. From that datastore, create an Azure Machine Learning dataset.
25-
3. Use that dataset in your machine learning (ML) experiment by either
26-
1. Mounting it to your ML experiment's compute target for model training
24+
3. Use that dataset in your machine learning experiment by either
25+
1. Mounting it to your experiment's compute target for model training
2726

2827
**OR**
2928

30-
1. Consuming it directly in Azure Machine Learning solutions like automated machine learning (automated ML) experiment runs, ML pipelines, and the designer.
29+
1. Consuming it directly in Azure Machine Learning solutions like automated machine learning (automated ML) experiment runs, machine learning pipelines, and the designer.
3130
4. Create dataset monitors for your model input and output datasets to detect for data drift.
3231
5. If data drift is detected, retrain your model accordingly.
3332

3433
The following diagram provides a visual demonstration of this recommended data access workflow.
3534

36-
![Data-concept-diagram](media/concept-data/data-concept-diagram.png)
35+
![Data-concept-diagram](media/concept-data/data-concept-diagram.svg)
3736

3837
## Access data in storage
3938

4039
To access your data in your storage account, Azure Machine Learning offers datastores and datasets. Datastores provide a layer of abstraction over your storage service, this aids in security and ease of access to your storage, since connection information is kept in the datastore and not exposed in scripts. Datasets point to the specific file or files in your underlying storage that you want to use for your machine learning experiment. Together these offer a secure, scalable, and reproducible data delivery workflow for your machine learning tasks.
4140

4241
### Datastores
4342

44-
An Azure Machine Learning datastore is a storage abstraction over an Azure storage services account. Datastores allow you to easily connect to your Azure storage account, and access the data in your underlying Azure storage services. This ease of connection is facilitated by storing security information, like your subscription ID and token authorization, as part of the datastore object so you aren't hard coding that information in your scripts.
45-
46-
+ [Register and create datastores](how-to-access-data.md)
43+
An Azure Machine Learning datastore is a storage abstraction over an Azure storage services account. [Register and create a datastore](how-to-access-data.md) to easily connect to your Azure storage account, and access the data in your underlying Azure storage services.
4744

4845
### Datasets
4946

50-
Create an Azure Machine Learning dataset to interact with data in your datastores or to package your data into a consumable object for machine learning tasks.
47+
[Create an Azure Machine Learning dataset](how-to-create-register-datasets.md) to interact with data in your datastores or to package your data into a consumable object for machine learning tasks. Register the dataset to your workspace to share and reuse it across different experiments without data ingestion complexities.
5148

5249
Datasets can be created from local files, public urls, [Azure Open Datasets](#open), or specific file(s) in your datastores. They aren't copies of your data, but are references that point to the data in your storage service, so no extra storage cost is incurred.
5350

54-
The following articles demonstrate additional datasets capabilities.
51+
The following diagram shows that if you don't have an Azure storage service, you can create a dataset directly from local files, public urls, or an Azure Open Dataset. Doing so connects your dataset to the default datastore automatically created with your experiment's [Azure Machine Learning workspace](concept-workspace.md).
5552

56-
+ [Create and register datasets](how-to-create-register-datasets.md) to your workspace to share and reuse it across different experiments without data ingestion complexities.
57-
+ [Version and track](how-to-version-track-datasets.md) dataset lineage.
58-
+ [Monitor your dataset](how-to-monitor-datasets.md) to help with data drift detection.
59-
+ See the [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) and [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) class reference documentation for available data exploration methods.
53+
![Data-concept-diagram](media/concept-data/dataset-workflow.svg)
6054

61-
#### Types of datasets
55+
Additional datasets capabilities can be found in the following articles.
6256

63-
There are two different types of datasets:
64-
65-
+ [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a Pandas or Spark DataFrame for further manipulation and cleansing. For a complete list of files you can create TabularDatasets from, see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
57+
+ [Version and track](how-to-version-track-datasets.md) dataset lineage.
58+
+ [Monitor your dataset](how-to-monitor-datasets.md) to help with data drift detection.
59+
+ There are two different types of datasets:
60+
+ [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. Which lets you materialize the data into a Pandas or Spark DataFrame for further manipulation and cleansing. For a complete list of files you can create TabularDatasets from, see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
6661

67-
+ [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. By this method, you can download or mount files of your choosing to your compute target as a FileDataset object.
62+
+ [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. By this method, you can download or mount files of your choosing to your compute target as a FileDataset object.
6863

6964
## Work with your data
7065

@@ -73,7 +68,7 @@ With datasets, you can accomplish a number of machine learning tasks through sea
7368
+ [Train machine learning models](how-to-train-with-datasets.md).
7469
+ Consume datasets in
7570
+ [automated ML experiments](how-to-create-portal-experiments.md)
76-
+ [ML pipelines](how-to-create-your-first-pipeline.md)
71+
+ [machine learning pipelines](how-to-create-your-first-pipeline.md)
7772
+ the [designer](tutorial-designer-automobile-price-train-score.md#import-data)
7873
+ Create a [data labeling project](#label).
7974
+ Set up a dataset monitor for [data drift](#drift) detection.
@@ -90,7 +85,7 @@ Azure Open Datasets include public-domain data for weather, census, holidays, pu
9085

9186
## Data labeling
9287

93-
Labeling large amounts of data has often been a headache in machine learning projects. ML projects with a computer vision component, such as image classification or object detection, generally require thousands of images and corresponding labels.
88+
Labeling large amounts of data has often been a headache in machine learning projects. Machine learning projects with a computer vision component, such as image classification or object detection, generally require thousands of images and corresponding labels.
9489

9590
Azure Machine Learning gives you a central location to create, manage, and monitor labeling projects. Labeling projects help coordinate the data, labels, and team members, allowing you to more efficiently manage the labeling tasks. Currently supported tasks are image classification, either multi-label or multi-class, and object identification using bounded boxes.
9691

0 Bytes
Loading

0 commit comments

Comments
 (0)