You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/machine-learning/service/concept-data.md
+18-23Lines changed: 18 additions & 23 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -9,62 +9,57 @@ ms.topic: conceptual
9
9
ms.reviewer: nibaccam
10
10
author: nibaccam
11
11
ms.author: nibaccam
12
-
ms.date: 11/27/2019
12
+
ms.date: 12/09/2019
13
13
14
14
---
15
15
16
16
# Data access in Azure Machine Learning
17
17
18
-
In this article, learn about Azure Machine Learning's data management and integration solutions for your machine learning tasks. This article describes a data access workflow that assumes you've already created an [Azure storage account](https://docs.microsoft.comazure/storage/common/storage-quickstart-create-account?tabs=azure-portal) and [Azure storage service](https://docs.microsoft.com/azure/storage/common/storage-introduction).
19
-
18
+
In this article, learn about Azure Machine Learning's data management and integration solutions for your machine learning tasks. This article assumes you've already created an [Azure storage account](https://docs.microsoft.comazure/storage/common/storage-quickstart-create-account?tabs=azure-portal) and [Azure storage service](https://docs.microsoft.com/azure/storage/common/storage-introduction).
20
19
21
20
When you're ready to use the data in your storage, we recommend you
22
21
23
22
1. Create an Azure Machine Learning datastore.
24
23
2. From that datastore, create an Azure Machine Learning dataset.
25
-
3. Use that dataset in your machine learning (ML) experiment by either
26
-
1. Mounting it to your ML experiment's compute target for model training
24
+
3. Use that dataset in your machine learning experiment by either
25
+
1. Mounting it to your experiment's compute target for model training
27
26
28
27
**OR**
29
28
30
-
1. Consuming it directly in Azure Machine Learning solutions like automated machine learning (automated ML) experiment runs, ML pipelines, and the designer.
29
+
1. Consuming it directly in Azure Machine Learning solutions like automated machine learning (automated ML) experiment runs, machine learning pipelines, and the designer.
31
30
4. Create dataset monitors for your model input and output datasets to detect for data drift.
32
31
5. If data drift is detected, retrain your model accordingly.
33
32
34
33
The following diagram provides a visual demonstration of this recommended data access workflow.
To access your data in your storage account, Azure Machine Learning offers datastores and datasets. Datastores provide a layer of abstraction over your storage service, this aids in security and ease of access to your storage, since connection information is kept in the datastore and not exposed in scripts. Datasets point to the specific file or files in your underlying storage that you want to use for your machine learning experiment. Together these offer a secure, scalable, and reproducible data delivery workflow for your machine learning tasks.
41
40
42
41
### Datastores
43
42
44
-
An Azure Machine Learning datastore is a storage abstraction over an Azure storage services account. Datastores allow you to easily connect to your Azure storage account, and access the data in your underlying Azure storage services. This ease of connection is facilitated by storing security information, like your subscription ID and token authorization, as part of the datastore object so you aren't hard coding that information in your scripts.
45
-
46
-
+[Register and create datastores](how-to-access-data.md)
43
+
An Azure Machine Learning datastore is a storage abstraction over an Azure storage services account. [Register and create a datastore](how-to-access-data.md) to easily connect to your Azure storage account, and access the data in your underlying Azure storage services.
47
44
48
45
### Datasets
49
46
50
-
Create an Azure Machine Learning dataset to interact with data in your datastores or to package your data into a consumable object for machine learning tasks.
47
+
[Create an Azure Machine Learning dataset](how-to-create-register-datasets.md) to interact with data in your datastores or to package your data into a consumable object for machine learning tasks. Register the dataset to your workspace to share and reuse it across different experiments without data ingestion complexities.
51
48
52
49
Datasets can be created from local files, public urls, [Azure Open Datasets](#open), or specific file(s) in your datastores. They aren't copies of your data, but are references that point to the data in your storage service, so no extra storage cost is incurred.
53
50
54
-
The following articles demonstrate additional datasets capabilities.
51
+
The following diagram shows that if you don't have an Azure storage service, you can create a dataset directly from local files, public urls, or an Azure Open Dataset. Doing so connects your dataset to the default datastore automatically created with your experiment's [Azure Machine Learning workspace](concept-workspace.md).
55
52
56
-
+[Create and register datasets](how-to-create-register-datasets.md) to your workspace to share and reuse it across different experiments without data ingestion complexities.
57
-
+[Version and track](how-to-version-track-datasets.md) dataset lineage.
58
-
+[Monitor your dataset](how-to-monitor-datasets.md) to help with data drift detection.
59
-
+ See the [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) and [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) class reference documentation for available data exploration methods.
Additional datasets capabilities can be found in the following articles.
62
56
63
-
There are two different types of datasets:
64
-
65
-
+[TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a Pandas or Spark DataFrame for further manipulation and cleansing. For a complete list of files you can create TabularDatasets from, see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
57
+
+[Version and track](how-to-version-track-datasets.md) dataset lineage.
58
+
+[Monitor your dataset](how-to-monitor-datasets.md) to help with data drift detection.
59
+
+ There are two different types of datasets:
60
+
+[TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. Which lets you materialize the data into a Pandas or Spark DataFrame for further manipulation and cleansing. For a complete list of files you can create TabularDatasets from, see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
66
61
67
-
+[FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. By this method, you can download or mount files of your choosing to your compute target as a FileDataset object.
62
+
+[FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. By this method, you can download or mount files of your choosing to your compute target as a FileDataset object.
68
63
69
64
## Work with your data
70
65
@@ -73,7 +68,7 @@ With datasets, you can accomplish a number of machine learning tasks through sea
+ the [designer](tutorial-designer-automobile-price-train-score.md#import-data)
78
73
+ Create a [data labeling project](#label).
79
74
+ Set up a dataset monitor for [data drift](#drift) detection.
@@ -90,7 +85,7 @@ Azure Open Datasets include public-domain data for weather, census, holidays, pu
90
85
91
86
## Data labeling
92
87
93
-
Labeling large amounts of data has often been a headache in machine learning projects. ML projects with a computer vision component, such as image classification or object detection, generally require thousands of images and corresponding labels.
88
+
Labeling large amounts of data has often been a headache in machine learning projects. Machine learning projects with a computer vision component, such as image classification or object detection, generally require thousands of images and corresponding labels.
94
89
95
90
Azure Machine Learning gives you a central location to create, manage, and monitor labeling projects. Labeling projects help coordinate the data, labels, and team members, allowing you to more efficiently manage the labeling tasks. Currently supported tasks are image classification, either multi-label or multi-class, and object identification using bounded boxes.
0 commit comments