articles/machine-learning/service/concept-data.md
Lines changed: 46 additions & 20 deletions
Original file line number
Diff line number
Diff line change
@@ -9,59 +9,84 @@ ms.topic: conceptual
9
9
ms.reviewer: nibaccam
10
10
author: nibaccam
11
11
ms.author: nibaccam
12
-
ms.date: 11/20/2019
12
+
ms.date: 11/25/2019
13
13
14
14
---
15
15
16
16
# Data in Azure Machine Learning
17
17
18
-
In this article, learn how your data is accessed and utilized across your machine learning experiments in Azure Machine Learning.
18
+
In this article, learn what Azure Machine Learning offers for data storage and how your data is accessed and used across your machine learning experiments.
19
19
20
+
Azure Machine Learning supports popular data file formats, such as Excel and Parquet. Keep your data in an Azure storage service, use a datastore to store the connection information, and then create a dataset for training your machine learning models.
20
21
22
+
## Where to store data
21
23
22
-
## Access data in Azure data storage services
24
+
When you save your data in [Azure storage services](https://docs.microsoft.com/azure/storage/common/storage-introduction), you are storing your data in a scalable and secure cloud storage location.
23
25
24
-
When you save your data in [Azure storage services](https://docs.microsoft.com/azure/storage/common/storage-introduction), you are storing your data in a scalable and secure cloud storage location. To access your data in storage, Azure Machine Learning offers solutions like datastores and datasets that allow you to reference your data without compromising security and ease of reuse.
26
+
Azure Storage includes these data services:
27
+
28
+
+ [Azure Blobs](https://docs.microsoft.com/azure/storage/blobs/storage-blobs-introduction): A massively scalable object store for text and binary data.
29
+
+ [Azure Files](https://docs.microsoft.com/azure/storage/files/storage-files-introduction): Managed file shares for cloud or on-premises deployments.
30
+
+ [Azure Queues](): A messaging store for reliable messaging between application components.
31
+
+ [Azure Tables](https://docs.microsoft.com/azure/storage/tables/table-storage-overview): A NoSQL store for schemaless storage of structured data.
32
+
33
+
Each service is accessed through a storage account. To get started, see [Create a storage account](https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal).
34
+
35
+
## Access data in storage
36
+
37
+
To access your data in storage, Azure Machine Learning offers datastores and datasets. These solutions let you access and reference your data without compromising security or ease of reuse.
25
38
26
39
### Datastores
27
40
28
-
An Azure datastore is a storage abstraction over an Azure Machine Learning storage account. Datastores allow you to easily access your data in Azure storage services by storing connection information, like your subscription ID and token authorization, without you having to hard code that information in your scripts.
41
+
An Azure datastore is a storage abstraction over an Azure Machine Learning storage account. Datastores let you easily access your data in Azure storage services by storing connection information, like your subscription ID and token authorization. This way, you don't have to hard-code that information in your scripts.
29
42
30
43
+ [Register and create datastores](how-to-access-data.md)
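
The idea of keeping connection information out of your scripts can be sketched as follows. This is a minimal illustration, assuming the `azureml-core` Python SDK, a workspace `config.json` on disk, and a hypothetical `STORAGE_ACCOUNT_KEY` environment variable; the function names and the datastore, container, and account names are placeholders, not part of the article.

```python
import os


def blob_datastore_settings(datastore_name, container_name, account_name):
    """Collect the keyword arguments for registering a blob container.

    The account key is read from an environment variable (the variable name
    is an assumption for this sketch) so it never appears in the script.
    """
    return {
        "datastore_name": datastore_name,
        "container_name": container_name,
        "account_name": account_name,
        "account_key": os.environ.get("STORAGE_ACCOUNT_KEY"),
    }


def register_blob_datastore(settings):
    """Register the container once; the workspace keeps the connection info."""
    # Imported lazily so the helper above works without the SDK installed.
    from azureml.core import Datastore, Workspace

    ws = Workspace.from_config()  # assumes a config.json for your workspace
    return Datastore.register_azure_blob_container(workspace=ws, **settings)


def get_registered_datastore(datastore_name):
    """Later scripts look the datastore up by name; no credentials needed."""
    from azureml.core import Datastore, Workspace

    ws = Workspace.from_config()
    return Datastore.get(ws, datastore_name)
```

After a one-time call such as `register_blob_datastore(blob_datastore_settings("my_blob_store", "training-data", "mystorageaccount"))`, training scripts would only need `get_registered_datastore("my_blob_store")`.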
31
44
32
45
### Datasets
33
46
34
-
Azure Machine Learning datasets aren't copies of your data. When you create a dataset, you create a reference point to the data in your storage service, so no extra storage cost is incurred.
47
+
To interact with data in your datastores or to package your data into a consumable object for machine learning tasks, create an Azure Machine Learning dataset. You can create datasets from local files, public URLs, [Azure Open Datasets](#open), or datastores. Datasets aren't copies of your data; they are references that point to the data in your storage service, so no extra storage cost is incurred.
35
48
36
49
Create an unregistered dataset in memory for your local experiments, or register it to your workspace to share and reuse it across different machine learning experiments without worrying about data ingestion complexities.
37
50
38
51
+ [Create and register datasets](how-to-create-register-datasets.md)
39
-
40
-
#### What can we do with datasets?
41
-
42
-
With datasets, you can accomplish a number of machine learning tasks through seamless integration with Azure Machine Learning features.
43
-
44
-
+ Consume datasets in [automated ML experiments](how-to-create-portal-experiments.md), [ML pipelines](how-to-create-your-first-pipeline.md) and the [designer](tutorial-designer-automobile-price-train-score.md#import-data)
45
-
+ Use datasets for a [data labeling project](how-to-create-labeling-projects.md)
46
-
+[Train machine learning models with datasets](how-to-train-with-datasets.md).
47
52
+ [Version and track](how-to-track-version-datasets.md) dataset lineage.
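
Registering and versioning a dataset might look like the sketch below. It assumes the `azureml-core` Python SDK and a dataset object created earlier; the function names and the dataset name passed in are hypothetical.

```python
def register_new_version(dataset, workspace, name):
    """Register `dataset` under `name`, adding a new version if the name exists."""
    return dataset.register(
        workspace=workspace,
        name=name,
        create_new_version=True,
    )


def load_registered(workspace, name, version="latest"):
    """Fetch a registered dataset by name from any script or notebook."""
    # Imported lazily so this sketch can be read and loaded without the SDK.
    from azureml.core import Dataset

    return Dataset.get_by_name(workspace, name=name, version=version)
```

Because a registered dataset is only a reference, sharing it this way does not copy the underlying files.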
48
-
+[Set up a dataset monitor](#drift) for data drift detection.
49
53
50
54
#### Types of datasets
51
55
52
56
You can create a dataset from paths in datastores, public web URLs, Azure Open Datasets, and local files. Datasets let you sample data, perform exploratory data analysis, and access data for machine learning experiments.
53
57
54
58
There are two types of datasets:
55
59
56
-
*[TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a Pandas or Spark DataFrame for further manipulation and cleansing.For a complete list of files you can create TabularDatasets from see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
60
+
+ [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This lets you materialize the data into a pandas or Spark DataFrame for further manipulation and cleansing. For a complete list of file formats you can create TabularDatasets from, see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
61
+
62
+
+ [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. This lets you download or mount the files of your choosing to your compute as a FileDataset object.
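
The two dataset types could be created and explored roughly as follows. This is a sketch under assumptions: it uses the `azureml-core` Python SDK, takes an already registered datastore object, and the folder patterns (`weather/2019/*.csv`, `images/**/*.jpg`) and function names are made up for illustration.

```python
def create_datasets(datastore):
    """Create one dataset of each type from paths in a datastore."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from azureml.core import Dataset

    # TabularDataset: the delimited files are parsed into rows and columns.
    tabular = Dataset.Tabular.from_delimited_files(
        path=[(datastore, "weather/2019/*.csv")]
    )
    # FileDataset: the files are only referenced, not parsed.
    files = Dataset.File.from_files(path=[(datastore, "images/**/*.jpg")])
    return tabular, files


def explore(tabular, files, target_dir):
    """Preview a few tabular rows and download the referenced files."""
    preview = tabular.take(5).to_pandas_dataframe()
    local_paths = files.download(target_path=target_dir, overwrite=True)
    return preview, local_paths
```

Neither call in `create_datasets` copies data; the files are read only when you materialize, download, or mount the dataset.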
57
63
58
-
*[FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. By this method, you can download or mount files of your choosing to your compute as a FileDataset object.
64
+
## Work with your data
65
+
66
+
With datasets, you can accomplish a number of machine learning tasks through seamless integration with Azure Machine Learning features.
67
+
68
+
+ Create a [data labeling project](#label).
69
+
+ [Train machine learning models with datasets](how-to-train-with-datasets.md).
70
+
+ Consume datasets in [automated ML experiments](how-to-create-portal-experiments.md), [ML pipelines](how-to-create-your-first-pipeline.md), and the [designer](tutorial-designer-automobile-price-train-score.md#import-data).
71
+
+ Set up a dataset monitor for [data drift](#drift) detection.
72
+
73
+
<a name="open"></a>
59
74
60
75
## Azure Open Datasets
61
76
62
77
[Azure Open Datasets](https://docs.microsoft.com/azure/open-datasets/overview-what-are-open-datasets) are curated public datasets that you can use to add scenario-specific features to machine learning solutions for more accurate models. Open Datasets are in the cloud on Microsoft Azure and are integrated into Azure Machine Learning. You can also access the datasets through APIs and use them in other products, such as Power BI and Azure Data Factory.
63
78
64
-
Datasets include public-domain data for weather, census, holidays, public safety, and location that help you train machine learning models and enrich predictive solutions. You can also share your public datasets on Azure Open Datasets.
79
+
Azure Open Datasets include public-domain data for weather, census, holidays, public safety, and location that help you train machine learning models and enrich predictive solutions. You can also share your public datasets on Azure Open Datasets.
80
+
81
+
<a name="label"></a>
82
+
83
+
## Data labeling
84
+
85
+
Labeling large amounts of data has often been a headache in machine learning projects. ML projects with a computer vision component, such as image classification or object detection, generally require thousands of images and corresponding labels.
86
+
87
+
Azure Machine Learning gives you a central location to create, manage, and monitor labeling projects. Labeling projects help coordinate the data, labels, and team members, so you can manage labeling tasks more efficiently. Currently supported tasks are image classification, either multi-label or multi-class, and object identification using bounding boxes.
88
+
89
+
+ Use datasets for a [data labeling project](how-to-create-labeling-projects.md).
65
90
66
91
<a name="drift"></a>
67
92
@@ -73,5 +98,6 @@ In the context of machine learning, data drift is the change in model input data
73
98
74
99
## Next steps
75
100
76
-
* For dataset training examples, see [sample notebooks](https://aka.ms/dataset-tutorial).
77
-
* For data drift examples, see this [data drift tutorial](https://aka.ms/datadrift-notebook).
101
+
+ To create a dataset in Azure Machine Learning studio or with the Python SDK, [use these steps](how-to-create-register-datasets.md).
102
+
+ Try out dataset training examples with our [sample notebooks](https://aka.ms/dataset-tutorial).
103
+
+ For data drift examples, see this [data drift tutorial](https://aka.ms/datadrift-notebook).