articles/machine-learning/service/concept-data.md
Lines changed: 8 additions & 18 deletions
@@ -15,30 +15,20 @@ ms.date: 11/25/2019
 
 # Data in Azure Machine Learning
 
-In this article, learn what Azure Machine learning offers for data storage and how across your machine learning experiments.
+In this article, learn about Azure Machine Learning's data integration solutions from data access to data drift.
 
-Azure Machine Learning supports popular data file formats like, excel, parquet, etc. Keep your data in an Azure storage service, use a datastore to store the connection information and then create a dataset for training your machine learning models.
+The following diagram demonstrates the recommended data workflow for Azure Machine Learning. This article and workflow assume you've already created an [Azure storage account](https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal) and [storage service](https://docs.microsoft.com/azure/storage/common/storage-introduction).
 
-## Where to store data
 
-When you save your data in [Azure storage services](https://docs.microsoft.com/azure/storage/common/storage-introduction), you are storing your data in a scalable and secure cloud storage location.
-
-Azure Storage includes these data services:
-
-+[Azure Blobs](https://docs.microsoft.com/azure/storage/blobs/storage-blobs-introduction): A massively scalable object store for text and binary data.
-+[Azure Files](https://docs.microsoft.com/azure/storage/files/storage-files-introduction): Managed file shares for cloud or on-premises deployments.
-+[Azure Queues](): A messaging store for reliable messaging between application components.
-+[Azure Tables](https://docs.microsoft.com/azure/storage/tables/table-storage-overview): A NoSQL store for schemaless storage of structured data.
-
-Each service is accessed through a storage account. To get started, see [Create a storage account](https://docs.microsoft.comazure/storage/common/storage-quickstart-create-account?tabs=azure-portal).
To access your data in storage, Azure Machine Learning offers datastores and datasets. These solutions allow you to access and reference your data without compromising security and ease of reuse.
 
 ### Datastores
 
-An Azure datastore is a storage abstraction over an Azure Machine Learning storage account. Datastores allow you to easily access your data in Azure storage services by storing connection information, like your subscription ID and token authorization. This way you don't have to hard code that information in your scripts.
+An Azure Machine Learning datastore is a storage abstraction over an Azure storage account. Datastores allow you to easily access your data in Azure storage services by storing connection information, like your subscription ID and token authorization. This way you don't have to hard-code that information in your scripts.
 
 +[Register and create datastores](how-to-access-data.md)
 
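As a minimal sketch of the datastore workflow described above, the following Python assumes the v1 `azureml-core` SDK and an existing workspace object. The datastore name, container, storage account, and the `AZURE_STORAGE_ACCOUNT_KEY` environment variable are hypothetical placeholders. The account key is supplied once, at registration time; afterwards, scripts look the datastore up by name and never handle the credential.

```python
import os


def register_blob_datastore(ws, datastore_name, container_name, account_name):
    """Register an Azure Blob container as a datastore in workspace `ws`.

    Sketch only: assumes the v1 azureml-core SDK and a hypothetical
    AZURE_STORAGE_ACCOUNT_KEY environment variable holding the account key.
    """
    # Read the secret from the environment instead of hard-coding it.
    account_key = os.environ.get("AZURE_STORAGE_ACCOUNT_KEY")
    if not account_key:
        raise RuntimeError("Set AZURE_STORAGE_ACCOUNT_KEY before registering the datastore.")

    # Imported lazily so this module loads even where azureml-core is absent.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        workspace=ws,
        datastore_name=datastore_name,
        container_name=container_name,
        account_name=account_name,
        account_key=account_key,
    )


def get_datastore(ws, datastore_name):
    """Look up a registered datastore by name; no credentials are needed here."""
    from azureml.core import Datastore

    return Datastore.get(ws, datastore_name)
```

For example, after `ws = Workspace.from_config()`, a training script could call `get_datastore(ws, "my_blob_datastore")` (a hypothetical name) without ever seeing the account key.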
@@ -49,15 +39,15 @@ To interact with data in your datastores or to package your data into a consumab
 Create an unregistered dataset in memory for your local experiments, or register it to your workspace to share and reuse it across different machine learning experiments without worrying about data ingestion complexities.
 
 +[Create and register datasets](how-to-create-register-datasets.md)
-+[Version and track](how-to-track-version-datasets.md) dataset lineage.
++[Version and track](how-to-version-track-datasets.md) dataset lineage.
 
 #### Types of datasets
 
 You can create a dataset from paths in datastores, public web URLs, Azure Open Datasets, and local files. Datasets let you sample data, perform exploratory data analysis, and access data for machine learning experiments.
 
 There are two different types of datasets:
 
-+[TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a Pandas or Spark DataFrame for further manipulation and cleansing.For a complete list of files you can create TabularDatasets from see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
++[TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This lets you materialize the data into a pandas or Spark DataFrame for further manipulation and cleansing. For a complete list of file formats you can create TabularDatasets from, see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
 
 +[FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. With a FileDataset, you can download or mount the files you choose to your compute target.
 
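The two dataset types can be sketched with the v1 `azureml-core` SDK as follows. The sketch assumes a registered datastore (for example, one returned by `Datastore.get`); the glob patterns and dataset names are hypothetical. `Dataset.Tabular.from_delimited_files` parses delimited files into a TabularDataset, while `Dataset.File.from_files` only references the files without parsing them. Registering with `create_new_version=True` adds a new version when the name is already taken.

```python
def create_datasets(datastore, csv_glob="training-data/*.csv", files_glob="images/**"):
    """Build one TabularDataset and one FileDataset from paths in `datastore`.

    Sketch only: assumes the v1 azureml-core SDK; both glob patterns are
    hypothetical placeholders for your own folder layout.
    """
    # Imported lazily so this module loads even where azureml-core is absent.
    from azureml.core import Dataset

    # TabularDataset: the CSV files are parsed into rows and columns.
    tabular = Dataset.Tabular.from_delimited_files(path=[(datastore, csv_glob)])

    # FileDataset: the files are referenced as-is, to be mounted or downloaded later.
    files = Dataset.File.from_files(path=[(datastore, files_glob)])
    return tabular, files


def register_dataset(ws, dataset, name):
    """Register `dataset` in workspace `ws`, adding a new version if `name` exists."""
    return dataset.register(workspace=ws, name=name, create_new_version=True)
```

Calling `tabular.to_pandas_dataframe()` on the first result then loads the parsed rows into a pandas DataFrame for exploration; the FileDataset has no such tabular view.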
@@ -66,8 +56,8 @@ There are two different types of datasets
 With datasets, you can accomplish a number of machine learning tasks through seamless integration with Azure Machine Learning features.
 
 + Create a [data labeling project](#label).
-+[Train machine learning models with datasets](how-to-train-with-datasets.md).
-+Consume datasets in [automated ML experiments](how-to-create-portal-experiments.md), [ML pipelines](how-to-create-your-first-pipeline.md)and the [designer](tutorial-designer-automobile-price-train-score.md#import-data)
++[Mount or download your dataset for machine learning model training](how-to-train-with-datasets.md).
++ Consume datasets in your [automated ML experiments](how-to-create-portal-experiments.md), [ML pipelines](how-to-create-your-first-pipeline.md), or the [designer](tutorial-designer-automobile-price-train-score.md#import-data).
 + Set up a dataset monitor for [data drift](#drift) detection.
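A minimal sketch of mounting a registered FileDataset for a training run, assuming the v1 `azureml-core` SDK with `ScriptRunConfig`. The dataset, compute target, and environment names, the `./src` folder, the experiment name, and the `--data-folder` argument are all hypothetical. `as_named_input(...).as_mount()` asks Azure Machine Learning to mount the files on the compute target and pass the mount path to the script; replacing it with `.as_download()` copies the files instead. The `list_training_files` helper stands in for code inside a hypothetical `train.py`, which reads the mounted folder like any local path.

```python
import os


def submit_mounted_training_run(ws, dataset_name, compute_name, environment_name):
    """Submit train.py with a registered FileDataset mounted on the compute target.

    Sketch only: assumes the v1 azureml-core SDK; every name here is a placeholder.
    """
    # Imported lazily so this module loads even where azureml-core is absent.
    from azureml.core import Dataset, Environment, Experiment, ScriptRunConfig

    dataset = Dataset.get_by_name(ws, name=dataset_name)
    config = ScriptRunConfig(
        source_directory="./src",  # hypothetical folder that contains train.py
        script="train.py",
        # At run time this argument is resolved to the local mount path.
        arguments=["--data-folder", dataset.as_named_input("training_files").as_mount()],
        compute_target=compute_name,
        environment=Environment.get(ws, name=environment_name),
    )
    return Experiment(ws, "dataset-training-sketch").submit(config)


def list_training_files(data_folder):
    """Inside train.py: list files under the mounted (or downloaded) folder."""
    found = []
    for root, _dirs, names in os.walk(data_folder):
        for name in names:
            # Paths are reported relative to the dataset root.
            found.append(os.path.relpath(os.path.join(root, name), data_folder))
    return sorted(found)
```

`Experiment.submit` returns a run object, so a caller could follow up with `run.wait_for_completion(show_output=True)` to stream the training logs.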