Commit b6b4cc4

Merge pull request #127730 from nibaccam/concept-dsets
Data4ML | Edits to concept data
2 parents b22c4f4 + 432bb9f

File tree

2 files changed: +15 additions, -9 deletions

articles/machine-learning/concept-data.md

Lines changed: 14 additions & 8 deletions
@@ -1,15 +1,15 @@
 ---
 title: Secure data access in the cloud
 titleSuffix: Azure Machine Learning
-description: Learn how to securely connect to your data from Azure Machine Learning, and how to use datasets and datastores for ML tasks. Datastores can store data from an Azure Blob, Azure Data Lake Gen 1 & 2, SQL db, Databricks,...
+description: Learn how to securely connect to your data from Azure Machine Learning, and how to use datasets and datastores for ML tasks. Datastores can store data from an Azure Blob, Azure Data Lake Gen 1 & 2, SQL db, and Azure Databricks.
 services: machine-learning
 ms.service: machine-learning
 ms.subservice: core
 ms.topic: conceptual
 ms.reviewer: nibaccam
 author: nibaccam
 ms.author: nibaccam
-ms.date: 04/24/2020
+ms.date: 08/31/2020
 ms.custom: devx-track-python

 # Customer intent: As an experienced Python developer, I need to securely access my data in my Azure storage solutions and use it to accomplish my machine learning tasks.
@@ -19,11 +19,11 @@ ms.custom: devx-track-python

 Azure Machine Learning makes it easy to connect to your data in the cloud. It provides an abstraction layer over the underlying storage service, so you can securely access and work with your data without having to write code specific to your storage type. Azure Machine Learning also provides the following data capabilities:

+* Interoperability with Pandas and Spark DataFrames
 * Versioning and tracking of data lineage
 * Data labeling
 * Data drift monitoring
-* Interoperability with Pandas and Spark DataFrames
-
+
 ## Data workflow

 When you're ready to use the data in your cloud-based storage solution, we recommend the following data delivery workflow. This workflow assumes you have an [Azure storage account](https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal) and data in a cloud-based storage service in Azure.
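The datastore-to-dataset workflow recommended above can be illustrated with a small, self-contained sketch. Everything below is hypothetical toy code, not the azureml-core SDK: `ToyDatastore` and `ToyDataset` are invented names that model the idea that a datastore holds only connection details for a storage location, while a dataset is a lazy reference to files in it, with no data copied at creation time.

```python
import json
import os
import tempfile

# Hypothetical toy classes (NOT the azureml-core SDK) sketching the workflow:
# register a datastore, then create a dataset that merely references its files.

class ToyDatastore:
    """Holds the location of a storage service; stores no data itself."""
    def __init__(self, name, root_path):
        self.name = name
        self.root_path = root_path

class ToyDataset:
    """A lazy reference to a file in a datastore; reads only on demand."""
    def __init__(self, datastore, relative_path):
        self.datastore = datastore
        self.relative_path = relative_path

    def to_records(self):
        # Data is read only when explicitly requested (lazy evaluation), so
        # creating the dataset copies nothing and incurs no extra storage cost.
        full_path = os.path.join(self.datastore.root_path, self.relative_path)
        with open(full_path) as f:
            return [json.loads(line) for line in f]

# Simulate "data already sitting in cloud storage" with a local temp directory.
root = tempfile.mkdtemp()
with open(os.path.join(root, "train.jsonl"), "w") as f:
    f.write('{"x": 1, "y": 0}\n{"x": 2, "y": 1}\n')

store = ToyDatastore(name="workspaceblobstore", root_path=root)
dataset = ToyDataset(store, "train.jsonl")   # no data copied here
records = dataset.to_records()               # data materialized only now
print(records)
```

In the real SDK the corresponding steps would be registering a datastore against the workspace and creating a dataset from a datastore path; the sketch only captures the reference-not-copy design choice the article describes.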
@@ -64,13 +64,19 @@ Supported cloud-based storage services in Azure that can be registered as datastores

 ## Datasets

-Azure Machine Learning datasets are references that point to the data in your storage service. They aren't copies of your data, so no extra storage cost is incurred and the integrity of your original data sources aren't at risk.
+Azure Machine Learning datasets are references that point to the data in your storage service. They aren't copies of your data. By creating an Azure Machine Learning dataset, you create a reference to the data source location, along with a copy of its metadata.
+
+Because datasets are lazily evaluated, and the data remains in its existing location, you:
+
+* Incur no extra storage cost.
+* Don't risk unintentionally changing your original data sources.
+* Improve ML workflow performance speeds.

-To interact with your data in storage, [create a dataset](how-to-create-register-datasets.md) to package your data into a consumable object for machine learning tasks. Register the dataset to your workspace to share and reuse it across different experiments without data ingestion complexities.
+To interact with your data in storage, [create a dataset](how-to-create-register-datasets.md) to package your data into a consumable object for machine learning tasks. Register the dataset to your workspace to share and reuse it across different experiments without data ingestion complexities.

-Datasets can be created from local files, public urls, [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/), or Azure storage services via datastores. To create a dataset from an in memory pandas dataframe, write the data to a local file, like a parquet, and create your dataset from that file.
+Datasets can be created from local files, public urls, [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/), or Azure storage services via datastores.

-We support 2 types of datasets:
+There are 2 types of datasets:

 + A [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. If your data is already cleansed and ready to use in training experiments, you can [download or mount files](how-to-train-with-datasets.md#mount-files-to-remote-compute-targets) referenced by FileDatasets to your compute target.
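The article notes that datasets can be created from local files, and (in the wording this commit revises) that in-memory data should be written to a local file first. A minimal sketch of that "local file first" path, using only the standard library so it runs anywhere: the file name and rows are invented for illustration, and the azureml-core call shown in the comment is for orientation only, not executed here.

```python
import csv
import os
import tempfile

# Sketch: persist in-memory rows to a local file, then create the dataset from
# that file. Stdlib csv keeps the sketch dependency-free; with pandas you
# would call DataFrame.to_parquet or DataFrame.to_csv at the same step.
rows = [{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}]

local_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(local_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "label"])
    writer.writeheader()
    writer.writerows(rows)

# In Azure ML you would now hand this file to the SDK, for example:
#   from azureml.core import Dataset
#   dataset = Dataset.Tabular.from_delimited_files(path=[(datastore, 'sample.csv')])
# (shown for orientation only; requires a workspace and datastore)

# Read the file back to confirm the local round trip.
with open(local_path) as f:
    round_tripped = list(csv.DictReader(f))
print(round_tripped)
```

Note that `csv.DictReader` yields string values on the way back; a TabularDataset created from such a file would apply its own type inference when materialized.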

articles/machine-learning/toc.yml

Lines changed: 1 addition & 1 deletion
@@ -267,7 +267,7 @@
 - name: Access data
   items:
   - name: Connect to Azure Storage
-    displayName: blob, get, fileshare, access, mount, download, data lake
+    displayName: blob, get, fileshare, access, mount, download, data lake, datastore
     href: how-to-access-data.md
   - name: Get data from a datastore
     displayName: data, data set, register, access data
