---
title: Secure data access in the cloud
titleSuffix: Azure Machine Learning
description: Learn how to securely connect to your data from Azure Machine Learning, and how to use datasets and datastores for ML tasks. Datastores can store data from an Azure Blob, Azure Data Lake Gen 1 & 2, SQL db, and Azure Databricks.
services: machine-learning
ms.service: machine-learning
ms.subservice: core
ms.topic: conceptual
ms.reviewer: nibaccam
author: nibaccam
ms.author: nibaccam
ms.date: 08/31/2020
ms.custom: devx-track-python
# Customer intent: As an experienced Python developer, I need to securely access my data in my Azure storage solutions and use it to accomplish my machine learning tasks.
---
Azure Machine Learning makes it easy to connect to your data in the cloud. It provides an abstraction layer over the underlying storage service, so you can securely access and work with your data without having to write code specific to your storage type. Azure Machine Learning also provides the following data capabilities:
* Interoperability with Pandas and Spark DataFrames
* Versioning and tracking of data lineage
* Data labeling
* Data drift monitoring
## Data workflow
When you're ready to use the data in your cloud-based storage solution, we recommend the following data delivery workflow. This workflow assumes you have an [Azure storage account](https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal) and data in a cloud-based storage service in Azure.
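As a hedged sketch of the first step in that workflow, a blob container can be registered as a datastore with the Azure ML Python SDK. The workspace config file, datastore name, container name, and account credentials below are placeholder values, not ones from this article:

```python
# Sketch: register an Azure Blob container as a datastore (azureml-core SDK).
# All names and credentials below are placeholders; substitute your own.
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()  # reads config.json for an existing workspace

blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="my_blob_datastore",   # name to register under (placeholder)
    container_name="my-container",        # existing blob container (placeholder)
    account_name="mystorageaccount",      # storage account name (placeholder)
    account_key="<storage-account-key>",  # or pass sas_token= instead
)
```

Credentials are stored securely in the workspace's Key Vault, so later code can retrieve the datastore by name without re-supplying them.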
## Datasets
Azure Machine Learning datasets are references that point to the data in your storage service. They aren't copies of your data. By creating an Azure Machine Learning dataset, you create a reference to the data source location, along with a copy of its metadata.
Because datasets are lazily evaluated, and the data remains in its existing location, you:
* Incur no extra storage cost.
* Don't risk unintentionally changing your original data sources.
* Improve ML workflow performance speeds.
To interact with your data in storage, [create a dataset](how-to-create-register-datasets.md) to package your data into a consumable object for machine learning tasks. Register the dataset to your workspace to share and reuse it across different experiments without data ingestion complexities.
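For instance, a minimal sketch of creating and registering a dataset from files in a registered datastore might look like the following; the datastore name, path pattern, and dataset name are placeholders, not values from this article:

```python
# Sketch: reference files in a registered datastore as a dataset and
# register it to the workspace. No data is copied at any point.
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, "my_blob_datastore")  # placeholder name

# Reference every CSV under a folder in the datastore.
dataset = Dataset.File.from_files(path=(datastore, "training-data/*.csv"))

# Registering makes the dataset shareable and reusable across experiments.
dataset = dataset.register(
    workspace=ws,
    name="training-files",      # placeholder dataset name
    create_new_version=True,    # keep prior versions for lineage tracking
)
```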
Datasets can be created from local files, public urls, [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/), or Azure storage services via datastores.
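As one hedged example of the public-URL path, a tabular dataset can be created directly from any publicly reachable delimited file; the URL below is a placeholder:

```python
# Sketch: create a TabularDataset from a public URL (azureml-core SDK).
# The URL is a placeholder for any publicly reachable delimited file.
from azureml.core import Dataset

web_path = "https://example.com/data/sample.csv"  # placeholder URL
tabular_dataset = Dataset.Tabular.from_delimited_files(path=web_path)
```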
There are two types of datasets:
+ A [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. If your data is already cleansed and ready to use in training experiments, you can [download or mount files](how-to-train-with-datasets.md#mount-files-to-remote-compute-targets) referenced by FileDatasets to your compute target.
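A hedged sketch of consuming such a FileDataset follows; the dataset name is a placeholder for one registered earlier in your own workspace:

```python
# Sketch: retrieve a registered FileDataset and download its files to a
# compute target. The dataset name is a placeholder.
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
file_dataset = Dataset.get_by_name(ws, name="training-files")

# Download the referenced files locally; mount() is an alternative on
# Unix-based compute targets.
file_dataset.download(target_path="./data", overwrite=True)
```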