Commit 3e800a7

concept refresh
1 parent 7311f99 commit 3e800a7

4 files changed: 17 additions and 17 deletions

articles/machine-learning/concept-data.md

Lines changed: 4 additions & 4 deletions
@@ -9,15 +9,15 @@ ms.topic: conceptual
 ms.reviewer: nibaccam
 author: nibaccam
 ms.author: nibaccam
-ms.date: 03/09/2020
+ms.date: 03/15/2020
 
 ---
 
 # Data access in Azure Machine Learning
 
 In this article, you learn about Azure Machine Learning's data management and integration solutions for your machine learning tasks. This article assumes you've already created an [Azure storage account](https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal) and [Azure storage service](https://docs.microsoft.com/azure/storage/common/storage-introduction).
 
-When you're ready to use the data in your storage, we recommend you
+When you're ready to use the data in your Azure storage solution, we recommend you
 
 1. Create an Azure Machine Learning datastore.
 2. From that datastore, create an Azure Machine Learning dataset.
@@ -36,13 +36,13 @@ The following diagram provides a visual demonstration of this recommended data a
 
 ## Access data in storage
 
-To access your data in your storage account, Azure Machine Learning offers datastores and datasets. Datastores answer the question: how do I securely connect to my data that's in my Azure Storage? Datastores provide a layer of abstraction over your storage service. This aids in security and ease of access to your storage, since connection information is kept in the datastore and not exposed in scripts.
+To access your data in your storage account, Azure Machine Learning offers datastores and datasets. Datastores answer the question: how do I securely connect to my data that's in my Azure Storage? Datastores save the connection information to your Azure Storage. This aids in security and ease of access to your storage, since connection information is kept in the datastore and not exposed in scripts.
 
 Datasets answer the question: how do I get specific data files in my datastore? Datasets point to the specific file or files in your underlying storage that you want to use for your machine learning experiment. Together, datastores and datasets offer a secure, scalable, and reproducible data delivery workflow for your machine learning tasks.
 
 ## Datastores
 
-An Azure Machine Learning datastore is a storage abstraction over your Azure storage services. [Register and create a datastore](how-to-access-data.md) to easily connect to your Azure storage account, and access the data in your underlying Azure storage services.
+An Azure Machine Learning datastore keeps the connection information to your storage so you don't have to code it in your scripts. [Register and create a datastore](how-to-access-data.md) to easily connect to your Azure storage account, and access the data in your underlying Azure storage services.
 
 Supported Azure storage services that can be registered as datastores:
 + Azure Blob Container
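
Editorial illustration (not part of the commit): the datastore-then-dataset workflow described in `concept-data.md` might look roughly like the sketch below. It assumes the `azureml-core` (v1) Python SDK and uses the workspace's automatically registered default datastore. The helper name `dataset_from_default_datastore` and the example path are hypothetical.

```python
def dataset_from_default_datastore(workspace, relative_path):
    """Return a FileDataset that points at files in the workspace's default datastore.

    `workspace` is assumed to be an azureml.core.Workspace.
    `relative_path` is a hypothetical path inside the datastore,
    for example "training-data/**".
    """
    # Imported lazily so this module loads even where the SDK is not installed.
    from azureml.core import Dataset

    # The datastore holds the connection information, so no account
    # keys or tokens need to appear in this script.
    datastore = workspace.get_default_datastore()

    # The dataset points at the specific files to use for the experiment.
    return Dataset.File.from_files(path=(datastore, relative_path))


# Possible usage, assuming a config.json for the workspace is available:
#   from azureml.core import Workspace
#   ws = Workspace.from_config()
#   dataset = dataset_from_default_datastore(ws, "training-data/**")
```

The script itself carries no credentials; it only names a datastore and a path, which is the separation the revised wording in this commit emphasizes.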

articles/machine-learning/how-to-access-data.md

Lines changed: 9 additions & 9 deletions
@@ -19,13 +19,14 @@ ms.custom: seodec18
 # Access data in Azure storage services
 [!INCLUDE [aml-applies-to-basic-enterprise-sku](../../includes/aml-applies-to-basic-enterprise-sku.md)]
 
-In this article, learn how to easily access your data in Azure Storage services via Azure Machine Learning datastores. Datastores are used to store connection information, like your subscription ID and token authorization. When you use datastores, you can access your storage without having to hard code connection information in your scripts.
+In this article, learn how to easily access your data in Azure Storage services via Azure Machine Learning datastores. Datastores store connection information, like your subscription ID and token authorization, so you can access your storage without having to hard code them in your scripts.
 
 You can create datastores from [these Azure Storage solutions](#matrix). For unsupported storage solutions, and to save data egress cost during machine learning experiments, we recommend that you [move your data](#move) to supported Azure Storage solutions.
 
 ## Prerequisites
+
 You'll need:
-- An Azure subscription. If you dont have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://aka.ms/AMLFree).
+- An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://aka.ms/AMLFree).
 
 - An Azure storage account with an [Azure blob container](https://docs.microsoft.com/azure/storage/blobs/storage-blobs-overview) or [Azure file share](https://docs.microsoft.com/azure/storage/files/storage-files-introduction).
 
@@ -59,12 +60,11 @@ Azure Database for MySQL | SQL authentication| | ✓* | ✓* |
 Databricks File System| No authentication | | ✓** | ✓ ** |✓**
 
 *MySQL is only supported for pipeline [DataTransferStep](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.datatransferstep?view=azure-ml-py). <br>
-\**Databricks is only supported for pipeline [DatabricksStep](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.databricks_step.databricksstep?view=azure-ml-py)
+**Databricks is only supported for pipeline [DatabricksStep](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.databricks_step.databricksstep?view=azure-ml-py)
 
 ### Storage guidance
 
-We recommend creating a datastore for an Azure blob container.
-Both standard and premium storage are available for blobs. Although premium storage is more expensive, its faster throughput speeds might improve the speed of your training runs, particularly if you train against a large dataset. For information about the cost of storage accounts, see the [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator/?service=machine-learning-service).
+We recommend creating a datastore for an Azure blob container. Both standard and premium storage are available for blobs. Although premium storage is more expensive, its faster throughput speeds might improve the speed of your training runs, particularly if you train against a large dataset. For information about the cost of storage accounts, see the [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator/?service=machine-learning-service).
 
 When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace. They're named `workspaceblobstore` and `workspacefilestore`, respectively. They store the connection information for the blob container and the file share that are provisioned in the storage account attached to the workspace. The `workspaceblobstore` container is set as the default datastore.
 
@@ -75,9 +75,9 @@ When you create a workspace, an Azure blob container and an Azure file share are
 When you register an Azure Storage solution as a datastore, you automatically create and register that datastore to a specific workspace. You can create and register datastores to a workspace by using the Python SDK or Azure Machine Learning studio.
 
 >[!IMPORTANT]
-> As part of the current datastore create and register process, Azure Machine Learning validates that the user provided principal (username, service principal or SAS token) has access to the underlying storage service.
+> As part of the initial datastore create and register process, Azure Machine Learning validates that the underlying storage service exists and that the user provided principal (username, service principal or SAS token) has access to that storage. For Azure Data Lake Storage Gen 1 and 2 datastores, however, this validation happens later, when data access methods like [`from_files()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.filedatasetfactory?view=azure-ml-py) or [`from_delimited_files()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-parquet-files-path--validate-true--include-path-false--set-column-types-none--partition-format-none-) are called.
 <br><br>
-However, for Azure Data Lake Storage Gen 1 and 2 datastores, this validation happens later when data access methods like [`from_files()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.filedatasetfactory?view=azure-ml-py) or [`from_delimited_files()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-parquet-files-path--validate-true--include-path-false--set-column-types-none--partition-format-none-) are called.
+This validation is only performed **once** and is **not** repeated thereafter; for example, each time the datastore is called in scripts.
 
 ### Python SDK
 
@@ -93,7 +93,7 @@ Select **Storage Accounts** on the left pane, and choose the storage account tha
 > [!IMPORTANT]
 > If your storage account is in a virtual network, only creation of Blob, File share, ADLS Gen 1 and ADLS Gen 2 datastores **via the SDK** is supported. To grant your workspace access to your storage account, set the parameter `grant_workspace_access` to `True`.
 
-The following examples show how to register an Azure blob container, an Azure file share, and Azure Data Lake Storage Generation 2 as a datastore. For other storage services, please see the [reference documentation for the `register_azure_*` methods](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py#methods).
+The following examples show how to register an Azure blob container, an Azure file share, and Azure Data Lake Storage Generation 2 as a datastore. For other storage services, please see the [reference documentation for the applicable `register_azure_*` methods](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py#methods).
 
 #### Blob container
 
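
Editorial illustration (not part of the commit, and not the article's own example, which this hunk does not show): registering a blob container as a datastore with the `azureml-core` (v1) SDK might look roughly like the sketch below. The helper name, the `in_virtual_network` flag, and the `STORAGE_ACCOUNT_KEY` environment variable are hypothetical choices for this sketch. `Datastore.register_azure_blob_container` and its `grant_workspace_access` parameter come from the SDK.

```python
import os


def register_blob_datastore(workspace, datastore_name, container_name,
                            account_name, in_virtual_network=False):
    """Register an Azure blob container as a datastore on `workspace`.

    The account key is read from a hypothetical STORAGE_ACCOUNT_KEY
    environment variable so that it is not hard coded in the script.
    """
    # Imported lazily so this module loads even where the SDK is not installed.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        workspace=workspace,
        datastore_name=datastore_name,
        container_name=container_name,
        account_name=account_name,
        account_key=os.environ.get("STORAGE_ACCOUNT_KEY"),
        # Per the note above: set to True when the storage account
        # is in a virtual network.
        grant_workspace_access=in_virtual_network,
    )


# Possible usage, assuming a config.json for the workspace is available:
#   from azureml.core import Workspace
#   ws = Workspace.from_config()
#   datastore = register_blob_datastore(ws, "my_blob_store", "my-container", "myaccount")
```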

@@ -260,7 +260,7 @@ To interact with data in your datastores or to package your data into a consumab
 
 Azure Blob storage has higher throughput speeds than an Azure file share and will scale to large numbers of jobs started in parallel. For this reason, we recommend configuring your runs to use Blob storage for transferring source code files.
 
-The following code example specifies in the run configuration which blob datastore to use for source code transfers:
+The following code example specifies in the run configuration which blob datastore to use for source code transfers.
 
 ```python
 # workspaceblobstore is the default blob storage

articles/machine-learning/how-to-create-register-datasets.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ With Azure Machine Learning datasets, you can:
 
 To create and work with datasets, you need:
 
-* An Azure subscription. If you dont have one, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://aka.ms/AMLFree).
+* An Azure subscription. If you don't have one, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://aka.ms/AMLFree).
 
 * An [Azure Machine Learning workspace](how-to-manage-workspace.md).
 

articles/machine-learning/toc.yml

Lines changed: 3 additions & 3 deletions
@@ -89,10 +89,10 @@
 href: concept-workspace.md
 - name: Environments
 href: concept-environments.md
-- name: Data access
-href: concept-data.md
 - name: Data ingestion
-href: concept-data-ingestion.md
+href: concept-data-ingestion.md
+- name: Data access
+href: concept-data.md
 - name: Model training
 displayName: run config, estimator, machine learning pipeline, ml pipeline, train model
 href: concept-train-machine-learning-model.md
