Commit abfa5de: "incorporating feedback"
Author: Larry Franks
Parent: 834c575

File tree: 5 files changed (+43 -42 lines)


articles/machine-learning/concept-datastore.md
6 additions, 6 deletions

@@ -1,5 +1,5 @@
 ---
-title: Azure Machine Learning Datastores
+title: Azure Machine Learning datastores
 titleSuffix: Azure Machine Learning
 description: Learn how to securely connect to your data storage on Azure with Azure Machine Learning datastores.
 services: machine-learning
@@ -15,7 +15,7 @@ ms.custom: devx-track-python, data4ml
 # Customer intent: As an experienced Python developer, I need to securely access my data in my Azure storage solutions and use it to accomplish my machine learning tasks.
 ---

-# Azure Machine Learning Datastores
+# Azure Machine Learning datastores

 Supported cloud-based storage services in Azure Machine Learning include:

@@ -34,14 +34,14 @@ Storage URIs use *identity-based* access that will prompt you for your Azure Act
 > [!NOTE]
 > When using Notebooks in Azure Machine Learning Studio, your Azure Active Directory token is automatically passed through to storage for data access authentication.

-Whilst storage URIs provide a convenient mechanism to access data, there may be cases where using an Azure Machine Learning *Datastore* is a better option:
+Although storage URIs provide a convenient mechanism to access data, there may be cases where using an Azure Machine Learning *Datastore* is a better option:

-1. **You need *credential-based* data access (for example: Service Principals, SAS Tokens, Account Name/Key).** Datastores are helpful because they keep the connection information to your data storage securely in an Azure Keyvault, so you don't have to code it in your scripts.
-1. **You want team members to easily discover relevant datastores.** Datastores are registered to an Azure Machine Learning workspace making them easier for your team members to find/discover them.
+* **You need *credential-based* data access (for example: Service Principals, SAS Tokens, Account Name/Key).** Datastores are helpful because they keep the connection information to your data storage securely in an Azure Keyvault, so you don't have to code it in your scripts.
+* **You want team members to easily discover relevant datastores.** Datastores are registered to an Azure Machine Learning workspace making them easier for your team members to find/discover them.

 [Register and create a datastore](how-to-datastore.md) to easily connect to your storage account, and access the data in your underlying storage service.

-## Credential-based vs Identity-based access
+## Credential-based vs identity-based access

 Azure Machine Learning Datastores support both credential-based and identity-based access. In *credential-based* access, your authentication credentials are usually kept in a datastore, which is used to ensure you have permission to access the storage service. When these credentials are registered via datastores, any user with the workspace Reader role can retrieve them. That scale of access can be a security concern for some organizations. When you use *identity-based* data access, Azure Machine Learning prompts you for your Azure Active Directory token for data access authentication instead of keeping your credentials in the datastore. That approach allows for data access management at the storage level and keeps credentials confidential.
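For context on how a registered datastore is consumed, here's a minimal sketch using the v2 Python SDK. It assumes an authenticated `MLClient`, a registered datastore named `my_blob_ds`, and hypothetical data path, environment, and compute names:

```python
from azure.ai.ml import MLClient, Input, command
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# Connect to the workspace; assumes a local config.json with workspace details.
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# A datastore URI never embeds secrets: for credential-based datastores the
# connection information is resolved from the workspace's Key Vault, and for
# identity-based datastores your Azure Active Directory token is used.
job = command(
    command="head ${{inputs.my_data}}",
    inputs={
        "my_data": Input(
            type=AssetTypes.URI_FILE,
            path="azureml://datastores/my_blob_ds/paths/example/titanic.csv",  # hypothetical path
        )
    },
    environment="azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # hypothetical environment
    compute="cpu-cluster",  # hypothetical compute target
)
ml_client.jobs.create_or_update(job)
```

Either way, the submitted script carries no storage credentials, which is the point of the datastore abstraction.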

articles/machine-learning/how-to-administrate-data-authentication.md
6 additions, 6 deletions

@@ -24,29 +24,29 @@ Learn how to manage data access and how to authenticate in Azure Machine Learnin

 In general, data access from studio involves the following checks:

-1. Who is accessing?
+* Who is accessing?
    - There are multiple different types of authentication depending on the storage type. For example, account key, token, service principal, managed identity, and user identity.
    - If authentication is made using a user identity, then it's important to know *which* user is trying to access storage. Learn more about [identity-based data access](how-to-identity-based-data-access.md).
-2. Do they have permission?
+* Do they have permission?
    - Are the credentials correct? If so, does the service principal, managed identity, etc., have the necessary permissions on the storage? Permissions are granted using Azure role-based access controls (Azure RBAC).
    - [Reader](../role-based-access-control/built-in-roles.md#reader) of the storage account reads metadata of the storage.
    - [Storage Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader) reads data within a blob container.
    - [Contributor](../role-based-access-control/built-in-roles.md#contributor) allows write access to a storage account.
    - More roles may be required depending on the type of storage.
-3. Where is access from?
+* Where is access from?
    - User: Is the client IP address in the VNet/subnet range?
    - Workspace: Is the workspace public or does it have a private endpoint in a VNet/subnet?
    - Storage: Does the storage allow public access, or does it restrict access through a service endpoint or a private endpoint?
-4. What operation is being performed?
+* What operation is being performed?
    - Create, read, update, and delete (CRUD) operations on a data store/dataset are handled by Azure Machine Learning.
    - Data Access calls (such as preview or schema) go to the underlying storage and need extra permissions.
-5. Where is this operation being run; compute resources in your Azure subscription or resources hosted in a Microsoft subscription?
+* Where is this operation being run; compute resources in your Azure subscription or resources hosted in a Microsoft subscription?
    - All calls to dataset and datastore services (except the "Generate Profile" option) use resources hosted in a __Microsoft subscription__ to run the operations.
    - Jobs, including the "Generate Profile" option for datasets, run on a compute resource in __your subscription__, and access the data from there. So the compute identity needs permission to the storage rather than the identity of the user submitting the job.

 The following diagram shows the general flow of a data access call. In this example, a user is trying to make a data access call through a machine learning workspace, without using any compute resource.

-:::image type="content" source="./media/concept-network-data-access/data-access-flow.svg" alt-text="Diagram of the logic flow when accessing data":::
+:::image type="content" source="./media/concept-network-data-access/data-access-flow.svg" alt-text="Diagram of the logic flow when accessing data.":::

 ## Scenarios and identities
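To make the permission check above concrete, granting blob-level read access is a single role assignment. A hedged Azure CLI sketch; every identifier below is a placeholder:

```azurecli
# Grant an identity (user, service principal, or managed identity) read access
# to blob data. Reader on the account alone only exposes storage metadata.
az role assignment create \
    --role "Storage Blob Data Reader" \
    --assignee "<object-id-or-user-principal-name>" \
    --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```

Remember from the last check in the list above that for jobs it's the compute identity, not the submitting user, that needs this role.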

articles/machine-learning/how-to-create-register-data-assets.md
4 additions, 3 deletions

@@ -285,7 +285,7 @@ ml_client.data.create_or_update(my_data)
 ```

 > [!TIP]
-> Whilst the above example shows a local file. Remember that path supports cloud storage (https, abfss, wasbs protocols). Therefore, if you want to register data in a cloud location just specify the path with any of the supported protocols.
+> Although the above example shows a local file, remember that path supports cloud storage (https, abfss, and wasbs protocols). Therefore, if you want to register data in a cloud location, just specify the path with any of the supported protocols.

 # [CLI](#tab/CLI)
 You can also use CLI and following YAML that describes an MLTable to register MLTable Data.
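Regarding the tip above, here's a sketch of the cloud-path variant it describes. The account, container, and file names are hypothetical, and `ml_client` is assumed to be an authenticated `MLClient`:

```python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Same registration call as the local-file example, but path points at cloud
# storage; https and abfss URIs work the same way as wasbs here.
my_cloud_data = Data(
    path="wasbs://<container>@<account>.blob.core.windows.net/data/titanic.csv",
    type=AssetTypes.URI_FILE,
    description="Data asset registered from a cloud location",
    name="titanic-cloud-example",
    version="1",
)
ml_client.data.create_or_update(my_cloud_data)
```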
@@ -367,6 +367,7 @@ command: |
 "
 ```

+> [!NOTE]
 > **For local files and folders**, only relative paths are supported. To be explicit, we will **not** support absolute paths as that would require us to change the MLTable file that is residing on disk before we move it to cloud storage.

 You can put MLTable file and underlying data in the *same folder* but in a cloud object store. You can specify `mltable:` in their job that points to a location on a datastore that contains the MLTable file:
@@ -530,7 +531,7 @@ Below are the supported transformations that are specific for json lines:
 - `invalid_lines` How to handle lines that are invalid JSON. Supported values are `error` and `drop`. Defaults to `error`.
 - `encoding` Specify the file encoding. Supported encodings are `utf8`, `iso88591`, `latin1`, `ascii`, `utf16`, `utf32`, `utf8bom` and `windows1252`. Default is `utf8`.

-## Global Transforms
+## Global transforms

 MLTable-artifacts provide transformations specific to the delimited text, parquet, Delta. There are other transforms that mltable-artifact files support:

@@ -539,7 +540,7 @@ MLTable-artifacts provide transformations specific to the delimited text, parque
 - `skip`: This skips the first *n* records of the table
 - `drop_columns`: Drops the specified columns from the table. This transform supports regex so that users can drop columns matching a particular pattern.
 - `keep_columns`: Keeps only the specified columns in the table. This transform supports regex so that users can keep columns matching a particular pattern.
-- `filter`: Filter the data, leaving only the records that match the specified expression. **NOTE: This will come post-GA as we need to define the filter query language**.
+- `filter`: Filter the data, leaving only the records that match the specified expression.
 - `extract_partition_format_into_columns`: Specify the partition format of path. Defaults to None. The partition information of each path will be extracted into columns based on the specified format. Format part '{column_name}' creates string column, and '{column_name:yyyy/MM/dd/HH/mm/ss}' creates datetime column, where 'yyyy', 'MM', 'dd', 'HH', 'mm' and 'ss' are used to extract year, month, day, hour, minute and second for the datetime type. The format should start from the position of first partition key until the end of file path. For example, given the path '../Accounts/2019/01/01/data.csv' where the partition is by department name and time, partition_format='/{Department}/{PartitionDate:yyyy/MM/dd}/data.csv' creates a string column 'Department' with the value 'Accounts' and a datetime column 'PartitionDate' with the value '2019-01-01'.
 Our principle here's to support transforms *specific to data delivery* and not to get into wider feature engineering transforms.
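To illustrate the global transforms listed above, here's a sketch of an MLTable file that combines a read transformation with `take` and `drop_columns`. The file name and values are hypothetical, and the exact schema keys may differ from this sketch:

```yaml
# MLTable file (sketch)
paths:
  - file: ./accounts.csv
transformations:
  - read_delimited:
      delimiter: ','
      encoding: utf8
  - take: 1000                # keep only the first 1000 records
  - drop_columns: ["_tmp.*"]  # regex is supported
```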

articles/machine-learning/how-to-datastore.md
25 additions, 25 deletions

@@ -16,9 +16,9 @@ ms.custom: contperf-fy21q1, devx-track-python, data4ml
 # Customer intent: As an experienced Python developer, I need to make my data in Azure storage available to my remote compute to train my machine learning models.
 ---

-# Connect to storage with Azure Machine Learning Datastores
+# Connect to storage with Azure Machine Learning datastores

-In this article, learn how to connect to data storage services on Azure with Azure Machine Learning Datastores.
+In this article, learn how to connect to data storage services on Azure with Azure Machine Learning datastores.

 ## Prerequisites

@@ -29,15 +29,15 @@ In this article, learn how to connect to data storage services on Azure with Azu
 - An Azure Machine Learning workspace.

 > [!NOTE]
-> Azure Machine Learning Datastores do **not** create the underlying storage accounts, rather they register an **existing** storage account for use in Azure Machine Learning. It is not a requirement to use Azure Machine Learning Datastores - you can use storage URIs directly assuming you have access to the underlying data.
+> Azure Machine Learning datastores do **not** create the underlying storage accounts, rather they register an **existing** storage account for use in Azure Machine Learning. It is not a requirement to use Azure Machine Learning datastores - you can use storage URIs directly assuming you have access to the underlying data.

-## Create an Azure Blob Datastore
+## Create an Azure Blob datastore

 # [CLI: Identity-based access](#tab/cli-identity-based-access)
 Create the following YAML file (updating the values):

-```yml
+```yaml
 # my_blob_datastore.yml
 $schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
 name: my_blob_ds # add name of your datastore here
@@ -53,10 +53,10 @@ Create the Azure Machine Learning datastore in the CLI:
 az ml datastore create --file my_blob_datastore.yml
 ```

-# [CLI: Account Key](#tab/cli-account-key)
+# [CLI: Account key](#tab/cli-account-key)
 Create the following YAML file (updating the values):

-```yml
+```yaml
 # my_blob_datastore.yml
 $schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
 name: blob_example
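The hunk above truncates the YAML after `name:`. For reference, a complete account-key datastore file might look like the following sketch; the account, container, and key are placeholders, and the full file in the repo may differ:

```yaml
# my_blob_datastore.yml (sketch)
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_example
type: azure_blob
description: Datastore pointing to an existing blob container.
account_name: mystorageaccount   # placeholder
container_name: my-container     # placeholder
credentials:
  account_key: "XXXxxxXXXxxx=="  # placeholder key
```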
@@ -77,7 +77,7 @@ az ml datastore create --file my_blob_datastore.yml
 # [CLI: SAS](#tab/cli-sas)
 Create the following YAML file (updating the values):

-```yml
+```yaml
 # my_blob_datastore.yml
 $schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
 name: blob_sas_example
@@ -95,7 +95,7 @@ Create the Azure Machine Learning datastore in the CLI:
 az ml datastore create --file my_blob_datastore.yml
 ```

-# [Python SDK: Identity-based Access](#tab/sdk-identity-based-access)
+# [Python SDK: Identity-based access](#tab/sdk-identity-based-access)

 ```python
 from azure.ai.ml.entities import AzureBlobDatastore
@@ -113,7 +113,7 @@ store = AzureBlobDatastore(
 ml_client.create_or_update(store)
 ```

-# [Python SDK: Account Key](#tab/sdk-account-key)
+# [Python SDK: Account key](#tab/sdk-account-key)

 ```python
 from azure.ai.ml.entities import AzureBlobDatastore
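For reference, the body of the account-key variant that this hunk truncates might look like the following sketch; the names and key are placeholders, and `ml_client` is assumed to exist:

```python
from azure.ai.ml.entities import AzureBlobDatastore, AccountKeyConfiguration

# Credential-based datastore: the key is stored in the workspace's Key Vault,
# not in your training scripts.
store = AzureBlobDatastore(
    name="blob_example",
    description="Datastore pointing to an existing blob container.",
    account_name="mystorageaccount",  # placeholder
    container_name="my-container",    # placeholder
    credentials=AccountKeyConfiguration(account_key="XXXxxxXXXxxx=="),  # placeholder key
)
ml_client.create_or_update(store)
```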
@@ -158,12 +158,12 @@ ml_client.create_or_update(store)
 ```
 ---

-## Create an Azure Data Lake Gen2 Datastore
+## Create an Azure Data Lake Gen2 datastore

 # [CLI: Identity-based access](#tab/cli-adls-identity-based-access)
 Create the following YAML file (updating the values):

-```yml
+```yaml
 # my_adls_datastore.yml
 $schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
 name: adls_gen2_credless_example
@@ -179,10 +179,10 @@ Create the Azure Machine Learning datastore in the CLI:
 az ml datastore create --file my_adls_datastore.yml
 ```

-# [CLI: Service Principal](#tab/cli-adls-sp)
+# [CLI: Service principal](#tab/cli-adls-sp)
 Create the following YAML file (updating the values):

-```yml
+```yaml
 # my_adls_datastore.yml
 $schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
 name: adls_gen2_example
@@ -220,7 +220,7 @@ store = AzureDataLakeGen2Datastore(
 ml_client.create_or_update(store)
 ```

-# [Python SDK: Service Principal](#tab/sdk-adls-sp)
+# [Python SDK: Service principal](#tab/sdk-adls-sp)

 ```python
 from azure.ai.ml.entities import AzureDataLakeGen2Datastore
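A sketch of the service-principal variant this tab introduces; the tenant, client, secret, and storage names are placeholders:

```python
from azure.ai.ml.entities import (
    AzureDataLakeGen2Datastore,
    ServicePrincipalConfiguration,
)

store = AzureDataLakeGen2Datastore(
    name="adls_gen2_example",
    description="Datastore pointing to an Azure Data Lake Storage Gen2 filesystem.",
    account_name="mystorageaccount",  # placeholder
    filesystem="my-filesystem",       # placeholder
    credentials=ServicePrincipalConfiguration(
        tenant_id="00000000-0000-0000-0000-000000000000",  # placeholder tenant
        client_id="00000000-0000-0000-0000-000000000000",  # placeholder app ID
        client_secret="XXXxxxXXX",                         # placeholder secret
    ),
)
ml_client.create_or_update(store)
```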
@@ -248,12 +248,12 @@ ml_client.create_or_update(store)
 ```
 ---

-## Create an Azure Files Datastore
+## Create an Azure Files datastore

-# [CLI: Account Key](#tab/cli-azfiles-account-key)
+# [CLI: Account key](#tab/cli-azfiles-account-key)
 Create the following YAML file (updating the values):

-```yml
+```yaml
 # my_files_datastore.yml
 $schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
 name: file_example
@@ -274,7 +274,7 @@ az ml datastore create --file my_files_datastore.yml
 # [CLI: SAS](#tab/cli-azfiles-sas)
 Create the following YAML file (updating the values):

-```yml
+```yaml
 # my_files_datastore.yml
 $schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
 name: file_sas_example
@@ -292,7 +292,7 @@ Create the Azure Machine Learning datastore in the CLI:
 az ml datastore create --file my_files_datastore.yml
 ```

-# [Python SDK: Account Key](#tab/sdk-azfiles-accountkey)
+# [Python SDK: Account key](#tab/sdk-azfiles-accountkey)

 ```python
 from azure.ai.ml.entities import AzureFileDatastore
@@ -337,12 +337,12 @@ ml_client.create_or_update(store)
 ```
 ---

-## Create an Azure Data Lake Gen1 Datastore
+## Create an Azure Data Lake Gen1 datastore

 # [CLI: Identity-based access](#tab/cli-adlsgen1-identity-based-access)
 Create the following YAML file (updating the values):

-```yml
+```yaml
 # my_adls_datastore.yml
 $schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen1.schema.json
 name: alds_gen1_credless_example
@@ -357,10 +357,10 @@ Create the Azure Machine Learning datastore in the CLI:
 az ml datastore create --file my_adls_datastore.yml
 ```

-# [CLI: Service Principal](#tab/cli-adlsgen1-sp)
+# [CLI: Service principal](#tab/cli-adlsgen1-sp)
 Create the following YAML file (updating the values):

-```yml
+```yaml
 # my_adls_datastore.yml
 $schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen1.schema.json
 name: adls_gen1_example
@@ -396,7 +396,7 @@ store = AzureDataLakeGen1Datastore(
 ml_client.create_or_update(store)
 ```

-# [Python SDK: Service Principal](#tab/sdk-adlsgen1-sp)
+# [Python SDK: Service principal](#tab/sdk-adlsgen1-sp)

 ```python
 from azure.ai.ml.entities import AzureDataLakeGen1Datastore

articles/machine-learning/toc.yml
2 additions, 2 deletions

@@ -201,7 +201,7 @@
   items:
   - name: Datastores
     href: concept-datastore.md
-  - name: Access Data
+  - name: Access data
     href: concept-data.md
   - name: Collect data
     items:
@@ -370,7 +370,7 @@
 - name: Register a data asset
   displayName: data, data asset
   href: how-to-create-register-data-assets.md
-- name: Reading/Writing Data
+- name: Reading & writing data
   displayName: read and write data
   href: how-to-read-write-data-v2.md
 - name: Administrate data authentication
