# Customer intent: As an experienced Python developer, I need to securely access my data in my Azure storage solutions and use it to accomplish my machine learning tasks.
---
# Azure Machine Learning datastores
Supported cloud-based storage services in Azure Machine Learning include:

Storage URIs use *identity-based* access that will prompt you for your Azure Active Directory token for data access authentication.
> [!NOTE]
> When using Notebooks in Azure Machine Learning Studio, your Azure Active Directory token is automatically passed through to storage for data access authentication.
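
For example, data behind a storage URI can be read straight into a DataFrame. The snippet below is only a minimal sketch, assuming the `azureml-fsspec` and `pandas` packages are installed; every segment of the URI is a placeholder you replace with your own values:

```python
import pandas as pd  # azureml-fsspec registers the azureml:// protocol with fsspec

# All segments of this URI are placeholders; replace them with your own values.
uri = (
    "azureml://subscriptions/<sub-id>/resourcegroups/<rg-name>"
    "/workspaces/<workspace-name>/datastores/<datastore-name>/paths/<folder>/<file>.csv"
)

# Your Azure Active Directory identity is used for authentication, so no
# account key or SAS token appears anywhere in the code.
df = pd.read_csv(uri)
print(df.head())
```
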
Although storage URIs provide a convenient mechanism to access data, there may be cases where using an Azure Machine Learning *datastore* is a better option:
* **You need *credential-based* data access (for example: Service Principals, SAS Tokens, Account Name/Key).** Datastores are helpful because they keep the connection information to your data storage securely in an Azure Key Vault, so you don't have to code it in your scripts.
* **You want team members to easily discover relevant datastores.** Datastores are registered to an Azure Machine Learning workspace, which makes them easier for your team members to find and discover.

[Register and create a datastore](how-to-datastore.md) to easily connect to your storage account, and access the data in your underlying storage service.
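
For orientation, here is a minimal sketch of what registering a credential-based blob datastore can look like with the Azure Machine Learning Python SDK v2 (`azure-ai-ml`); the workspace identifiers, account, container, and key are all placeholders, and the linked article remains the full walkthrough:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureBlobDatastore, AccountKeyConfiguration
from azure.identity import DefaultAzureCredential

# Connect to the workspace (placeholder identifiers).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<sub-id>",
    resource_group_name="<rg-name>",
    workspace_name="<workspace-name>",
)

# Register an existing blob container as a credential-based datastore.
# The account key is stored in the workspace's Azure Key Vault, not in your scripts.
blob_datastore = AzureBlobDatastore(
    name="my_blob_datastore",
    account_name="<storage-account-name>",
    container_name="<container-name>",
    credentials=AccountKeyConfiguration(account_key="<account-key>"),
)
ml_client.create_or_update(blob_datastore)
```
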
## Credential-based vs identity-based access
Azure Machine Learning datastores support both credential-based and identity-based access. In *credential-based* access, your authentication credentials are usually kept in a datastore, which is used to ensure you have permission to access the storage service. When these credentials are registered via datastores, any user with the workspace Reader role can retrieve them. That scale of access can be a security concern for some organizations. When you use *identity-based* data access, Azure Machine Learning prompts you for your Azure Active Directory token for data access authentication instead of keeping your credentials in the datastore. That approach allows for data access management at the storage level and keeps credentials confidential.
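
To make the contrast concrete, a hedged variant of the earlier sketch: with `azure-ai-ml`, omitting the `credentials` argument yields a credential-less (identity-based) datastore, so no secret is stored and access is authorized with the caller's Azure Active Directory identity (all names are placeholders):

```python
from azure.ai.ml.entities import AzureBlobDatastore

# Identity-based (credential-less) datastore: nothing secret is kept in the
# workspace; the identity of whoever (or whatever compute) reads the data is
# checked against the storage account's Azure RBAC assignments at access time.
identity_datastore = AzureBlobDatastore(
    name="my_identity_datastore",
    account_name="<storage-account-name>",
    container_name="<container-name>",
)

# Register it the same way as a credential-based datastore:
# ml_client.create_or_update(identity_datastore)
```
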
articles/machine-learning/how-to-administrate-data-authentication.md

Learn how to manage data access and how to authenticate in Azure Machine Learning.

In general, data access from studio involves the following checks:
* Who is accessing?
    - There are multiple types of authentication depending on the storage type. For example, account key, token, service principal, managed identity, and user identity.
    - If authentication is made using a user identity, then it's important to know *which* user is trying to access storage. Learn more about [identity-based data access](how-to-identity-based-data-access.md).
* Do they have permission?
    - Are the credentials correct? If so, does the service principal, managed identity, etc., have the necessary permissions on the storage? Permissions are granted using Azure role-based access control (Azure RBAC); a quick way to test data-plane permissions is sketched after this list.
      - [Reader](../role-based-access-control/built-in-roles.md#reader) of the storage account reads metadata of the storage.
      - [Storage Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader) reads data within a blob container.
      - [Contributor](../role-based-access-control/built-in-roles.md#contributor) allows write access to a storage account.
      - More roles may be required depending on the type of storage.
* Where is access from?
    - User: Is the client IP address in the VNet/subnet range?
    - Workspace: Is the workspace public or does it have a private endpoint in a VNet/subnet?
    - Storage: Does the storage allow public access, or does it restrict access through a service endpoint or a private endpoint?
* What operation is being performed?
    - Create, read, update, and delete (CRUD) operations on a data store/dataset are handled by Azure Machine Learning.
    - Data access calls (such as preview or schema) go to the underlying storage and need extra permissions.
* Where is this operation being run: on compute resources in your Azure subscription, or on resources hosted in a Microsoft subscription?
    - All calls to dataset and datastore services (except the "Generate Profile" option) use resources hosted in a __Microsoft subscription__ to run the operations.
    - Jobs, including the "Generate Profile" option for datasets, run on a compute resource in __your subscription__, and access the data from there. So the compute identity needs permission to the storage, rather than the identity of the user who submits the job.
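
A quick, hedged way to test the permission check above, outside of Azure Machine Learning, is to attempt a data-plane call with the identity in question. The sketch below assumes the `azure-identity` and `azure-storage-blob` packages and placeholder account and container names:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# DefaultAzureCredential resolves to your signed-in user identity locally, or to
# the managed identity when this runs on an Azure compute resource.
service = BlobServiceClient(
    account_url="https://<storage-account-name>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

# Listing blobs is a data-plane operation: it needs a role such as
# Storage Blob Data Reader; the control-plane Reader role alone is not enough.
container = service.get_container_client("<container-name>")
for blob in container.list_blobs():
    print(blob.name)
```
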
The following diagram shows the general flow of a data access call. In this example, a user is trying to make a data access call through a machine learning workspace, without using any compute resource.
:::image type="content" source="./media/concept-network-data-access/data-access-flow.svg" alt-text="Diagram of the logic flow when accessing data.":::
> Although the above example shows a local file, remember that `path` supports cloud storage locations (`https`, `abfss`, `wasbs` protocols). Therefore, if you want to register data in a cloud location, just specify the path with any of the supported protocols.
# [CLI](#tab/CLI)
You can also use the CLI and the following YAML, which describes an MLTable, to register MLTable data.
> [!NOTE]
> **For local files and folders**, only relative paths are supported. To be explicit, we will **not** support absolute paths, as that would require us to change the MLTable file residing on disk before we move it to cloud storage.

You can put the MLTable file and underlying data in the *same folder*, but in a cloud object store. You can specify `mltable:` in your job to point to a location on a datastore that contains the MLTable file:

Below are the supported transformations that are specific to json lines:
- `invalid_lines`: How to handle lines that are invalid JSON. Supported values are `error` and `drop`. Defaults to `error`.
- `encoding`: Specify the file encoding. Supported encodings are `utf8`, `iso88591`, `latin1`, `ascii`, `utf16`, `utf32`, `utf8bom` and `windows1252`. Default is `utf8`.
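
As an illustrative sketch only, assuming the `mltable` Python package surfaces the same options for json lines files (the path and values here are placeholders):

```python
import mltable

# Placeholder path; invalid_lines and encoding mirror the options described above.
paths = [{"file": "./data/sample.jsonl"}]
tbl = mltable.from_json_lines_files(
    paths,
    invalid_lines="drop",  # drop lines that are not valid JSON instead of erroring
    encoding="utf8",
)
df = tbl.to_pandas_dataframe()
print(df.head())
```
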
## Global transforms
MLTable artifacts provide transformations specific to delimited text, Parquet, and Delta. There are other transforms that mltable-artifact files support:
- `skip`: This skips the first *n* records of the table
540
541
- `drop_columns`: Drops the specified columns from the table. This transform supports regex so that users can drop columns matching a particular pattern.
541
542
- `keep_columns`: Keeps only the specified columns in the table. This transform supports regex so that users can keep columns matching a particular pattern.
- `filter`: Filter the data, leaving only the records that match the specified expression.
- `extract_partition_format_into_columns`: Specify the partition format of the path. Defaults to None. The partition information of each path is extracted into columns based on the specified format. The format part '{column_name}' creates a string column, and '{column_name:yyyy/MM/dd/HH/mm/ss}' creates a datetime column, where 'yyyy', 'MM', 'dd', 'HH', 'mm' and 'ss' are used to extract the year, month, day, hour, minute and second for the datetime type. The format should start from the position of the first partition key and continue to the end of the file path. For example, given the path '../Accounts/2019/01/01/data.csv' where the partition is by department name and time, partition_format='/{Department}/{PartitionDate:yyyy/MM/dd}/data.csv' creates a string column 'Department' with the value 'Accounts' and a datetime column 'PartitionDate' with the value '2019-01-01'.

Our principle here is to support transforms *specific to data delivery*, not to get into wider feature-engineering transforms.
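
As a hedged sketch of how a few of these table-level transforms compose in the `mltable` Python package (the path and column names are placeholders):

```python
import mltable

# Load a delimited file, then chain a few of the transforms described above.
paths = [{"file": "./data/sample.csv"}]
tbl = mltable.from_delimited_files(paths)

tbl = tbl.keep_columns(["Department", "Amount"])  # keep only the columns we need
tbl = tbl.skip(1)                                 # skip the first record
tbl = tbl.take(20)                                # take the next 20 records

df = tbl.to_pandas_dataframe()
print(df.shape)
```
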
# Customer intent: As an experienced Python developer, I need to make my data in Azure storage available to my remote compute to train my machine learning models.
---
# Connect to storage with Azure Machine Learning datastores
In this article, learn how to connect to data storage services on Azure with Azure Machine Learning datastores.
## Prerequisites
- An Azure Machine Learning workspace.
> [!NOTE]
> Azure Machine Learning datastores do **not** create the underlying storage accounts; rather, they register an **existing** storage account for use in Azure Machine Learning. It is not a requirement to use Azure Machine Learning datastores - you can use storage URIs directly, assuming you have access to the underlying data.
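
As a small, hedged sketch of where this article heads (assuming the `azure-ai-ml` SDK v2 and placeholder workspace details), you can connect to a workspace and list the datastores that are already registered there; every workspace comes with a default blob datastore named `workspaceblobstore`:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder identifiers; replace with your own subscription, resource group,
# and workspace names.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<sub-id>",
    resource_group_name="<rg-name>",
    workspace_name="<workspace-name>",
)

# List the datastores already registered in the workspace.
for datastore in ml_client.datastores.list():
    print(datastore.name, datastore.type)

print("default:", ml_client.datastores.get_default().name)
```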