articles/machine-learning/concept-data.md (21 additions, 21 deletions)
@@ -9,7 +9,7 @@ ms.topic: conceptual
author: fbsolo-ms1
ms.author: franksolomon
ms.reviewer: swatig
-ms.date: 07/13/2023
+ms.date: 07/24/2024
ms.custom: data4ml
#Customer intent: As an experienced Python developer, I need secure access to my data in my Azure storage solutions, and I need to use that data to accomplish my machine learning tasks.
---
@@ -24,30 +24,30 @@ An Azure Machine Learning datastore serves as a *reference* to an *existing* Azu

- A common, easy-to-use API that interacts with different storage types (Blob/Files/ADLS).
- Easier discovery of useful datastores in team operations.
-- For credential-based access (service principal/SAS/key), Azure Machine Learning datastore secures connection information. This way, you won't need to place that information in your scripts.
+- For credential-based access (service principal/SAS/key), an Azure Machine Learning datastore secures connection information. This way, you don't need to place that information in your scripts.

-When you create a datastore with an existing Azure storage account, you can choose between two different authentication methods:
+When you create a datastore with an existing Azure storage account, you have two different authentication method options:

- **Credential-based** - authenticate data access with a service principal, shared access signature (SAS) token, or account key. Users with *Reader* workspace access can access the credentials.
- **Identity-based** - use your Microsoft Entra identity or managed identity to authenticate data access.

-The following table summarizes the Azure cloud-based storage services that an Azure Machine Learning datastore can create. Additionally, the table summarizes the authentication types that can access those services:
+This table summarizes the Azure cloud-based storage services that an Azure Machine Learning datastore can create. Additionally, the table summarizes the authentication types that can access those services:

Supported storage service | Credential-based authentication | Identity-based authentication
|---|:----:|:---:|
-Azure Blob Container| ✓ | ✓|
+Azure Blob Container| ✓ | ✓|
Azure File Share| ✓ | |
-Azure Data Lake Gen1 | ✓ | ✓|
-Azure Data Lake Gen2| ✓ | ✓|
+Azure Data Lake Gen1 | ✓ | ✓|
+Azure Data Lake Gen2| ✓ | ✓|

-See [Create datastores](how-to-datastore.md) for more information about datastores.
+For more information about datastores, visit [Create datastores](how-to-datastore.md).
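
As a sketch of the two authentication options described above, the following snippet registers blob datastores with the Python SDK v2 (`azure-ai-ml`); the subscription, workspace, storage account, container, and key values are placeholders rather than values from this article.

```python
# Minimal sketch: register credential-based and identity-based blob datastores
# with the Azure Machine Learning Python SDK v2 (azure-ai-ml).
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureBlobDatastore, AccountKeyConfiguration
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Credential-based: the account key is stored with the datastore,
# so job scripts never need to embed the secret.
blob_credential_store = AzureBlobDatastore(
    name="blob_credential_example",
    account_name="<storage-account-name>",
    container_name="<container-name>",
    credentials=AccountKeyConfiguration(account_key="<account-key>"),
)

# Identity-based: omit credentials; access is authenticated with the
# Microsoft Entra identity or managed identity used at data-access time.
blob_identity_store = AzureBlobDatastore(
    name="blob_identity_example",
    account_name="<storage-account-name>",
    container_name="<container-name>",
)

ml_client.create_or_update(blob_credential_store)
ml_client.create_or_update(blob_identity_store)
```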

### Default datastores

-Each Azure Machine Learning workspace has a default storage account (Azure storage account) that contains the following datastores:
+Each Azure Machine Learning workspace has a default storage account (Azure storage account) that contains these datastores:

> [!TIP]
-> To find the ID for your workspace, go to the workspace in the [Azure portal](https://portal.azure.com/). Expand **Settings** and then select **Properties**. The **Workspace ID**is displayed.
+> To find the ID for your workspace, go to the workspace in the [Azure portal](https://portal.azure.com/). Expand **Settings**, and then select **Properties**. The **Workspace ID** appears.

| Datastore name | Data storage type | Data storage name | Description |
|---|---|---|---|
@@ -58,13 +58,13 @@ Each Azure Machine Learning workspace has a default storage account (Azure stora

## Data types

-A URI (storage location) can reference a file, a folder, or a data table. A machine learning job input and output definition requires one of the following three data types:
+A URI (storage location) can reference a file, a folder, or a data table. A machine learning job input and output definition requires one of these three data types:

|Type |V2 API |V1 API |Canonical Scenarios | V2/V1 API Difference|
|---|---|---|---|---|
|**File**<br>Reference a single file |`uri_file`|`FileDataset`| Read/write a single file - the file can have any format. | A type new to V2 APIs. In V1 APIs, files always mapped to a folder on the compute target filesystem; this mapping required an `os.path.join`. In V2 APIs, the single file is mapped. This way, you can refer to that location in your code. |
-|**Folder**<br> Reference a single folder |`uri_folder`|`FileDataset`| You must read/write a folder of parquet/CSV files into Pandas/Spark.<br><br>Deep-learning with images, text, audio, video files located in a folder. | In V1 APIs, `FileDataset` had an associated engine that could take a file sample from a folder. In V2 APIs, a Folder is a simple mapping to the compute target filesystem. |
-|**Table**<br> Reference a data table |`mltable`|`TabularDataset`| You have a complex schema subject to frequent changes, or you need a subset of large tabular data.<br><br>AutoML with Tables. | In V1 APIs, the Azure Machine Learning back-end stored the data materialization blueprint. As a result, `TabularDataset` only worked if you had an Azure Machine Learning workspace. `mltable` stores the data materialization blueprint in *your* storage. This storage location means you can use it *disconnected to AzureML* - for example, locally and on-premises. In V2 APIs, you'll find it easier to transition from local to remote jobs. See [Working with tables in Azure Machine Learning](how-to-mltable.md) for more information. |
+|**Folder**<br> Reference a single folder |`uri_folder`|`FileDataset`| You must read/write a folder of parquet/CSV files into Pandas/Spark.<br><br>Deep-learning with images, text, audio, video files located in a folder. | In V1 APIs, `FileDataset` had an associated engine that could take a file sample from a folder. In V2 APIs, a folder is a simple mapping to the compute target filesystem. |
+|**Table**<br> Reference a data table |`mltable`|`TabularDataset`| You have a complex schema subject to frequent changes, or you need a subset of large tabular data.<br><br>AutoML with Tables. | In V1 APIs, the Azure Machine Learning back-end stored the data materialization blueprint. As a result, `TabularDataset` only worked if you had an Azure Machine Learning workspace. `mltable` stores the data materialization blueprint in *your* storage. This storage location means you can use it *disconnected to Azure Machine Learning* - for example, locally and on-premises. In V2 APIs, it's easier to transition from local to remote jobs. For more information, visit [Working with tables in Azure Machine Learning](how-to-mltable.md). |
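
As an illustrative sketch of how these three V2 types appear as job inputs in the Python SDK v2 (`azure-ai-ml`), the datastore paths below are placeholders.

```python
# Sketch: the three V2 data types expressed as job inputs.
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# uri_file: the single file itself is mapped on the compute target.
csv_input = Input(
    type=AssetTypes.URI_FILE,
    path="azureml://datastores/workspaceblobstore/paths/data/sample.csv",
)

# uri_folder: a plain mapping of the folder to the compute target filesystem.
folder_input = Input(
    type=AssetTypes.URI_FOLDER,
    path="azureml://datastores/workspaceblobstore/paths/images/",
)

# mltable: points to a folder that holds the data plus its MLTable
# materialization blueprint, stored in your own storage.
table_input = Input(
    type=AssetTypes.MLTABLE,
    path="azureml://datastores/workspaceblobstore/paths/tables/titanic/",
)
```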

## URI
A Uniform Resource Identifier (URI) represents a storage location on your local computer, Azure storage, or a publicly available http(s) location. These examples show URIs for different storage options:
@@ -78,23 +78,23 @@ A Uniform Resource Identifier (URI) represents a storage location on your local
|Azure Data Lake (gen2) |`abfss://<file_system>@<account_name>.dfs.core.windows.net/<folder>/<file>.csv`|
| Azure Data Lake (gen1) |`adl://<accountname>.azuredatalakestore.net/<folder1>/<folder2>`|

-An Azure Machine Learning job maps URIs to the compute target filesystem. This mapping means that in a command that consumes or produces a URI, that URI works like a file or a folder. A URI uses **identity-based authentication** to connect to storage services, with either your Microsoft Entra ID (default), or Managed Identity. Azure Machine Learning [Datastore](#datastore) URIs can apply either identity-based authentication, or **credential-based** (for example, Service Principal, SAS token, account key), without exposure of secrets.
+An Azure Machine Learning job maps URIs to the compute target filesystem. This mapping means that for a command that consumes or produces a URI, that URI works like a file or a folder. A URI uses **identity-based authentication** to connect to storage services, with either your Microsoft Entra ID (default) or Managed Identity. Azure Machine Learning [Datastore](#datastore) URIs can apply either identity-based authentication, or **credential-based** (for example, Service Principal, SAS token, account key) authentication, without exposure of secrets.
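
For example, a datastore URI can be read like an ordinary path. This sketch assumes the `azureml-fsspec` package is installed, and the subscription, resource group, workspace, and datastore segments are placeholders.

```python
# Sketch: consume an Azure Machine Learning datastore URI as a normal file path.
import pandas as pd

uri = (
    "azureml://subscriptions/<sub-id>/resourcegroups/<rg-name>"
    "/workspaces/<workspace-name>/datastores/<datastore-name>"
    "/paths/<folder>/<file>.csv"
)

# fsspec resolves the datastore's authentication (identity- or credential-based)
# behind the scenes, so no secret appears in the code.
df = pd.read_csv(uri)
print(df.head())
```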

A URI can serve as either *input* or an *output* to an Azure Machine Learning job, and it can map to the compute target filesystem with one of four different *mode* options:

-- **Read-*only* mount (`ro_mount`)**: The URI represents a storage location that is *mounted* to the compute target filesystem. The mounted data location supports read-only output exclusively.
+- **Read-*only* mount (`ro_mount`)**: The URI represents a storage location that is *mounted* to the compute target filesystem. The mounted data location exclusively supports read-only output.
- **Read-*write* mount (`rw_mount`)**: The URI represents a storage location that is *mounted* to the compute target filesystem. The mounted data location supports both read output from it *and* data writes to it.
- **Download (`download`)**: The URI represents a storage location containing data that is *downloaded* to the compute target filesystem.
- **Upload (`upload`)**: All data written to a compute target location is *uploaded* to the storage location represented by the URI.

Additionally, you can pass in the URI as a job input string with the **direct** mode. This table summarizes the combination of modes available for inputs and outputs:

-See [Access data in a job](how-to-read-write-data-v2.md) for more information.
+For more information, visit [Access data in a job](how-to-read-write-data-v2.md).
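
A sketch of how these modes are set on job inputs and outputs with the Python SDK v2 (`azure-ai-ml`) follows; the paths, environment, compute name, and script are placeholders.

```python
# Sketch: set input/output modes on a command job.
from azure.ai.ml import command, Input, Output
from azure.ai.ml.constants import AssetTypes, InputOutputModes

job = command(
    command="python train.py --data ${{inputs.raw_data}} --out ${{outputs.model_dir}}",
    inputs={
        # ro_mount: the folder is mounted read-only on the compute target.
        "raw_data": Input(
            type=AssetTypes.URI_FOLDER,
            path="azureml://datastores/workspaceblobstore/paths/raw/",
            mode=InputOutputModes.RO_MOUNT,
        ),
    },
    outputs={
        # rw_mount: anything written under this path lands back in storage.
        "model_dir": Output(
            type=AssetTypes.URI_FOLDER,
            path="azureml://datastores/workspaceblobstore/paths/models/",
            mode=InputOutputModes.RW_MOUNT,
        ),
    },
    environment="azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",
    code="./src",
)

# Submit with an authenticated MLClient, for example:
# ml_client.jobs.create_or_update(job)
```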

## Data runtime capability
Azure Machine Learning uses its own *data runtime* for one of three purposes:
@@ -118,7 +118,7 @@ An Azure Machine Learning data asset resembles web browser bookmarks (favorites)

Data asset creation also creates a *reference* to the data source location, along with a copy of its metadata. Because the data remains in its existing location, you incur no extra storage cost, and you don't risk data source integrity. You can create Data assets from Azure Machine Learning datastores, Azure Storage, public URLs, or local files.

-See [Create data assets](how-to-create-data-assets.md) for more information about data assets.
+For more information about data assets, visit [Create data assets](how-to-create-data-assets.md).
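
A minimal sketch of registering a data asset with the Python SDK v2 (`azure-ai-ml`); the workspace values, asset name, version, and path are placeholders.

```python
# Sketch: register a data asset that references an existing file in storage.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Only a reference plus metadata is stored; the underlying data stays where it is.
sample_data = Data(
    name="sample-csv",
    version="1",
    type=AssetTypes.URI_FILE,
    path="azureml://datastores/workspaceblobstore/paths/data/sample.csv",
    description="Bookmark-style reference to a CSV file in the workspace blob store.",
)

ml_client.data.create_or_update(sample_data)
```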