Commit f5f327a

Merge pull request #281770 from fbsolo-ms1/main

Freshness update for concept-data.md . . .

2 parents e42bea5 + f726c1d
File tree: 1 file changed (+21, -21 lines)
articles/machine-learning/concept-data.md

Lines changed: 21 additions & 21 deletions
@@ -9,7 +9,7 @@ ms.topic: conceptual
 author: fbsolo-ms1
 ms.author: franksolomon
 ms.reviewer: swatig
-ms.date: 07/13/2023
+ms.date: 07/24/2024
 ms.custom: data4ml
 #Customer intent: As an experienced Python developer, I need secure access to my data in my Azure storage solutions, and I need to use that data to accomplish my machine learning tasks.
 ---
@@ -24,30 +24,30 @@ An Azure Machine Learning datastore serves as a *reference* to an *existing* Azu
 
 - A common, easy-to-use API that interacts with different storage types (Blob/Files/ADLS).
 - Easier discovery of useful datastores in team operations.
-- For credential-based access (service principal/SAS/key), Azure Machine Learning datastore secures connection information. This way, you won't need to place that information in your scripts.
+- For credential-based access (service principal/SAS/key), an Azure Machine Learning datastore secures connection information. This way, you don't need to place that information in your scripts.
 
-When you create a datastore with an existing Azure storage account, you can choose between two different authentication methods:
+When you create a datastore with an existing Azure storage account, you have two different authentication method options:
 
 - **Credential-based** - authenticate data access with a service principal, shared access signature (SAS) token, or account key. Users with *Reader* workspace access can access the credentials.
 - **Identity-based** - use your Microsoft Entra identity or managed identity to authenticate data access.
 
-The following table summarizes the Azure cloud-based storage services that an Azure Machine Learning datastore can create. Additionally, the table summarizes the authentication types that can access those services:
+This table summarizes the Azure cloud-based storage services that an Azure Machine Learning datastore can create. Additionally, the table summarizes the authentication types that can access those services:
 
 Supported storage service | Credential-based authentication | Identity-based authentication
 |---|:----:|:---:|
-Azure Blob Container| ✓ | ✓|
+Azure Blob Container| ✓ | ✓ |
 Azure File Share| ✓ | |
-Azure Data Lake Gen1 | ✓ | ✓|
-Azure Data Lake Gen2| ✓ | ✓|
+Azure Data Lake Gen1 | ✓ | ✓ |
+Azure Data Lake Gen2| ✓ | ✓ |
 
-See [Create datastores](how-to-datastore.md) for more information about datastores.
+For more information about datastores, visit [Create datastores](how-to-datastore.md).
 
 ### Default datastores
 
-Each Azure Machine Learning workspace has a default storage account (Azure storage account) that contains the following datastores:
+Each Azure Machine Learning workspace has a default storage account (Azure storage account) that contains these datastores:
 
 > [!TIP]
-> To find the ID for your workspace, go to the workspace in the [Azure portal](https://portal.azure.com/). Expand **Settings** and then select **Properties**. The **Workspace ID** is displayed.
+> To find the ID for your workspace, go to the workspace in the [Azure portal](https://portal.azure.com/). Expand **Settings**, and then select **Properties**. The **Workspace ID** appears.
 
 | Datastore name | Data storage type | Data storage name | Description |
 |---|---|---|---|
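
As a minimal sketch of the credential-based option this hunk describes, the snippet below registers a blob datastore with the Azure Machine Learning Python SDK (v2). The subscription, workspace, storage account, container, and key values are hypothetical placeholders; for identity-based access you would omit `credentials`.

```python
# Minimal sketch: register a credential-based blob datastore (SDK v2).
# All <angle-bracket> values are hypothetical placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureBlobDatastore, AccountKeyConfiguration
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

blob_datastore = AzureBlobDatastore(
    name="example_blob_store",
    description="Reference to an existing blob container; the key is stored securely.",
    account_name="<storage-account-name>",
    container_name="<container-name>",
    credentials=AccountKeyConfiguration(account_key="<account-key>"),
)

# The datastore only stores connection metadata; no data is moved.
ml_client.datastores.create_or_update(blob_datastore)
```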
@@ -58,13 +58,13 @@ Each Azure Machine Learning workspace has a default storage account (Azure stora
 
 ## Data types
 
-A URI (storage location) can reference a file, a folder, or a data table. A machine learning job input and output definition requires one of the following three data types:
+A URI (storage location) can reference a file, a folder, or a data table. A machine learning job input and output definition requires one of these three data types:
 
 |Type |V2 API |V1 API |Canonical Scenarios | V2/V1 API Difference
 |---------|---------|---------|---------|---------|
 |**File**<br>Reference a single file | `uri_file` | `FileDataset` | Read/write a single file - the file can have any format. | A type new to V2 APIs. In V1 APIs, files always mapped to a folder on the compute target filesystem; this mapping required an `os.path.join`. In V2 APIs, the single file is mapped. This way, you can refer to that location in your code. |
-|**Folder**<br> Reference a single folder | `uri_folder` | `FileDataset` | You must read/write a folder of parquet/CSV files into Pandas/Spark.<br><br>Deep-learning with images, text, audio, video files located in a folder. | In V1 APIs, `FileDataset` had an associated engine that could take a file sample from a folder. In V2 APIs, a Folder is a simple mapping to the compute target filesystem. |
-|**Table**<br> Reference a data table | `mltable` | `TabularDataset` | You have a complex schema subject to frequent changes, or you need a subset of large tabular data.<br><br>AutoML with Tables. | In V1 APIs, the Azure Machine Learning back-end stored the data materialization blueprint. As a result, `TabularDataset` only worked if you had an Azure Machine Learning workspace. `mltable` stores the data materialization blueprint in *your* storage. This storage location means you can use it *disconnected to AzureML* - for example, locally and on-premises. In V2 APIs, you'll find it easier to transition from local to remote jobs. See [Working with tables in Azure Machine Learning](how-to-mltable.md) for more information. |
+|**Folder**<br> Reference a single folder | `uri_folder` | `FileDataset` | You must read/write a folder of parquet/CSV files into Pandas/Spark.<br><br>Deep-learning with images, text, audio, video files located in a folder. | In V1 APIs, `FileDataset` had an associated engine that could take a file sample from a folder. In V2 APIs, a folder is a simple mapping to the compute target filesystem. |
+|**Table**<br> Reference a data table | `mltable` | `TabularDataset` | You have a complex schema subject to frequent changes, or you need a subset of large tabular data.<br><br>AutoML with Tables. | In V1 APIs, the Azure Machine Learning back-end stored the data materialization blueprint. As a result, `TabularDataset` only worked if you had an Azure Machine Learning workspace. `mltable` stores the data materialization blueprint in *your* storage. This storage location means you can use it *disconnected to Azure Machine Learning* - for example, locally and on-premises. In V2 APIs, it's easier to transition from local to remote jobs. For more information, visit [Working with tables in Azure Machine Learning](how-to-mltable.md). |
 
 ## URI
 A Uniform Resource Identifier (URI) represents a storage location on your local computer, Azure storage, or a publicly available http(s) location. These examples show URIs for different storage options:
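
To make the three V2 data types in the table above concrete, here is a minimal sketch that declares one job input of each kind with the Python SDK (v2); the `azureml://` paths point at the default `workspaceblobstore` and are hypothetical placeholders.

```python
# Minimal sketch: one job input per V2 data type.
# The azureml:// paths are hypothetical placeholders.
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

file_input = Input(            # uri_file: a single file of any format
    type=AssetTypes.URI_FILE,
    path="azureml://datastores/workspaceblobstore/paths/raw/iris.csv",
)
folder_input = Input(          # uri_folder: a folder of files (for example, parquet/CSV)
    type=AssetTypes.URI_FOLDER,
    path="azureml://datastores/workspaceblobstore/paths/raw/",
)
table_input = Input(           # mltable: a data table described by an MLTable file
    type=AssetTypes.MLTABLE,
    path="azureml://datastores/workspaceblobstore/paths/tables/iris/",
)
```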
@@ -78,23 +78,23 @@ A Uniform Resource Identifier (URI) represents a storage location on your local
 |Azure Data Lake (gen2) | `abfss://<file_system>@<account_name>.dfs.core.windows.net/<folder>/<file>.csv` |
 | Azure Data Lake (gen1) | `adl://<accountname>.azuredatalakestore.net/<folder1>/<folder2>` |
 
-An Azure Machine Learning job maps URIs to the compute target filesystem. This mapping means that in a command that consumes or produces a URI, that URI works like a file or a folder. A URI uses **identity-based authentication** to connect to storage services, with either your Microsoft Entra ID (default), or Managed Identity. Azure Machine Learning [Datastore](#datastore) URIs can apply either identity-based authentication, or **credential-based** (for example, Service Principal, SAS token, account key), without exposure of secrets.
+An Azure Machine Learning job maps URIs to the compute target filesystem. This mapping means that for a command that consumes or produces a URI, that URI works like a file or a folder. A URI uses **identity-based authentication** to connect to storage services, with either your Microsoft Entra ID (default) or Managed Identity. Azure Machine Learning [Datastore](#datastore) URIs can apply either identity-based authentication, or **credential-based** (for example, Service Principal, SAS token, account key) authentication, without exposure of secrets.
 
 A URI can serve as either *input* or an *output* to an Azure Machine Learning job, and it can map to the compute target filesystem with one of four different *mode* options:
 
-- **Read-*only* mount (`ro_mount`)**: The URI represents a storage location that is *mounted* to the compute target filesystem. The mounted data location supports read-only output exclusively.
+- **Read-*only* mount (`ro_mount`)**: The URI represents a storage location that is *mounted* to the compute target filesystem. The mounted data location exclusively supports read-only output.
 - **Read-*write* mount (`rw_mount`)**: The URI represents a storage location that is *mounted* to the compute target filesystem. The mounted data location supports both read output from it *and* data writes to it.
 - **Download (`download`)**: The URI represents a storage location containing data that is *downloaded* to the compute target filesystem.
 - **Upload (`upload`)**: All data written to a compute target location is *uploaded* to the storage location represented by the URI.
 
 Additionally, you can pass in the URI as a job input string with the **direct** mode. This table summarizes the combination of modes available for inputs and outputs:
 
-Job<br>Input or Output | `upload` | `download` | `ro_mount` | `rw_mount` | `direct` |
------- | :---: | :---: | :---: | :---: | :---: |
-Input | | ✓ | ✓ | | ✓ |
-Output | ✓ | | | ✓ |
+Job<br>Input or Output | `upload` | `download` | `ro_mount` | `rw_mount` | `direct` |
+------ | :---: | :---: | :---: | :---: | :---: |
+Input | | ✓ | ✓ | | ✓ |
+Output | ✓ | | | ✓ |
 
-See [Access data in a job](how-to-read-write-data-v2.md) for more information.
+For more information, visit [Access data in a job](how-to-read-write-data-v2.md).
 
 ## Data runtime capability
 Azure Machine Learning uses its own *data runtime* for one of three purposes:
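
For the mode options covered in this hunk, here is a minimal sketch of a command job that mounts an input read-only and an output read-write with the Python SDK (v2); the source folder, compute name, environment reference, and storage paths are hypothetical placeholders.

```python
# Minimal sketch: input/output modes on a command job (SDK v2).
# All <angle-bracket> values and azureml:// paths are hypothetical placeholders.
from azure.ai.ml import Input, Output, command
from azure.ai.ml.constants import AssetTypes, InputOutputModes

job = command(
    code="./src",  # hypothetical local folder that contains train.py
    command="python train.py --data ${{inputs.training_data}} --out ${{outputs.model_dir}}",
    inputs={
        "training_data": Input(
            type=AssetTypes.URI_FOLDER,
            path="azureml://datastores/workspaceblobstore/paths/training/",
            mode=InputOutputModes.RO_MOUNT,   # read-only mount
        )
    },
    outputs={
        "model_dir": Output(
            type=AssetTypes.URI_FOLDER,
            path="azureml://datastores/workspaceblobstore/paths/models/",
            mode=InputOutputModes.RW_MOUNT,   # read-write mount
        )
    },
    environment="azureml:<environment-name>@latest",
    compute="<compute-cluster-name>",
)
```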
@@ -118,7 +118,7 @@ An Azure Machine Learning data asset resembles web browser bookmarks (favorites)
 
 Data asset creation also creates a *reference* to the data source location, along with a copy of its metadata. Because the data remains in its existing location, you incur no extra storage cost, and you don't risk data source integrity. You can create Data assets from Azure Machine Learning datastores, Azure Storage, public URLs, or local files.
 
-See [Create data assets](how-to-create-data-assets.md) for more information about data assets.
+For more information about data assets, visit [Create data assets](how-to-create-data-assets.md).
 
 ## Next steps
 