# Customer intent: As an experienced Python developer, I need to securely access my data in my Azure storage solutions and use it to accomplish my machine learning tasks.
---
# Azure Machine Learning datastores
Supported cloud-based storage services in Azure Machine Learning include:

Storage URIs use *identity-based* access that will prompt you for your Azure Active Directory token for data access authentication.
> [!NOTE]
> When using Notebooks in Azure Machine Learning Studio, your Azure Active Directory token is automatically passed through to storage for data access authentication.
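
For example, data behind a storage URI can be read straight into a DataFrame. The snippet below is only a minimal sketch, assuming the `azureml-fsspec` and `pandas` packages are installed; every segment of the URI is a placeholder you replace with your own values:

```python
import pandas as pd  # azureml-fsspec registers the azureml:// protocol with fsspec

# All segments of this URI are placeholders; replace them with your own values.
uri = (
    "azureml://subscriptions/<sub-id>/resourcegroups/<rg-name>"
    "/workspaces/<workspace-name>/datastores/<datastore-name>/paths/<folder>/<file>.csv"
)

# Your Azure Active Directory identity is used for authentication, so no
# account key or SAS token appears anywhere in the code.
df = pd.read_csv(uri)
print(df.head())
```
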
Although storage URIs provide a convenient mechanism to access data, there may be cases where using an Azure Machine Learning *datastore* is a better option:
* **You need *credential-based* data access (for example: Service Principals, SAS Tokens, Account Name/Key).** Datastores are helpful because they keep the connection information to your data storage securely in an Azure Key Vault, so you don't have to code it in your scripts.
* **You want team members to easily discover relevant datastores.** Datastores are registered to an Azure Machine Learning workspace, which makes them easier for your team members to find and discover.

[Register and create a datastore](how-to-datastore.md) to easily connect to your storage account, and access the data in your underlying storage service.
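
For orientation, here is a minimal sketch of what registering a credential-based blob datastore can look like with the Azure Machine Learning Python SDK v2 (`azure-ai-ml`); the workspace identifiers, account, container, and key are all placeholders, and the linked article remains the full walkthrough:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureBlobDatastore, AccountKeyConfiguration
from azure.identity import DefaultAzureCredential

# Connect to the workspace (placeholder identifiers).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<sub-id>",
    resource_group_name="<rg-name>",
    workspace_name="<workspace-name>",
)

# Register an existing blob container as a credential-based datastore.
# The account key is stored in the workspace's Azure Key Vault, not in your scripts.
blob_datastore = AzureBlobDatastore(
    name="my_blob_datastore",
    account_name="<storage-account-name>",
    container_name="<container-name>",
    credentials=AccountKeyConfiguration(account_key="<account-key>"),
)
ml_client.create_or_update(blob_datastore)
```
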
## Credential-based vs identity-based access
Azure Machine Learning datastores support both credential-based and identity-based access. In *credential-based* access, your authentication credentials are usually kept in a datastore, which is used to ensure you have permission to access the storage service. When these credentials are registered via datastores, any user with the workspace Reader role can retrieve them. That scale of access can be a security concern for some organizations. When you use *identity-based* data access, Azure Machine Learning prompts you for your Azure Active Directory token for data access authentication instead of keeping your credentials in the datastore. That approach allows for data access management at the storage level and keeps credentials confidential.
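
To make the contrast concrete, a hedged variant of the earlier sketch: with `azure-ai-ml`, omitting the `credentials` argument yields a credential-less (identity-based) datastore, so no secret is stored and access is authorized with the caller's Azure Active Directory identity (all names are placeholders):

```python
from azure.ai.ml.entities import AzureBlobDatastore

# Identity-based (credential-less) datastore: nothing secret is kept in the
# workspace; the identity of whoever (or whatever compute) reads the data is
# checked against the storage account's Azure RBAC assignments at access time.
identity_datastore = AzureBlobDatastore(
    name="my_identity_datastore",
    account_name="<storage-account-name>",
    container_name="<container-name>",
)

# Register it the same way as a credential-based datastore:
# ml_client.create_or_update(identity_datastore)
```
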
articles/machine-learning/how-to-administrate-data-authentication.md

Learn how to manage data access and how to authenticate in Azure Machine Learning.

In general, data access from studio involves the following checks:
* Who is accessing?
    - There are multiple types of authentication depending on the storage type. For example, account key, token, service principal, managed identity, and user identity.
    - If authentication is made using a user identity, then it's important to know *which* user is trying to access storage. Learn more about [identity-based data access](how-to-identity-based-data-access.md).
* Do they have permission?
    - Are the credentials correct? If so, does the service principal, managed identity, etc., have the necessary permissions on the storage? Permissions are granted using Azure role-based access control (Azure RBAC); a quick way to test data-plane permissions is sketched after this list.
      - [Reader](../role-based-access-control/built-in-roles.md#reader) of the storage account reads metadata of the storage.
      - [Storage Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader) reads data within a blob container.
      - [Contributor](../role-based-access-control/built-in-roles.md#contributor) allows write access to a storage account.
      - More roles may be required depending on the type of storage.
* Where is access from?
    - User: Is the client IP address in the VNet/subnet range?
    - Workspace: Is the workspace public or does it have a private endpoint in a VNet/subnet?
    - Storage: Does the storage allow public access, or does it restrict access through a service endpoint or a private endpoint?
* What operation is being performed?
    - Create, read, update, and delete (CRUD) operations on a data store/dataset are handled by Azure Machine Learning.
    - Data access calls (such as preview or schema) go to the underlying storage and need extra permissions.
* Where is this operation being run: on compute resources in your Azure subscription, or on resources hosted in a Microsoft subscription?
    - All calls to dataset and datastore services (except the "Generate Profile" option) use resources hosted in a __Microsoft subscription__ to run the operations.
    - Jobs, including the "Generate Profile" option for datasets, run on a compute resource in __your subscription__, and access the data from there. So the compute identity needs permission to the storage, rather than the identity of the user who submits the job.
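
A quick, hedged way to test the permission check above, outside of Azure Machine Learning, is to attempt a data-plane call with the identity in question. The sketch below assumes the `azure-identity` and `azure-storage-blob` packages and placeholder account and container names:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# DefaultAzureCredential resolves to your signed-in user identity locally, or to
# the managed identity when this runs on an Azure compute resource.
service = BlobServiceClient(
    account_url="https://<storage-account-name>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

# Listing blobs is a data-plane operation: it needs a role such as
# Storage Blob Data Reader; the control-plane Reader role alone is not enough.
container = service.get_container_client("<container-name>")
for blob in container.list_blobs():
    print(blob.name)
```
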
The following diagram shows the general flow of a data access call. In this example, a user is trying to make a data access call through a machine learning workspace, without using any compute resource.
:::image type="content" source="./media/concept-network-data-access/data-access-flow.svg" alt-text="Diagram of the logic flow when accessing data.":::
> Although the above example shows a local file, remember that `path` supports cloud storage locations (`https`, `abfss`, `wasbs` protocols). Therefore, if you want to register data in a cloud location, just specify the path with any of the supported protocols.
# [CLI](#tab/CLI)
You can also use the CLI and the following YAML, which describes an MLTable, to register MLTable data.
> [!NOTE]
> **For local files and folders**, only relative paths are supported. To be explicit, we will **not** support absolute paths, as that would require us to change the MLTable file residing on disk before we move it to cloud storage.

You can put the MLTable file and underlying data in the *same folder*, but in a cloud object store. You can specify `mltable:` in your job to point to a location on a datastore that contains the MLTable file:

Below are the supported transformations that are specific to json lines:
- `invalid_lines`: How to handle lines that are invalid JSON. Supported values are `error` and `drop`. Defaults to `error`.
- `encoding`: Specify the file encoding. Supported encodings are `utf8`, `iso88591`, `latin1`, `ascii`, `utf16`, `utf32`, `utf8bom` and `windows1252`. Default is `utf8`.
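
As an illustrative sketch only, assuming the `mltable` Python package surfaces the same options for json lines files (the path and values here are placeholders):

```python
import mltable

# Placeholder path; invalid_lines and encoding mirror the options described above.
paths = [{"file": "./data/sample.jsonl"}]
tbl = mltable.from_json_lines_files(
    paths,
    invalid_lines="drop",  # drop lines that are not valid JSON instead of erroring
    encoding="utf8",
)
df = tbl.to_pandas_dataframe()
print(df.head())
```
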
## Global transforms
MLTable artifacts provide transformations specific to delimited text, Parquet, and Delta. There are other transforms that mltable-artifact files support:
- `skip`: This skips the first *n* records of the table
540
541
- `drop_columns`: Drops the specified columns from the table. This transform supports regex so that users can drop columns matching a particular pattern.
541
542
- `keep_columns`: Keeps only the specified columns in the table. This transform supports regex so that users can keep columns matching a particular pattern.
- `filter`: Filter the data, leaving only the records that match the specified expression.
- `extract_partition_format_into_columns`: Specify the partition format of the path. Defaults to None. The partition information of each path is extracted into columns based on the specified format. The format part '{column_name}' creates a string column, and '{column_name:yyyy/MM/dd/HH/mm/ss}' creates a datetime column, where 'yyyy', 'MM', 'dd', 'HH', 'mm' and 'ss' are used to extract the year, month, day, hour, minute and second for the datetime type. The format should start from the position of the first partition key and continue to the end of the file path. For example, given the path '../Accounts/2019/01/01/data.csv' where the partition is by department name and time, partition_format='/{Department}/{PartitionDate:yyyy/MM/dd}/data.csv' creates a string column 'Department' with the value 'Accounts' and a datetime column 'PartitionDate' with the value '2019-01-01'.

Our principle here is to support transforms *specific to data delivery*, not to get into wider feature-engineering transforms.
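
As a hedged sketch of how a few of these table-level transforms compose in the `mltable` Python package (the path and column names are placeholders):

```python
import mltable

# Load a delimited file, then chain a few of the transforms described above.
paths = [{"file": "./data/sample.csv"}]
tbl = mltable.from_delimited_files(paths)

tbl = tbl.keep_columns(["Department", "Amount"])  # keep only the columns we need
tbl = tbl.skip(1)                                 # skip the first record
tbl = tbl.take(20)                                # take the next 20 records

df = tbl.to_pandas_dataframe()
print(df.shape)
```
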
# Customer intent: As an experienced Python developer, I need to make my data in Azure storage available to my remote compute to train my machine learning models.
---
# Connect to storage with Azure Machine Learning datastores
In this article, learn how to connect to data storage services on Azure with Azure Machine Learning datastores.
## Prerequisites
- An Azure Machine Learning workspace.
> [!NOTE]
> Azure Machine Learning datastores do **not** create the underlying storage accounts; rather, they register an **existing** storage account for use in Azure Machine Learning. It is not a requirement to use Azure Machine Learning datastores - you can use storage URIs directly, assuming you have access to the underlying data.
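
As a small, hedged sketch of where this article heads (assuming the `azure-ai-ml` SDK v2 and placeholder workspace details), you can connect to a workspace and list the datastores that are already registered there; every workspace comes with a default blob datastore named `workspaceblobstore`:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder identifiers; replace with your own subscription, resource group,
# and workspace names.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<sub-id>",
    resource_group_name="<rg-name>",
    workspace_name="<workspace-name>",
)

# List the datastores already registered in the workspace.
for datastore in ml_client.datastores.list():
    print(datastore.name, datastore.type)

print("default:", ml_client.datastores.get_default().name)
```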