You can enrich data by using the *contextualization datasets* function. As incoming records are processed, you can query these datasets based on conditions that relate to the fields of the incoming record. Data from the matching dataset records can supplement the output fields and participate in complex calculations during the mapping process.
To load sample data into the state store, use the [state store CLI](https://github.com/Azure-Samples/explore-iot-operations/tree/main/tools/state-store-cli).
For example, consider a dataset with a few records, represented as JSON.
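A hypothetical illustration of such a dataset follows; the field names (`asset`, `location`, `installDate`) are made up for the example:

```json
[
  { "asset": "thermostat1", "location": "building1/room101", "installDate": "2023-05-01" },
  { "asset": "pump2",       "location": "building1/room102", "installDate": "2022-11-15" },
  { "asset": "valve3",      "location": "building2/room201", "installDate": "2024-02-20" }
]
```

During mapping, an incoming record whose `asset` field matches one of these records can pull `location` and `installDate` into the output fields or use them in calculations.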
**`articles/iot-operations/connect-to-cloud/concept-schema-registry.md`** (20 additions, 4 deletions)
author: kgremban
ms.author: kgremban
ms.topic: conceptual
ms.date: 11/14/2024

#CustomerIntent: As an operator, I want to understand how I can use message schemas to filter and transform messages.
---
Delta:

```json
{
  "$schema": "Delta/1.0",
  "type": "object",
  ...
}
```
### Input schema

Each dataflow source can optionally specify a message schema. Currently, dataflows don't perform runtime validation on source message schemas.

Asset sources have a predefined message schema that was created by the connector for OPC UA.
### Output schema

Output schemas are associated with dataflow destinations.

In the operations experience portal, you can configure output schemas for the following destination endpoints that support Parquet output:

* local storage
* Fabric OneLake
* Azure Storage (ADLS Gen2)
* Azure Data Explorer

Note: The Delta schema format is used for both Parquet and Delta output.

If you use Bicep or Kubernetes, you can configure output schemas using JSON output for MQTT and Kafka destination endpoints. MQTT- and Kafka-based destinations don't support Delta format.

For these dataflows, the operations experience applies any transformations to the input schema, and then creates a new schema in Delta format. When the dataflow custom resource (CR) is created, it includes a `schemaRef` value that points to the generated schema stored in the schema registry.

To upload an output schema, see [Upload schema](#upload-schema).
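The article doesn't show where `schemaRef` appears in the generated resource. As a rough, hedged sketch, it might look like the following fragment; the placement under a built-in transformation and the placeholder value are assumptions, not taken from this article:

```yaml
# Illustrative fragment only: placement and placeholder value are assumptions.
builtInTransformationSettings:
  schemaRef: <REFERENCE_TO_GENERATED_SCHEMA>   # points to the schema stored in the schema registry
```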
After the `create` command completes, you should see a blob in your storage account container with the schema content. The name of the blob is in the format `schema-namespace/schema/version`.

You can see more options with the helper command `az iot ops schema -h`.
**`articles/iot-operations/connect-to-cloud/howto-configure-adlsv2-endpoint.md`** (78 additions, 64 deletions)
## Prerequisites

- An instance of [Azure IoT Operations](../deploy-iot-ops/howto-deploy-iot-operations.md)
- An [Azure Data Lake Storage Gen2 account](../../storage/blobs/create-data-lake-storage-account.md)
- A pre-created storage container in the storage account

## Assign permission to managed identity

To configure a dataflow endpoint for Azure Data Lake Storage Gen2, we recommend using either a user-assigned or system-assigned managed identity. This approach is secure and eliminates the need for managing credentials manually.

After the Azure Data Lake Storage Gen2 account is created, you need to assign a role to the Azure IoT Operations managed identity that grants permission to write to the storage account.

If you're using a system-assigned managed identity, in the Azure portal, go to your Azure IoT Operations instance and select **Overview**. Copy the name of the extension listed after **Azure IoT Operations Arc extension**. For example, *azure-iot-operations-xxxx7*. You can find your system-assigned managed identity by searching for the same name as the Azure IoT Operations Arc extension.

Then, go to the Azure Storage account > **Access control (IAM)** > **Add role assignment**.

1. On the **Role** tab, select an appropriate role such as `Storage Blob Data Contributor`. This gives the managed identity the necessary permissions to write to the Azure Storage blob containers. To learn more, see [Authorize access to blobs using Microsoft Entra ID](../../storage/blobs/authorize-access-azure-active-directory.md).
1. On the **Members** tab:
    1. If you're using a system-assigned managed identity, for **Assign access to**, select the **User, group, or service principal** option, then select **+ Select members** and search for the name of the Azure IoT Operations Arc extension.
    1. If you're using a user-assigned managed identity, for **Assign access to**, select the **Managed identity** option, then select **+ Select members** and search for your [user-assigned managed identity set up for cloud connections](../deploy-iot-ops/howto-enable-secure-settings.md#set-up-a-user-assigned-managed-identity-for-cloud-connections).
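If you prefer the command line, the same role assignment can be made with the Azure CLI. This is a sketch only: the assignee object ID and the scope are placeholders that you replace with your managed identity's object ID and your storage account's resource ID.

```azurecli
# Assign the Storage Blob Data Contributor role to the managed identity (placeholders shown).
az role assignment create \
  --assignee "<MANAGED_IDENTITY_OBJECT_ID>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>"
```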
## Create dataflow endpoint for Azure Data Lake Storage Gen2

# [Portal](#tab/portal)
| Setting | Description |
| ------- | ----------- |
| Host | The hostname of the Azure Data Lake Storage Gen2 endpoint in the format `<account>.blob.core.windows.net`. Replace the account placeholder with the endpoint account name. |
| Authentication method | The method used for authentication. We recommend that you choose [*System assigned managed identity*](#system-assigned-managed-identity) or [*User assigned managed identity*](#user-assigned-managed-identity). |
| Client ID | The client ID of the user-assigned managed identity. Required if using *User assigned managed identity*. |
| Tenant ID | The tenant ID of the user-assigned managed identity. Required if using *User assigned managed identity*. |
| Access token secret name | The name of the Kubernetes secret containing the SAS token. Required if using *Access token*. |
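In the Bicep and Kubernetes examples in this article, the authentication method is left as a commented placeholder. A minimal sketch of the `dataLakeStorageSettings` block in Bicep follows; the resource wrapper is omitted, and the property names mirror the Kubernetes example that comes next:

```bicep
dataLakeStorageSettings: {
  host: 'https://<ACCOUNT>.blob.core.windows.net'
  authentication: {
    // See available authentication methods section for method types
    // method: '<METHOD_TYPE>'
  }
}
```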
```yaml
spec:
  dataLakeStorageSettings:
    host: https://<ACCOUNT>.blob.core.windows.net
    authentication:
      # See available authentication methods section for method types
      # method: <METHOD_TYPE>
```

Then apply the manifest file to the Kubernetes cluster.
    kubectl apply -f <FILE>.yaml

---

### Use access token authentication

Follow the steps in the [access token](#access-token) section to get a SAS token for the storage account and store it in a Kubernetes secret.
The following authentication methods are available for Azure Data Lake Storage Gen2 endpoints.

### System-assigned managed identity

Before you configure the dataflow endpoint, assign a role to the Azure IoT Operations managed identity that grants permission to write to the storage account:

1. In the Azure portal, go to your Azure IoT Operations instance and select **Overview**.
1. Copy the name of the extension listed after **Azure IoT Operations Arc extension**. For example, *azure-iot-operations-xxxx7*.
1. Go to the cloud resource that you need to grant permissions to. For example, go to the Azure Storage account > **Access control (IAM)** > **Add role assignment**.
1. On the **Role** tab, select an appropriate role.
1. On the **Members** tab, for **Assign access to**, select the **User, group, or service principal** option, then select **+ Select members** and search for the Azure IoT Operations managed identity. For example, *azure-iot-operations-xxxx7*.

Then, configure the dataflow endpoint with system-assigned managed identity settings.
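For reference, a minimal sketch of the corresponding Kubernetes settings, mirroring the `dataLakeStorageSettings` example earlier in this article; the empty `systemAssignedManagedIdentitySettings` block accepts the defaults:

```yaml
dataLakeStorageSettings:
  host: https://<ACCOUNT>.blob.core.windows.net
  authentication:
    method: SystemAssignedManagedIdentity
    systemAssignedManagedIdentitySettings: {}   # empty block uses the default audience
```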
# [Portal](#tab/portal)
---

### User-assigned managed identity

To use user-assigned managed identity for authentication, you must first deploy Azure IoT Operations with secure settings enabled. Then you need to [set up a user-assigned managed identity for cloud connections](../deploy-iot-ops/howto-enable-secure-settings.md#set-up-a-user-assigned-managed-identity-for-cloud-connections). To learn more, see [Enable secure settings in Azure IoT Operations deployment](../deploy-iot-ops/howto-enable-secure-settings.md).

Before you configure the dataflow endpoint, assign a role to the user-assigned managed identity that grants permission to write to the storage account:

1. In the Azure portal, go to the cloud resource that you need to grant permissions to. For example, go to the Azure Storage account > **Access control (IAM)** > **Add role assignment**.
1. On the **Role** tab, select an appropriate role.
1. On the **Members** tab, for **Assign access to**, select the **Managed identity** option, then select **+ Select members** and search for your user-assigned managed identity.

Then, configure the dataflow endpoint with user-assigned managed identity settings.

# [Portal](#tab/portal)

In the operations experience dataflow endpoint settings page, select the **Basic** tab, then choose **Authentication method** > **User assigned managed identity**.

Enter the user-assigned managed identity client ID and tenant ID in the appropriate fields.

# [Bicep](#tab/bicep)

```bicep
dataLakeStorageSettings: {
  authentication: {
    method: 'UserAssignedManagedIdentity'
    userAssignedManagedIdentitySettings: {
      clientId: '<ID>'
      tenantId: '<ID>'
      // Optional, defaults to 'https://storage.azure.com/.default'
      // scope: 'https://<SCOPE_URL>'
    }
  }
}
```

# [Kubernetes (preview)](#tab/kubernetes)

```yaml
dataLakeStorageSettings:
  authentication:
    method: UserAssignedManagedIdentity
    userAssignedManagedIdentitySettings:
      clientId: <ID>
      tenantId: <ID>
      # Optional, defaults to 'https://storage.azure.com/.default'
      # scope: https://<SCOPE_URL>
```

---

The scope is optional and defaults to `https://storage.azure.com/.default`. If you need to override the default scope, specify the `scope` setting via the Bicep or Kubernetes manifest.
### Access token

Using an access token is an alternative authentication method. This method requires you to create a Kubernetes secret with the SAS token and reference the secret in the *DataflowEndpoint* resource.
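As a minimal sketch, the secret can be created with kubectl. The secret name, the `accessToken` key, and the namespace shown here are illustrative assumptions; use whatever names your endpoint configuration references:

```bash
# Store the SAS token in a Kubernetes secret (names and namespace are placeholders).
kubectl create secret generic my-sas-secret \
  --from-literal=accessToken='<SAS_TOKEN>' \
  -n azure-iot-operations
```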
## Advanced settings
You can set advanced settings for the Azure Data Lake Storage Gen2 endpoint, such as the batching latency and message count.
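For example, a hedged sketch of batching settings in Kubernetes form; the `latencySeconds` and `maxMessages` property names are assumptions based on other dataflow endpoint examples, so verify them against the current reference before use:

```yaml
dataLakeStorageSettings:
  host: https://<ACCOUNT>.blob.core.windows.net
  batching:
    latencySeconds: 60     # flush a batch after this many seconds...
    maxMessages: 100000    # ...or after this many messages, whichever comes first
```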