You can enrich data by using the *contextualization datasets* function. As incoming records are processed, you can query these datasets based on conditions that relate to the fields of the incoming record. Data from the matching dataset records can supplement the output fields and participate in complex calculations during the mapping process.
To load sample data into the state store, use the [state store CLI](https://github.com/Azure-Samples/explore-iot-operations/tree/main/tools/state-store-cli).
For example, consider a dataset with a few records, represented as JSON.
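A hypothetical illustration of such a dataset follows; the field names (`asset`, `location`, `installDate`) are made up for the example:

```json
[
  { "asset": "thermostat1", "location": "building1/room101", "installDate": "2023-05-01" },
  { "asset": "pump2",       "location": "building1/room102", "installDate": "2022-11-15" },
  { "asset": "valve3",      "location": "building2/room201", "installDate": "2024-02-20" }
]
```

During mapping, an incoming record whose `asset` field matches one of these records can pull `location` and `installDate` into the output fields or use them in calculations.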
**`articles/iot-operations/connect-to-cloud/concept-schema-registry.md`** (20 additions, 4 deletions)
author: kgremban
ms.author: kgremban
ms.topic: conceptual
ms.date: 11/14/2024

#CustomerIntent: As an operator, I want to understand how I can use message schemas to filter and transform messages.
---
Delta:

```json
{
  "$schema": "Delta/1.0",
  "type": "object",
  ...
}
```
### Input schema

Each dataflow source can optionally specify a message schema. Currently, dataflows don't perform runtime validation on source message schemas.

Asset sources have a predefined message schema that was created by the connector for OPC UA.
### Output schema

Output schemas are associated with dataflow destinations.

In the operations experience portal, you can configure output schemas for the following destination endpoints that support Parquet output:

* local storage
* Fabric OneLake
* Azure Storage (ADLS Gen2)
* Azure Data Explorer

Note: The Delta schema format is used for both Parquet and Delta output.

If you use Bicep or Kubernetes, you can configure output schemas using JSON output for MQTT and Kafka destination endpoints. MQTT- and Kafka-based destinations don't support Delta format.

For these dataflows, the operations experience applies any transformations to the input schema, and then creates a new schema in Delta format. When the dataflow custom resource (CR) is created, it includes a `schemaRef` value that points to the generated schema stored in the schema registry.

To upload an output schema, see [Upload schema](#upload-schema).
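The article doesn't show where `schemaRef` appears in the generated resource. As a rough, hedged sketch, it might look like the following fragment; the placement under a built-in transformation and the placeholder value are assumptions, not taken from this article:

```yaml
# Illustrative fragment only: placement and placeholder value are assumptions.
builtInTransformationSettings:
  schemaRef: <REFERENCE_TO_GENERATED_SCHEMA>   # points to the schema stored in the schema registry
```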
After the `create` command completes, you should see a blob in your storage account container with the schema content. The name of the blob is in the format `schema-namespace/schema/version`.

You can see more options with the helper command `az iot ops schema -h`.
**`articles/iot-operations/connect-to-cloud/howto-configure-adlsv2-endpoint.md`** (78 additions, 64 deletions)
## Prerequisites

- An instance of [Azure IoT Operations](../deploy-iot-ops/howto-deploy-iot-operations.md)
- An [Azure Data Lake Storage Gen2 account](../../storage/blobs/create-data-lake-storage-account.md)
- A pre-created storage container in the storage account

## Assign permission to managed identity

To configure a dataflow endpoint for Azure Data Lake Storage Gen2, we recommend using either a user-assigned or system-assigned managed identity. This approach is secure and eliminates the need for managing credentials manually.

After the Azure Data Lake Storage Gen2 account is created, you need to assign a role to the Azure IoT Operations managed identity that grants permission to write to the storage account.

If you're using a system-assigned managed identity, in the Azure portal, go to your Azure IoT Operations instance and select **Overview**. Copy the name of the extension listed after **Azure IoT Operations Arc extension**. For example, *azure-iot-operations-xxxx7*. You can find your system-assigned managed identity by searching for the same name as the Azure IoT Operations Arc extension.

Then, go to the Azure Storage account > **Access control (IAM)** > **Add role assignment**.

1. On the **Role** tab, select an appropriate role such as `Storage Blob Data Contributor`. This gives the managed identity the necessary permissions to write to the Azure Storage blob containers. To learn more, see [Authorize access to blobs using Microsoft Entra ID](../../storage/blobs/authorize-access-azure-active-directory.md).
1. On the **Members** tab:
    1. If you're using a system-assigned managed identity, for **Assign access to**, select the **User, group, or service principal** option, then select **+ Select members** and search for the name of the Azure IoT Operations Arc extension.
    1. If you're using a user-assigned managed identity, for **Assign access to**, select the **Managed identity** option, then select **+ Select members** and search for your [user-assigned managed identity set up for cloud connections](../deploy-iot-ops/howto-enable-secure-settings.md#set-up-a-user-assigned-managed-identity-for-cloud-connections).
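If you prefer the command line, the same role assignment can be made with the Azure CLI. This is a sketch only: the assignee object ID and the scope are placeholders that you replace with your managed identity's object ID and your storage account's resource ID.

```azurecli
# Assign the Storage Blob Data Contributor role to the managed identity (placeholders shown).
az role assignment create \
  --assignee "<MANAGED_IDENTITY_OBJECT_ID>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>"
```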
## Create dataflow endpoint for Azure Data Lake Storage Gen2

# [Portal](#tab/portal)
| Setting | Description |
| ------- | ----------- |
| Host | The hostname of the Azure Data Lake Storage Gen2 endpoint in the format `<account>.blob.core.windows.net`. Replace the account placeholder with the endpoint account name. |
| Authentication method | The method used for authentication. We recommend that you choose [*System assigned managed identity*](#system-assigned-managed-identity) or [*User assigned managed identity*](#user-assigned-managed-identity). |
| Client ID | The client ID of the user-assigned managed identity. Required if using *User assigned managed identity*. |
| Tenant ID | The tenant ID of the user-assigned managed identity. Required if using *User assigned managed identity*. |
| Access token secret name | The name of the Kubernetes secret containing the SAS token. Required if using *Access token*. |
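In the Bicep and Kubernetes examples in this article, the authentication method is left as a commented placeholder. A minimal sketch of the `dataLakeStorageSettings` block in Bicep follows; the resource wrapper is omitted, and the property names mirror the Kubernetes example that comes next:

```bicep
dataLakeStorageSettings: {
  host: 'https://<ACCOUNT>.blob.core.windows.net'
  authentication: {
    // See available authentication methods section for method types
    // method: '<METHOD_TYPE>'
  }
}
```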
```yaml
spec:
  dataLakeStorageSettings:
    host: https://<ACCOUNT>.blob.core.windows.net
    authentication:
      # See available authentication methods section for method types
      # method: <METHOD_TYPE>
```

Then apply the manifest file to the Kubernetes cluster.
    kubectl apply -f <FILE>.yaml

---

### Use access token authentication

Follow the steps in the [access token](#access-token) section to get a SAS token for the storage account and store it in a Kubernetes secret.
The following authentication methods are available for Azure Data Lake Storage Gen2 endpoints.

### System-assigned managed identity

Before you configure the dataflow endpoint, assign a role to the Azure IoT Operations managed identity that grants permission to write to the storage account:

1. In the Azure portal, go to your Azure IoT Operations instance and select **Overview**.
1. Copy the name of the extension listed after **Azure IoT Operations Arc extension**. For example, *azure-iot-operations-xxxx7*.
1. Go to the cloud resource that you need to grant permissions to. For example, go to the Azure Storage account > **Access control (IAM)** > **Add role assignment**.
1. On the **Role** tab, select an appropriate role.
1. On the **Members** tab, for **Assign access to**, select the **User, group, or service principal** option, then select **+ Select members** and search for the Azure IoT Operations managed identity. For example, *azure-iot-operations-xxxx7*.

Then, configure the dataflow endpoint with system-assigned managed identity settings.
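For reference, a minimal sketch of the corresponding Kubernetes settings, mirroring the `dataLakeStorageSettings` example earlier in this article; the empty `systemAssignedManagedIdentitySettings` block accepts the defaults:

```yaml
dataLakeStorageSettings:
  host: https://<ACCOUNT>.blob.core.windows.net
  authentication:
    method: SystemAssignedManagedIdentity
    systemAssignedManagedIdentitySettings: {}   # empty block uses the default audience
```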
# [Portal](#tab/portal)
---

### User-assigned managed identity

To use user-assigned managed identity for authentication, you must first deploy Azure IoT Operations with secure settings enabled. Then you need to [set up a user-assigned managed identity for cloud connections](../deploy-iot-ops/howto-enable-secure-settings.md#set-up-a-user-assigned-managed-identity-for-cloud-connections). To learn more, see [Enable secure settings in Azure IoT Operations deployment](../deploy-iot-ops/howto-enable-secure-settings.md).

Before you configure the dataflow endpoint, assign a role to the user-assigned managed identity that grants permission to write to the storage account:

1. In the Azure portal, go to the cloud resource that you need to grant permissions to. For example, go to the Azure Storage account > **Access control (IAM)** > **Add role assignment**.
1. On the **Role** tab, select an appropriate role.
1. On the **Members** tab, for **Assign access to**, select the **Managed identity** option, then select **+ Select members** and search for your user-assigned managed identity.

Then, configure the dataflow endpoint with user-assigned managed identity settings.

# [Portal](#tab/portal)

In the operations experience dataflow endpoint settings page, select the **Basic** tab, then choose **Authentication method** > **User assigned managed identity**.

Enter the user-assigned managed identity client ID and tenant ID in the appropriate fields.

# [Bicep](#tab/bicep)

```bicep
dataLakeStorageSettings: {
  authentication: {
    method: 'UserAssignedManagedIdentity'
    userAssignedManagedIdentitySettings: {
      clientId: '<ID>'
      tenantId: '<ID>'
      // Optional, defaults to 'https://storage.azure.com/.default'
      // scope: 'https://<SCOPE_URL>'
    }
  }
}
```

# [Kubernetes (preview)](#tab/kubernetes)

```yaml
dataLakeStorageSettings:
  authentication:
    method: UserAssignedManagedIdentity
    userAssignedManagedIdentitySettings:
      clientId: <ID>
      tenantId: <ID>
      # Optional, defaults to 'https://storage.azure.com/.default'
      # scope: https://<SCOPE_URL>
```

---

The scope is optional and defaults to `https://storage.azure.com/.default`. If you need to override the default scope, specify the `scope` setting via the Bicep or Kubernetes manifest.
### Access token

Using an access token is an alternative authentication method. This method requires you to create a Kubernetes secret with the SAS token and reference the secret in the *DataflowEndpoint* resource.
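As a minimal sketch, the secret can be created with kubectl. The secret name, the `accessToken` key, and the namespace shown here are illustrative assumptions; use whatever names your endpoint configuration references:

```bash
# Store the SAS token in a Kubernetes secret (names and namespace are placeholders).
kubectl create secret generic my-sas-secret \
  --from-literal=accessToken='<SAS_TOKEN>' \
  -n azure-iot-operations
```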
## Advanced settings
You can set advanced settings for the Azure Data Lake Storage Gen2 endpoint, such as the batching latency and message count.
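For example, a hedged sketch of batching settings in Kubernetes form; the `latencySeconds` and `maxMessages` property names are assumptions based on other dataflow endpoint examples, so verify them against the current reference before use:

```yaml
dataLakeStorageSettings:
  host: https://<ACCOUNT>.blob.core.windows.net
  batching:
    latencySeconds: 60     # flush a batch after this many seconds...
    maxMessages: 100000    # ...or after this many messages, whichever comes first
```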