articles/iot-operations/connect-to-cloud/concept-dataflow-mapping.md (41 additions, 40 deletions)

@@ -5,7 +5,7 @@ author: PatAltimore
 ms.author: patricka
 ms.subservice: azure-data-flows
 ms.topic: concept-article
-ms.date: 08/03/2024
+ms.date: 09/24/2024
 ai-usage: ai-assisted

 #CustomerIntent: As an operator, I want to understand how to use the dataflow mapping language to transform data.
@@ -58,7 +58,7 @@ The transformations are achieved through *mapping*, which typically involves:

 * **Input definition**: Identifying the fields in the input records that are used.
 * **Output definition**: Specifying where and how the input fields are organized in the output records.
-* **Conversion (optional)**: Modifying the input fields to fit into the output fields. Conversion is required when multiple input fields are combined into a single output field.
+* **Conversion (optional)**: Modifying the input fields to fit into the output fields. `expression` is required when multiple input fields are combined into a single output field.

 The following mapping is an example:
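The article's example isn't reproduced in this hunk. Purely for orientation, a hypothetical mapping that shows all three parts (inputs, output, and an expression that combines two inputs) might look like the following; the field names are invented:

```yaml
- inputs:
  - Temperature.Min # - $1
  - Temperature.Max # - $2
  output: AvgTemperature
  expression: ($1 + $2) / 2
```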
@@ -187,10 +187,10 @@ In the previous example, the path consists of three segments: `Payload`, `Tag.10

 ```yaml
 - inputs:
-  - 'Payload.He said: "No. It's done"'
+  - 'Payload.He said: "No. It is done"'
 ```

-In this case, the path is split into the segments `Payload`, `He said: "No`, and `It's done"` (starting with a space).
+In this case, the path is split into the segments `Payload`, `He said: "No`, and `It is done"` (starting with a space).

 ### Segmentation algorithm

@@ -205,8 +205,8 @@ Let's consider a basic scenario to understand the use of asterisks in mappings:

 ```yaml
 - inputs:
-  - *
-  output: *
+  - '*'
+  output: '*'
 ```

 Here's how the asterisk (`*`) operates in this context:
@@ -243,12 +243,12 @@ Mapping configuration that uses wildcards:

 ```yaml
 - inputs:
-  - ColorProperties.*
-  output: *
+  - 'ColorProperties.*'
+  output: '*'

 - inputs:
-  - TextureProperties.*
-  output: *
+  - 'TextureProperties.*'
+  output: '*'
 ```

 Resulting JSON:
@@ -276,6 +276,7 @@ When you place a wildcard, you must follow these rules:
 * **At the beginning:** `*.path2.path3` - Here, the asterisk matches any segment that leads up to `path2.path3`.
 * **In the middle:** `path1.*.path3` - In this configuration, the asterisk matches any segment between `path1` and `path3`.
 * **At the end:** `path1.path2.*` - The asterisk at the end matches any segment that follows after `path1.path2`.
+* The path containing the asterisk must be enclosed in single quotation marks (`'`).

 ### Multi-input wildcards

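For illustration only, a sketch of quoted wildcard paths in each of the three positions; the field names here are hypothetical, not taken from the article:

```yaml
- inputs:
  - '*.Value'             # wildcard at the beginning
  output: 'Values.*'

- inputs:
  - 'Line1.*.Temperature' # wildcard in the middle
  output: 'Temperatures.*'

- inputs:
  - 'Sensors.Line1.*'     # wildcard at the end
  output: 'Line1.*'
```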
@@ -302,10 +303,10 @@ Mapping configuration that uses wildcards:

 ```yaml
 - inputs:
-  - *.Max # - $1
-  - *.Min # - $2
-  output: ColorProperties.*
-  conversion: ($1 + $2) / 2
+  - '*.Max' # - $1
+  - '*.Min' # - $2
+  output: 'ColorProperties.*'
+  expression: ($1 + $2) / 2
 ```

 Resulting JSON:
@@ -359,11 +360,11 @@ Initial mapping configuration that uses wildcards:
@@ -398,15 +399,15 @@ When you use the previous example from multi-input wildcards, consider the follo

 ```yaml
 - inputs:
-  - *.Max # - $1
-  - *.Min # - $2
-  output: ColorProperties.*.Avg
+  - '*.Max' # - $1
+  - '*.Min' # - $2
+  output: 'ColorProperties.*.Avg'
   expression: ($1 + $2) / 2

 - inputs:
-  - *.Max # - $1
-  - *.Min # - $2
-  output: ColorProperties.*.Diff
+  - '*.Max' # - $1
+  - '*.Min' # - $2
+  output: 'ColorProperties.*.Diff'
   expression: abs($1 - $2)
 ```

@@ -437,9 +438,9 @@ Now, consider a scenario where a specific field needs a different calculation:

 ```yaml
 - inputs:
-  - *.Max # - $1
-  - *.Min # - $2
-  output: ColorProperties.*
+  - '*.Max' # - $1
+  - '*.Min' # - $2
+  output: 'ColorProperties.*'
   expression: ($1 + $2) / 2

 - inputs:
@@ -458,9 +459,9 @@ Consider a special case for the same fields to help decide the right action:

 ```yaml
 - inputs:
-  - *.Max # - $1
-  - *.Min # - $2
-  output: ColorProperties.*
+  - '*.Max' # - $1
+  - '*.Min' # - $2
+  output: 'ColorProperties.*'
   expression: ($1 + $2) / 2

 - inputs:
@@ -505,8 +506,8 @@ This mapping copies `BaseSalary` from the context dataset directly into the `Emp

 ```yaml
 - inputs:
-  - $context(position).*
-  output: Employment.*
+  - '$context(position).*'
+  output: 'Employment.*'
 ```

 This configuration allows for a dynamic mapping where every field within the `position` dataset is copied into the `Employment` section of the output record:
@@ -523,13 +524,13 @@ This configuration allows for a dynamic mapping where every field within the `po

 ## Last known value

-You can track the last known value of a property. Suffix the input field with `?last` to capture the last known value of the field. When a property is missing a value in a subsequent input payload, the last known value is mapped to the output payload.
+You can track the last known value of a property. Suffix the input field with `? $last` to capture the last known value of the field. When a property is missing a value in a subsequent input payload, the last known value is mapped to the output payload.
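For illustration only, a minimal sketch of the `? $last` suffix on an input field; the field names are hypothetical:

```yaml
- inputs:
  - 'Temperature ? $last'
  output: 'Thermostat.LastTemperature'
```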
articles/iot-operations/connect-to-cloud/howto-configure-adlsv2-endpoint.md (22 additions, 17 deletions)

@@ -5,14 +5,16 @@ author: PatAltimore
 ms.author: patricka
 ms.subservice: azure-data-flows
 ms.topic: how-to
-ms.date: 08/27/2024
+ms.date: 10/02/2024
 ai-usage: ai-assisted

 #CustomerIntent: As an operator, I want to understand how to configure dataflow endpoints for Azure Data Lake Storage Gen2 in Azure IoT Operations so that I can send data to Azure Data Lake Storage Gen2.
 ---

 # Configure dataflow endpoints for Azure Data Lake Storage Gen2

 To send data to Azure Data Lake Storage Gen2 in Azure IoT Operations Preview, you can configure a dataflow endpoint. This configuration allows you to specify the destination endpoint, authentication method, table, and other settings.

 ## Prerequisites
@@ -47,7 +49,7 @@ To configure a dataflow endpoint for Azure Data Lake Storage Gen2, we suggest us
     systemAssignedManagedIdentitySettings: {}
 ```

-If you need to override the system-assigned managed identity audience, see the [system-assigned managed identity](#system-assigned-managed-identity) section.
+If you need to override the system-assigned managed identity audience, see the [System-assigned managed identity](#system-assigned-managed-identity) section.

 ### Use access token authentication

@@ -111,7 +113,7 @@ The following authentication methods are available for Azure Data Lake Storage G

 Using the system-assigned managed identity is the recommended authentication method for Azure IoT Operations. Azure IoT Operations creates the managed identity automatically and assigns it to the Azure Arc-enabled Kubernetes cluster. It eliminates the need for secret management and allows for seamless authentication with the Azure Data Lake Storage Gen2 account.

-Before creating the dataflow endpoint, you need to assign a role to the managed identity that has write permission to the storage account. For example, you can assign the *Storage Blob Data Contributor* role. To learn more about assigning roles to blobs, see [Authorize access to blobs using Microsoft Entra ID](../../storage/blobs/authorize-access-azure-active-directory.md).
+Before creating the dataflow endpoint, assign a role to the managed identity that has write permission to the storage account. For example, you can assign the *Storage Blob Data Contributor* role. To learn more about assigning roles to blobs, see [Authorize access to blobs using Microsoft Entra ID](../../storage/blobs/authorize-access-azure-active-directory.md).

 In the *DataflowEndpoint* resource, specify the managed identity authentication method. In most cases, you don't need to specify other settings. Not specifying an audience creates a managed identity with the default audience scoped to your storage account.

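For illustration only, a hedged sketch of that settings block with an explicit audience override; the `audience` property and host value are assumptions, not taken from this diff:

```yaml
datalakeStorageSettings:
  authentication:
    method: SystemAssignedManagedIdentity
    systemAssignedManagedIdentitySettings:
      # Assumed property name; overrides the default audience scoped to the storage account.
      audience: https://<ACCOUNT>.blob.core.windows.net
```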
@@ -136,25 +138,25 @@ datalakeStorageSettings:

 Using an access token is an alternative authentication method. This method requires you to create a Kubernetes secret with the SAS token and reference the secret in the *DataflowEndpoint* resource.

-1. Get a [SAS token](../../storage/common/storage-sas-overview.md) for an Azure Data Lake Storage Gen2 (ADLSv2) account. For example, use the Azure portal to browse to your storage account. On the left menu, choose **Security + networking** > **Shared access signature**. Use the following table to set the required permissions.
+Get a [SAS token](../../storage/common/storage-sas-overview.md) for an Azure Data Lake Storage Gen2 (ADLSv2) account. For example, use the Azure portal to browse to your storage account. On the left menu, choose **Security + networking** > **Shared access signature**. Use the following table to set the required permissions.

-1. To enhance security and follow the principle of least privilege, you can generate a SAS token for a specific container. To prevent authentication errors, ensure that the container specified in the SAS token matches the dataflow destination setting in the configuration.
+To enhance security and follow the principle of least privilege, you can generate a SAS token for a specific container. To prevent authentication errors, ensure that the container specified in the SAS token matches the dataflow destination setting in the configuration.

-1. Create a Kubernetes secret with the SAS token. Don't include the question mark `?` that might be at the beginning of the token.
+Create a Kubernetes secret with the SAS token. Don't include the question mark `?` that might be at the beginning of the token.
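The hunk doesn't show the secret-creation step itself. Purely as a sketch, an equivalent Kubernetes Secret manifest might look like the following; the secret name, namespace, and `accessToken` key are assumptions, not confirmed by this diff:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-sas                     # hypothetical name, referenced later by the endpoint's secret reference
  namespace: azure-iot-operations  # assumed namespace for Azure IoT Operations resources
type: Opaque
stringData:
  # SAS token value without the leading question mark.
  accessToken: sv=2024-11-04&ss=b&srt=co&sp=rwlc&se=2025-01-01&sig=<SIGNATURE>
```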
-Finally, create the DataflowEndpoint resource with the secret reference.
+Create the *DataflowEndpoint* resource with the secret reference.

 ```yaml
 datalakeStorageSettings:
@@ -168,17 +170,18 @@ datalakeStorageSettings:

 To use a user-assigned managed identity, specify the `UserAssignedManagedIdentity` authentication method and provide the `clientId` and `tenantId` of the managed identity.

-
 ```yaml
 datalakeStorageSettings:
   authentication:
     method: UserAssignedManagedIdentity
     userAssignedManagedIdentitySettings:
-      clientId: <id>
-      tenantId: <id>
+      clientId: <ID>
+      tenantId: <ID>
 ```

-### Batching
+## Advanced settings
+
+You can set advanced settings for the Azure Data Lake Storage Gen2 endpoint, such as the batching latency and message count.

 Use the `batching` settings to configure the maximum number of messages and the maximum latency before the messages are sent to the destination. This setting is useful when you want to optimize for network bandwidth and reduce the number of requests to the destination.

@@ -189,9 +192,11 @@ Use the `batching` settings to configure the maximum number of messages and the

 For example, to configure the maximum number of messages to 1000 and the maximum latency to 100 seconds, use the following settings:

+Set the values in the dataflow endpoint custom resource.
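For illustration only, a sketch of those values in the endpoint resource, assuming the `batching` block exposes `latencySeconds` and `maxMessages` (property names not confirmed by this diff):

```yaml
datalakeStorageSettings:
  batching:
    latencySeconds: 100  # maximum wait before a batch is sent
    maxMessages: 1000    # maximum number of messages per batch
```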
0 commit comments