
Commit fd32dfd

Jill Grant authored
Merge pull request #286498 from PatAltimore/patricka-dataflow-mq-release-aio-m2
AIO dataflow and MQTT changes M2
2 parents 2f4ef21 + 3582e46 commit fd32dfd

36 files changed: +1478 −370 lines

articles/iot-operations/.openpublishing.redirection.iot-operations.json

Lines changed: 1 addition & 1 deletion
@@ -182,7 +182,7 @@
   },
   {
     "source_path_from_root": "/articles/iot-operations/manage-mqtt-connectivity/howto-manage-secrets.md",
-    "redirect_url": "/azure/iot-operations/manage-mqtt-broker/overview-iot-mq",
+    "redirect_url": "/azure/iot-operations/deploy-iot-ops/howto-manage-secrets",
     "redirect_document_id": false
   },
   {
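
For reference, the updated redirect entry reads as follows after this change (reassembled from the hunk above):

```json
{
  "source_path_from_root": "/articles/iot-operations/manage-mqtt-connectivity/howto-manage-secrets.md",
  "redirect_url": "/azure/iot-operations/deploy-iot-ops/howto-manage-secrets",
  "redirect_document_id": false
}
```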

articles/iot-operations/connect-to-cloud/concept-dataflow-mapping.md

Lines changed: 41 additions & 40 deletions
@@ -5,7 +5,7 @@ author: PatAltimore
 ms.author: patricka
 ms.subservice: azure-data-flows
 ms.topic: concept-article
-ms.date: 08/03/2024
+ms.date: 09/24/2024
 ai-usage: ai-assisted
 
 #CustomerIntent: As an operator, I want to understand how to use the dataflow mapping language to transform data.
@@ -58,7 +58,7 @@ The transformations are achieved through *mapping*, which typically involves:
 
 * **Input definition**: Identifying the fields in the input records that are used.
 * **Output definition**: Specifying where and how the input fields are organized in the output records.
-* **Conversion (optional)**: Modifying the input fields to fit into the output fields. Conversion is required when multiple input fields are combined into a single output field.
+* **Conversion (optional)**: Modifying the input fields to fit into the output fields. `expression` is required when multiple input fields are combined into a single output field.
 
 The following mapping is an example:
 
@@ -187,10 +187,10 @@ In the previous example, the path consists of three segments: `Payload`, `Tag.10
 
 ```yaml
 - inputs:
-    - 'Payload.He said: "No. It's done"'
+    - 'Payload.He said: "No. It is done"'
 ```
 
-In this case, the path is split into the segments `Payload`, `He said: "No`, and `It's done"` (starting with a space).
+In this case, the path is split into the segments `Payload`, `He said: "No`, and `It is done"` (starting with a space).
 
 ### Segmentation algorithm
 
@@ -205,8 +205,8 @@ Let's consider a basic scenario to understand the use of asterisks in mappings:
 
 ```yaml
 - inputs:
-    - *
-  output: *
+    - '*'
+  output: '*'
 ```
 
 Here's how the asterisk (`*`) operates in this context:
@@ -243,12 +243,12 @@ Mapping configuration that uses wildcards:
 
 ```yaml
 - inputs:
-    - ColorProperties.*
-  output: *
+    - 'ColorProperties.*'
+  output: '*'
 
 - inputs:
-    - TextureProperties.*
-  output: *
+    - 'TextureProperties.*'
+  output: '*'
 ```
 
 Resulting JSON:
@@ -276,6 +276,7 @@ When you place a wildcard, you must follow these rules:
 * **At the beginning:** `*.path2.path3` - Here, the asterisk matches any segment that leads up to `path2.path3`.
 * **In the middle:** `path1.*.path3` - In this configuration, the asterisk matches any segment between `path1` and `path3`.
 * **At the end:** `path1.path2.*` - The asterisk at the end matches any segment that follows after `path1.path2`.
+* The path containing the asterisk must be enclosed in single quotation marks (`'`).
 
 ### Multi-input wildcards
 
@@ -302,10 +303,10 @@ Mapping configuration that uses wildcards:
 
 ```yaml
 - inputs:
-    - *.Max # - $1
-    - *.Min # - $2
-  output: ColorProperties.*
-  conversion: ($1 + $2) / 2
+    - '*.Max' # - $1
+    - '*.Min' # - $2
+  output: 'ColorProperties.*'
+  expression: ($1 + $2) / 2
 ```
 
 Resulting JSON:
@@ -359,11 +360,11 @@ Initial mapping configuration that uses wildcards:
 
 ```yaml
 - inputs:
-    - *.Max # - $1
-    - *.Min # - $2
-    - *.Avg # - $3
-    - *.Mean # - $4
-  output: ColorProperties.*
+    - '*.Max' # - $1
+    - '*.Min' # - $2
+    - '*.Avg' # - $3
+    - '*.Mean' # - $4
+  output: 'ColorProperties.*'
   expression: ($1, $2, $3, $4)
 ```
 
@@ -382,11 +383,11 @@ Corrected mapping configuration:
 
 ```yaml
 - inputs:
-    - *.Max # - $1
-    - *.Min # - $2
-    - *.Mid.Avg # - $3
-    - *.Mid.Mean # - $4
-  output: ColorProperties.*
+    - '*.Max' # - $1
+    - '*.Min' # - $2
+    - '*.Mid.Avg' # - $3
+    - '*.Mid.Mean' # - $4
+  output: 'ColorProperties.*'
   expression: ($1, $2, $3, $4)
 ```
 
@@ -398,15 +399,15 @@ When you use the previous example from multi-input wildcards, consider the follo
 
 ```yaml
 - inputs:
-    - *.Max # - $1
-    - *.Min # - $2
-  output: ColorProperties.*.Avg
+    - '*.Max' # - $1
+    - '*.Min' # - $2
+  output: 'ColorProperties.*.Avg'
   expression: ($1 + $2) / 2
 
 - inputs:
-    - *.Max # - $1
-    - *.Min # - $2
-  output: ColorProperties.*.Diff
+    - '*.Max' # - $1
+    - '*.Min' # - $2
+  output: 'ColorProperties.*.Diff'
   expression: abs($1 - $2)
 ```
 
@@ -437,9 +438,9 @@ Now, consider a scenario where a specific field needs a different calculation:
 
 ```yaml
 - inputs:
-    - *.Max # - $1
-    - *.Min # - $2
-  output: ColorProperties.*
+    - '*.Max' # - $1
+    - '*.Min' # - $2
+  output: 'ColorProperties.*'
   expression: ($1 + $2) / 2
 
 - inputs:
@@ -458,9 +459,9 @@ Consider a special case for the same fields to help decide the right action:
 
 ```yaml
 - inputs:
-    - *.Max # - $1
-    - *.Min # - $2
-  output: ColorProperties.*
+    - '*.Max' # - $1
+    - '*.Min' # - $2
+  output: 'ColorProperties.*'
   expression: ($1 + $2) / 2
 
 - inputs:
@@ -505,8 +506,8 @@ This mapping copies `BaseSalary` from the context dataset directly into the `Emp
 
 ```yaml
 - inputs:
-    - $context(position).*
-  output: Employment.*
+    - '$context(position).*'
+  output: 'Employment.*'
 ```
 
 This configuration allows for a dynamic mapping where every field within the `position` dataset is copied into the `Employment` section of the output record:
@@ -523,13 +524,13 @@ This configuration allows for a dynamic mapping where every field within the `po
 
 ## Last known value
 
-You can track the last known value of a property. Suffix the input field with `?last` to capture the last known value of the field. When a property is missing a value in a subsequent input payload, the last known value is mapped to the output payload.
+You can track the last known value of a property. Suffix the input field with `? $last` to capture the last known value of the field. When a property is missing a value in a subsequent input payload, the last known value is mapped to the output payload.
 
 For example, consider the following mapping:
 
 ```yaml
 - inputs:
-    - Temperature?last
+    - Temperature ? $last
   output: Thermostat.Temperature
 ```
 
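Taken together, the edits in this file quote wildcard paths, replace `conversion` with `expression` in the examples, and change the last-known-value suffix to `? $last`. A minimal sketch of a mapping that uses all three after this change (the `Max`/`Min`/`Temperature` field names are just the illustrative ones from the hunks above):

```yaml
- inputs:
    - '*.Max' # - $1
    - '*.Min' # - $2
  output: 'ColorProperties.*'
  expression: ($1 + $2) / 2

- inputs:
    - Temperature ? $last
  output: Thermostat.Temperature
```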

articles/iot-operations/connect-to-cloud/howto-configure-adlsv2-endpoint.md

Lines changed: 22 additions & 17 deletions
@@ -5,14 +5,16 @@ author: PatAltimore
 ms.author: patricka
 ms.subservice: azure-data-flows
 ms.topic: how-to
-ms.date: 08/27/2024
+ms.date: 10/02/2024
 ai-usage: ai-assisted
 
 #CustomerIntent: As an operator, I want to understand how to configure dataflow endpoints for Azure Data Lake Storage Gen2 in Azure IoT Operations so that I can send data to Azure Data Lake Storage Gen2.
 ---
 
 # Configure dataflow endpoints for Azure Data Lake Storage Gen2
 
+[!INCLUDE [public-preview-note](../includes/public-preview-note.md)]
+
 To send data to Azure Data Lake Storage Gen2 in Azure IoT Operations Preview, you can configure a dataflow endpoint. This configuration allows you to specify the destination endpoint, authentication method, table, and other settings.
 
 ## Prerequisites
@@ -47,7 +49,7 @@ To configure a dataflow endpoint for Azure Data Lake Storage Gen2, we suggest us
     systemAssignedManagedIdentitySettings: {}
 ```
 
-If you need to override the system-assigned managed identity audience, see the [system-assigned managed identity](#system-assigned-managed-identity) section.
+If you need to override the system-assigned managed identity audience, see the [System-assigned managed identity](#system-assigned-managed-identity) section.
 
 ### Use access token authentication
 
@@ -111,7 +113,7 @@ The following authentication methods are available for Azure Data Lake Storage G
 
 Using the system-assigned managed identity is the recommended authentication method for Azure IoT Operations. Azure IoT Operations creates the managed identity automatically and assigns it to the Azure Arc-enabled Kubernetes cluster. It eliminates the need for secret management and allows for seamless authentication with the Azure Data Lake Storage Gen2 account.
 
-Before creating the dataflow endpoint, you need to assign a role to the managed identity that has write permission to the storage account. For example, you can assign the *Storage Blob Data Contributor* role. To learn more about assigning roles to blobs, see [Authorize access to blobs using Microsoft Entra ID](../../storage/blobs/authorize-access-azure-active-directory.md).
+Before creating the dataflow endpoint, assign a role to the managed identity that has write permission to the storage account. For example, you can assign the *Storage Blob Data Contributor* role. To learn more about assigning roles to blobs, see [Authorize access to blobs using Microsoft Entra ID](../../storage/blobs/authorize-access-azure-active-directory.md).
 
 In the *DataflowEndpoint* resource, specify the managed identity authentication method. In most cases, you don't need to specify other settings. Not specifying an audience creates a managed identity with the default audience scoped to your storage account.
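
The hunk above mentions assigning the *Storage Blob Data Contributor* role to the managed identity but doesn't show a command for it. One way to do that with the Azure CLI (not part of this commit; the placeholder values are illustrative):

```bash
# Grant the Azure IoT Operations managed identity write access to the storage account.
# <principal-id> is the managed identity's principal (object) ID; the scope is the
# resource ID of the target storage account.
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee <principal-id> \
  --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<account-name>
```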

@@ -136,25 +138,25 @@ datalakeStorageSettings:
 
 Using an access token is an alternative authentication method. This method requires you to create a Kubernetes secret with the SAS token and reference the secret in the *DataflowEndpoint* resource.
 
-1. Get a [SAS token](../../storage/common/storage-sas-overview.md) for an Azure Data Lake Storage Gen2 (ADLSv2) account. For example, use the Azure portal to browse to your storage account. On the left menu, choose **Security + networking** > **Shared access signature**. Use the following table to set the required permissions.
+Get a [SAS token](../../storage/common/storage-sas-overview.md) for an Azure Data Lake Storage Gen2 (ADLSv2) account. For example, use the Azure portal to browse to your storage account. On the left menu, choose **Security + networking** > **Shared access signature**. Use the following table to set the required permissions.
 
-   | Parameter              | Enabled setting                   |
-   | ---------------------- | --------------------------------- |
-   | Allowed services       | Blob                              |
-   | Allowed resource types | Object, Container                 |
-   | Allowed permissions    | Read, Write, Delete, List, Create |
+| Parameter              | Enabled setting                   |
+| ---------------------- | --------------------------------- |
+| Allowed services       | Blob                              |
+| Allowed resource types | Object, Container                 |
+| Allowed permissions    | Read, Write, Delete, List, Create |
 
-1. To enhance security and follow the principle of least privilege, you can generate a SAS token for a specific container. To prevent authentication errors, ensure that the container specified in the SAS token matches the dataflow destination setting in the configuration.
+To enhance security and follow the principle of least privilege, you can generate a SAS token for a specific container. To prevent authentication errors, ensure that the container specified in the SAS token matches the dataflow destination setting in the configuration.
 
-1. Create a Kubernetes secret with the SAS token. Don't include the question mark `?` that might be at the beginning of the token.
+Create a Kubernetes secret with the SAS token. Don't include the question mark `?` that might be at the beginning of the token.
 
 ```bash
 kubectl create secret generic my-sas \
   --from-literal=accessToken='sv=2022-11-02&ss=b&srt=c&sp=rwdlax&se=2023-07-22T05:47:40Z&st=2023-07-21T21:47:40Z&spr=https&sig=<signature>' \
   -n azure-iot-operations
 ```
 
-Finally, create the DataflowEndpoint resource with the secret reference.
+Create the *DataflowEndpoint* resource with the secret reference.
 
 ```yaml
 datalakeStorageSettings:
@@ -168,17 +170,18 @@ datalakeStorageSettings:
 
 To use a user-assigned managed identity, specify the `UserAssignedManagedIdentity` authentication method and provide the `clientId` and `tenantId` of the managed identity.
 
-
 ```yaml
 datalakeStorageSettings:
   authentication:
     method: UserAssignedManagedIdentity
     userAssignedManagedIdentitySettings:
-      clientId: <id>
-      tenantId: <id>
+      clientId: <ID>
+      tenantId: <ID>
 ```
 
-### Batching
+## Advanced settings
+
+You can set advanced settings for the Azure Data Lake Storage Gen2 endpoint, such as the batching latency and message count.
 
 Use the `batching` settings to configure the maximum number of messages and the maximum latency before the messages are sent to the destination. This setting is useful when you want to optimize for network bandwidth and reduce the number of requests to the destination.
 
@@ -189,9 +192,11 @@ Use the `batching` settings to configure the maximum number of messages and the
 
 For example, to configure the maximum number of messages to 1000 and the maximum latency to 100 seconds, use the following settings:
 
+Set the values in the dataflow endpoint custom resource.
+
 ```yaml
 datalakeStorageSettings:
   batching:
     latencySeconds: 100
     maxMessages: 1000
-```
+```
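
Putting the endpoint hunks together, here's a sketch of how the batching settings sit alongside an authentication method in `datalakeStorageSettings` (assembled from the snippets in this commit; the `SystemAssignedManagedIdentity` method value is an assumption mirroring the user-assigned example above, not shown verbatim in these hunks):

```yaml
datalakeStorageSettings:
  authentication:
    # Assumed method name, by analogy with UserAssignedManagedIdentity above.
    method: SystemAssignedManagedIdentity
    systemAssignedManagedIdentitySettings: {}
  batching:
    latencySeconds: 100 # maximum latency before messages are sent
    maxMessages: 1000   # maximum number of messages per batch
```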
