Commit 88d5e5c

Merge pull request #274393 from jlian/patch-93
Fix ADX example in datalake doc
2 parents 61345f7 + 48a60d4 commit 88d5e5c

File tree: 1 file changed (+46 additions, −79 deletions)

articles/iot-operations/connect-to-cloud/howto-configure-data-lake.md

@@ -7,7 +7,7 @@ ms.author: patricka
 ms.topic: how-to
 ms.custom:
   - ignite-2023
-ms.date: 05/01/2024
+ms.date: 05/06/2024
 
 #CustomerIntent: As an operator, I want to understand how to configure Azure IoT MQ so that I can send data from Azure IoT MQ to Data Lake Storage.
 ---
@@ -16,32 +16,18 @@ ms.date: 05/01/2024
 
 [!INCLUDE [public-preview-note](../includes/public-preview-note.md)]
 
-You can use the data lake connector to send data from Azure IoT MQ Preview broker to a data lake, like Azure Data Lake Storage Gen2 (ADLSv2) and Microsoft Fabric OneLake. The connector subscribes to MQTT topics and ingests the messages into Delta tables in the Data Lake Storage account.
-
-## What's supported
-
-| Feature | Supported |
-| ----------------------------------------- | --------- |
-| Send data to Azure Data Lake Storage Gen2 | Supported |
-| Send data to local storage | Supported |
-| Send data Microsoft Fabric OneLake | Supported |
-| Use SAS token for authentication | Supported |
-| Use managed identity for authentication | Supported |
-| Delta format | Supported |
-| Parquet format | Supported |
-| JSON message payload | Supported |
-| Create new container if it doesn't exist | Supported |
-| Signed types support | Supported |
-| Unsigned types support | Not Supported |
+You can use the data lake connector to send data from Azure IoT MQ Preview broker to a data lake, like Azure Data Lake Storage Gen2 (ADLSv2), Microsoft Fabric OneLake, and Azure Data Explorer. The connector subscribes to MQTT topics and ingests the messages into Delta tables in the Data Lake Storage account.
 
 ## Prerequisites
 
 - A Data Lake Storage account in Azure with a container and a folder for your data. For more information about creating a Data Lake Storage, use one of the following quickstart options:
   - Microsoft Fabric OneLake quickstart:
-      - [Create a workspace](/fabric/get-started/create-workspaces) since the default *my workspace* isn't supported.
-      - [Create a lakehouse](/fabric/onelake/create-lakehouse-onelake).
+    - [Create a workspace](/fabric/get-started/create-workspaces) since the default *my workspace* isn't supported.
+    - [Create a lakehouse](/fabric/onelake/create-lakehouse-onelake).
   - Azure Data Lake Storage Gen2 quickstart:
-      - [Create a storage account to use with Azure Data Lake Storage Gen2](/azure/storage/blobs/create-data-lake-storage-account).
+    - [Create a storage account to use with Azure Data Lake Storage Gen2](/azure/storage/blobs/create-data-lake-storage-account).
+  - Azure Data Explorer cluster:
+    - Follow the **Full cluster** steps in the [Quickstart: Create an Azure Data Explorer cluster and database](/azure/data-explorer/create-cluster-and-database?tabs=full).
 
 - An IoT MQ MQTT broker. For more information on how to deploy an IoT MQ MQTT broker, see [Quickstart: Deploy Azure IoT Operations Preview to an Arc-enabled Kubernetes cluster](../get-started/quickstart-deploy.md).
 

@@ -225,22 +211,20 @@ authentication:
 
 Configure the data lake connector to send data to an Azure Data Explorer endpoint using managed identity.
 
-1. To deploy an Azure Data Explorer cluster, follow the **Full cluster** steps in the [Quickstart: Create an Azure Data Explorer cluster and database](/azure/data-explorer/create-cluster-and-database?tabs=full).
+1. Ensure that the steps in prerequisites are met, including a full Azure Data Explorer cluster. The "free cluster" option doesn't work.
 
 1. After the cluster is created, create a database to store your data.
 
-1. You can create a table for given data via the Azure portal and create columns manually, or you can use [KQL](/azure/data-explorer/kusto/management/create-table-command) in the query tab.
+1. You can create a table for given data via the Azure portal and create columns manually, or you can use [KQL](/azure/data-explorer/kusto/management/create-table-command) in the query tab. For example:
 
-    For example:
+    ```kql
     .create table thermostat (
         externalAssetId: string,
         assetName: string,
         CurrentTemperature: real,
         Pressure: real,
         MqttTopic: string,
         Timestamp: datetime
-    )
-        timestamp: datetime
     )
     ```
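As an aside, the `.create table` command in the hunk above is just a column list; when the schema lives in code, it can be generated rather than hand-typed. A minimal sketch (the `create_table_command` helper is hypothetical; only the thermostat column names and Kusto types come from the doc):

```python
# Build the KQL ".create table" command for the thermostat schema from a
# Python dict, so the column list is defined in one place.
def create_table_command(table: str, columns: dict[str, str]) -> str:
    cols = ", ".join(f"{name}: {kusto_type}" for name, kusto_type in columns.items())
    return f".create table {table} ({cols})"

# Column names and Kusto types as shown in the example above.
thermostat_columns = {
    "externalAssetId": "string",
    "assetName": "string",
    "CurrentTemperature": "real",
    "Pressure": "real",
    "MqttTopic": "string",
    "Timestamp": "datetime",
}

command = create_table_command("thermostat", thermostat_columns)
```

The resulting string could then be run against the cluster from the query tab, or programmatically with a Kusto client library.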

@@ -252,12 +236,6 @@ Enable streaming ingestion on your table and database. In the query tab, run the
 .alter database <DATABASE_NAME> policy streamingingestion enable
 ```
 
-For example:
-
-```kql
-.alter database TestDatabase policy streamingingestion enable
-```
-
 ### Add the managed identity to the Azure Data Explorer cluster
 
 In order for the connector to authenticate to Azure Data Explorer, you must add the managed identity to the Azure Data Explorer cluster.
@@ -277,46 +255,42 @@ Example deployment file for the Azure Data Explorer connector. Comments that beg
 apiVersion: mq.iotoperations.azure.com/v1beta1
   name: my-adx-connector
   namespace: azure-iot-operations
-  name: my-datalake-connector
-  namespace: mq
 spec:
     repository: mcr.microsoft.com/azureiotoperations/datalake
     tag: 0.4.0-preview
-    repository: edgebuilds.azurecr.io/datalake
-    tag: edge
     pullPolicy: Always
-    repository: mcr.microsoft.com/azureiotoperations/datalake
-    tag: 0.4.0-preview
-  databaseFormat: "adx"
+  databaseFormat: adx
   target:
-      endpoint: https://<cluster>.<region>.kusto.windows.net
-      # TODO: insert the ADX cluster endpoint formatted as <cluster>.<region>.kusto.windows.net
-      endpoint: "<endpoint>"
+      # TODO: insert the ADX cluster endpoint
+      endpoint: https://<CLUSTER>.<REGION>.kusto.windows.net
       authentication:
+        systemAssignedManagedIdentity:
+          audience: "https://api.kusto.windows.net"
   localBrokerConnection:
     endpoint: aio-mq-dmqtt-frontend:8883
     tls:
       tlsEnabled: true
      trustedCaCertificateConfigMap: aio-ca-trust-bundle-test-only
    authentication:
      kubernetes: {}
----
-    endpoint: aio-mq-dmqtt-frontend:8883
-    tls:
-      tlsEnabled: true
-  name: adx-topicmap
-  authentication:
-    kubernetes: {}
-  dataLakeConnectorRef: my-adx-connector
-    audience: "https://api.kusto.windows.net"
 ---
 apiVersion: mq.iotoperations.azure.com/v1beta1
 kind: DataLakeConnectorTopicMap
 metadata:
-  mqttSourceTopic: "azure-iot-operations/data/thermostat"
+  name: adx-topicmap
   namespace: azure-iot-operations
 spec:
-  dataLakeConnectorRef: "my-datalake-connector"
+  mapping:
+    allowedLatencySecs: 1
+    messagePayloadType: json
+    maxMessagesPerBatch: 10
+    clientId: id
+    mqttSourceTopic: azure-iot-operations/data/thermostat
+    qos: 1
+    table:
+      # TODO: add DB and table name
+      tablePath: <DATABASE_NAME>
+      tableName: <TABLE_NAME>
   schema:
     - name: externalAssetId
       format: utf8
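To exercise the topic map above end to end, you can publish a test message on the topic it subscribes to. A minimal sketch, assuming paho-mqtt and the broker details from the `localBrokerConnection` block (the `build_thermostat_message` helper is hypothetical; the payload shape follows the ua-deltaframe example in the doc):

```python
import json

def build_thermostat_message(temperature: float, source_ts: str) -> str:
    """Build a JSON payload in the ua-deltaframe shape the topic map expects."""
    return json.dumps({
        "SequenceNumber": 4697,
        "Timestamp": "2024-04-02T22:36:03.1827681Z",
        "DataSetWriterName": "thermostat",
        "MessageType": "ua-deltaframe",
        "Payload": {
            "temperature": {"SourceTimestamp": source_ts, "Value": temperature},
        },
    })

payload = build_thermostat_message(5506, "2024-04-02T22:36:02.6949717Z")

# Publishing requires a reachable broker and TLS configuration matching the
# example (hypothetical connection details, shown for illustration only):
# import paho.mqtt.client as mqtt
# client = mqtt.Client()
# client.tls_set()  # trust bundle from aio-ca-trust-bundle-test-only
# client.connect("aio-mq-dmqtt-frontend", 8883)
# client.publish("azure-iot-operations/data/thermostat", payload, qos=1)
```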
@@ -342,36 +316,29 @@ spec:
       format: timestamp
       optional: false
       mapping: $received_time
-      format: float32
-      optional: false
-      mapping: "$data.pressure"
-    - name: "mqttTopic"
-      format: utf8
-      optional: false
-      mapping: "$topic"
-    - name: "timestamp"
-      format: timestamp
-      optional: false
-      mapping: "$received_time"
 ```
 
-This example accepts data from the `dlc` topic with messages in JSON format such as the following:
+This example accepts data from the `azure-iot-operations/data/thermostat` topic with messages in JSON format such as the following:
 
 ```json
 {
-  "data": {
-    "externalAssetID": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
-    "assetName": "thermostat-de",
-    "currentTemperature": 5506,
-    "pressure": 5506,
-    "mqttTopic": "dlc",
-    "timestamp": "2024-04-02T22:36:03.1827681Z"
+  "SequenceNumber": 4697,
+  "Timestamp": "2024-04-02T22:36:03.1827681Z",
+  "DataSetWriterName": "thermostat",
+  "MessageType": "ua-deltaframe",
+  "Payload": {
+    "temperature": {
+      "SourceTimestamp": "2024-04-02T22:36:02.6949717Z",
+      "Value": 5506
+    },
+    "Tag 10": {
+      "SourceTimestamp": "2024-04-02T22:36:02.6949888Z",
+      "Value": 5506
+    }
   }
 }
 ```
-
-
 
 ## DataLakeConnector
 
 A *DataLakeConnector* is a Kubernetes custom resource that defines the configuration and properties of a data lake connector instance. A data lake connector ingests data from MQTT topics into Delta tables in a Data Lake Storage account.
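The schema entries in the hunks above use mapping expressions like `$received_time`, `$topic`, and `$data.<field>` to pull column values from message metadata or the JSON payload. A minimal sketch of how such a resolver could behave (the `resolve` function is hypothetical, not the connector's actual implementation):

```python
def resolve(mapping: str, payload: dict, topic: str, received_time: str):
    """Resolve a topic-map mapping expression against one incoming message."""
    if mapping == "$topic":
        return topic  # the MQTT topic the message arrived on
    if mapping == "$received_time":
        return received_time  # broker-side receive timestamp
    if mapping.startswith("$data."):
        # Walk into the JSON payload, e.g. "$data.Payload.temperature.Value"
        value = payload
        for key in mapping[len("$data."):].split("."):
            value = value[key]
        return value
    raise ValueError(f"unknown mapping expression: {mapping}")

msg = {"Payload": {"temperature": {"Value": 5506}}}
topic_value = resolve("$topic", msg, "azure-iot-operations/data/thermostat", "t0")
temp_value = resolve("$data.Payload.temperature.Value", msg, "", "")
```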
@@ -441,13 +408,13 @@ metadata:
   name: datalake-topicmap
   namespace: azure-iot-operations
 spec:
-  dataLakeConnectorRef: "my-datalake-connector"
+  dataLakeConnectorRef: my-datalake-connector
   mapping:
     allowedLatencySecs: 1
-    messagePayloadType: "json"
+    messagePayloadType: json
     maxMessagesPerBatch: 10
     clientId: id
-    mqttSourceTopic: "azure-iot-operations/data/opc-ua-connector-de/thermostat-de"
+    mqttSourceTopic: azure-iot-operations/data/thermostat
     qos: 1
     table:
       tableName: thermostat
@@ -476,13 +443,13 @@ spec:
 
 Stringified JSON like `"{\"SequenceNumber\": 4697, \"Timestamp\": \"2024-04-02T22:36:03.1827681Z\", \"DataSetWriterName\": \"thermostat-de\", \"MessageType\": \"ua-deltaframe\", \"Payload\": {\"temperature\": {\"SourceTimestamp\": \"2024-04-02T22:36:02.6949717Z\", \"Value\": 5506}, \"Tag 10\": {\"SourceTimestamp\": \"2024-04-02T22:36:02.6949888Z\", \"Value\": 5506}}}"` isn't supported and causes the connector to throw a *convertor found a null value* error.
 
-An example message for the `dlc` topic that works with this schema:
+An example message for the `azure-iot-operations/data/thermostat` topic that works with this schema:
 
 ```json
 {
   "SequenceNumber": 4697,
   "Timestamp": "2024-04-02T22:36:03.1827681Z",
-  "DataSetWriterName": "thermostat-de",
+  "DataSetWriterName": "thermostat",
   "MessageType": "ua-deltaframe",
   "Payload": {
     "temperature": {

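The "stringified JSON isn't supported" caveat above comes down to double encoding: serializing an already-serialized string yields a JSON *string* rather than a JSON *object*. A short illustrative sketch:

```python
import json

message = {"SequenceNumber": 4697, "MessageType": "ua-deltaframe"}

# Correct: serialize the object once; the payload parses back to a dict.
good_payload = json.dumps(message)

# Incorrect: serializing twice produces a JSON string whose content happens
# to look like JSON, which is what triggers the connector error described above.
bad_payload = json.dumps(json.dumps(message))

good_parsed = json.loads(good_payload)  # dict with the expected fields
bad_parsed = json.loads(bad_payload)    # plain str, not an object
```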