Commit 790a9b8

Merge pull request #270967 from PatAltimore/patricka-datalake-crd
Update datalake config samples
2 parents cfe1661 + bb1bd64 commit 790a9b8

1 file changed

+43
-32
lines changed

articles/iot-operations/connect-to-cloud/howto-configure-data-lake.md

Lines changed: 43 additions & 32 deletions
@@ -7,7 +7,7 @@ ms.author: patricka
77
ms.topic: how-to
88
ms.custom:
99
- ignite-2023
10-
ms.date: 11/15/2023
10+
ms.date: 04/02/2024
1111

1212
#CustomerIntent: As an operator, I want to understand how to configure Azure IoT MQ so that I can send data from Azure IoT MQ to Data Lake Storage.
1313
---
@@ -30,7 +30,7 @@ You can use the data lake connector to send data from Azure IoT MQ Preview broke
3030
| Delta format | Supported |
3131
| Parquet format | Supported |
3232
| JSON message payload | Supported |
33-
| Create new container if doesn't exist | Supported |
33+
| Create new container if it doesn't exist | Supported |
3434
| Signed types support | Supported |
3535
| Unsigned types support | Not Supported |
3636

@@ -75,8 +75,8 @@ Configure a data lake connector to connect to Microsoft Fabric OneLake using man
7575
1. Select **Contributor** as the role, then select **Add**.
7676

7777
1. Create a [DataLakeConnector](#datalakeconnector) resource that defines the configuration and endpoint settings for the connector. You can use the YAML provided as an example, but make sure to change the following fields:
78-
79-
- `target.fabriceOneLake.names`: The names of the workspace and the lakehouse. Use either this field or `guids`, don't use both.
78+
- `target.fabricOneLake.endpoint`: The endpoint of the Microsoft Fabric OneLake account. You can get the endpoint URL from Microsoft Fabric lakehouse under **Files** > **Properties**. The URL should look like `https://onelake.dfs.fabric.microsoft.com`.
79+
- `target.fabricOneLake.names`: The names of the workspace and the lakehouse. Use either this field or `guids`. Don't use both.
8080
- `workspaceName`: The name of the workspace.
8181
- `lakehouseName`: The name of the lakehouse.
8282
@@ -97,7 +97,8 @@ Configure a data lake connector to connect to Microsoft Fabric OneLake using man
9797
databaseFormat: delta
9898
target:
9999
fabricOneLake:
100-
endpoint: https://msit-onelake.dfs.fabric.microsoft.com
100+
# Example: https://onelake.dfs.fabric.microsoft.com
101+
endpoint: <example-endpoint-url>
101102
names:
102103
workspaceName: <example-workspace-name>
103104
lakehouseName: <example-lakehouse-name>
@@ -123,7 +124,7 @@ Configure a data lake connector to connect to Microsoft Fabric OneLake using man
123124
- `dataLakeConnectorRef`: The name of the DataLakeConnector resource that you created earlier.
124125
- `clientId`: A unique identifier for your MQTT client.
125126
- `mqttSourceTopic`: The name of the MQTT topic that you want data to come from.
126-
- `table.tableName`: The name of the table that you want to append to in the lakehouse. If the table doesn't exist, it's created automatically.
127+
- `table.tableName`: The name of the table that you want to append to in the lakehouse. The table is created automatically if it doesn't exist.
127128
- `table.schema`: The schema of the Delta table that should match the format and fields of the JSON messages that you send to the MQTT topic.
128129

129130
1. Apply the DataLakeConnector and DataLakeConnectorTopicMap resources to your Kubernetes cluster using `kubectl apply -f datalake-connector.yaml`.
@@ -239,15 +240,15 @@ The spec field of a *DataLakeConnector* resource contains the following subfield
239240
- `accessTokenSecretName`: The name of the Kubernetes secret for using shared access token authentication for the Data Lake Storage account. This field is required if the type is `accessToken`.
240241
- `systemAssignedManagedIdentity`: For using system managed identity for authentication. It has one subfield
241242
- `audience`: A string in the form of `https://<my-account-name>.blob.core.windows.net` for the managed identity token audience scoped to the account level or `https://storage.azure.com` for any storage account.
242-
- `fabriceOneLake`: Specifies the configuration and properties of the Microsoft Fabric OneLake. It has the following subfields:
243+
- `fabricOneLake`: Specifies the configuration and properties of the Microsoft Fabric OneLake. It has the following subfields:
243244
- `endpoint`: The URL of the Microsoft Fabric OneLake endpoint. It's usually `https://onelake.dfs.fabric.microsoft.com` because that's the OneLake global endpoint. If you're using a regional endpoint, it's in the form of `https://<region>-onelake.dfs.fabric.microsoft.com`. Don't include any trailing slash `/`. To learn more, see [Connecting to Microsoft OneLake](/fabric/onelake/onelake-access-api).
244-
- `names`: Specifies the names of the workspace and the lakehouse. Use either this field or `guids`, don't use both. It has the following subfields:
245+
- `names`: Specifies the names of the workspace and the lakehouse. Use either this field or `guids`. Don't use both. It has the following subfields:
245246
- `workspaceName`: The name of the workspace.
246247
- `lakehouseName`: The name of the lakehouse.
247-
- `guids`: Specifies the GUIDs of the workspace and the lakehouse. Use either this field or `names`, don't use both. It has the following subfields:
248+
- `guids`: Specifies the GUIDs of the workspace and the lakehouse. Use either this field or `names`. Don't use both. It has the following subfields:
248249
- `workspaceGuid`: The GUID of the workspace.
249250
- `lakehouseGuid`: The GUID of the lakehouse.
250-
- `fabricePath`: The location of the data in the Fabric workspace. It can be either `tables` or `files`. If it's `tables`, the data is stored in the Fabric OneLake as tables. If it's `files`, the data is stored in the Fabric OneLake as files. If it's `files`, the `databaseFormat` must be `parquet`.
251+
- `fabricPath`: The location of the data in the Fabric workspace. It can be either `tables` or `files`. If it's `tables`, the data is stored in the Fabric OneLake as tables. If it's `files`, the data is stored in the Fabric OneLake as files. If it's `files`, the `databaseFormat` must be `parquet`.
251252
- `authentication`: The authentication field specifies the type and credentials for accessing the Microsoft Fabric OneLake. It can only be `systemAssignedManagedIdentity` for now. It has one subfield:
252253
- `systemAssignedManagedIdentity`: For using system managed identity for authentication. It has one subfield
253254
- `audience`: A string for the managed identity token audience and it must be `https://storage.azure.com`.
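
The `fabricOneLake` subfields described above can be sketched as a `target` fragment that identifies the workspace and lakehouse by `guids` instead of `names`. This is a minimal illustration, not a complete DataLakeConnector resource: the nesting is inferred from the field descriptions, and the GUID values are placeholders that you replace with your own.

```yaml
target:
  fabricOneLake:
    # Global OneLake endpoint, with no trailing slash.
    endpoint: https://onelake.dfs.fabric.microsoft.com
    # Use either `guids` or `names`. Don't use both.
    guids:
      workspaceGuid: <example-workspace-guid>   # placeholder
      lakehouseGuid: <example-lakehouse-guid>   # placeholder
    # `tables` or `files`. If `files`, `databaseFormat` must be `parquet`.
    fabricPath: tables
    authentication:
      systemAssignedManagedIdentity:
        # Must be this value for Microsoft Fabric OneLake.
        audience: https://storage.azure.com
```

To use names instead, replace the `guids` block with a `names` block that contains `workspaceName` and `lakehouseName`.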
@@ -292,49 +293,59 @@ spec:
292293
messagePayloadType: "json"
293294
maxMessagesPerBatch: 10
294295
clientId: id
295-
mqttSourceTopic: "orders"
296+
mqttSourceTopic: "azure-iot-operations/data/opc-ua-connector-de/thermostat-de"
296297
qos: 1
297298
table:
298-
tableName: "orders"
299+
tableName: thermostat
299300
schema:
300-
- name: "orderId"
301-
format: int32
302-
optional: false
303-
mapping: "data.orderId"
304-
- name: "item"
301+
- name: externalAssetId
305302
format: utf8
306303
optional: false
307-
mapping: "data.item"
308-
- name: "clientId"
304+
mapping: $property.externalAssetId
305+
- name: assetName
309306
format: utf8
310307
optional: false
311-
mapping: "$client_id"
312-
- name: "mqttTopic"
313-
format: utf8
308+
mapping: DataSetWriterName
309+
- name: CurrentTemperature
310+
format: float32
314311
optional: false
315-
mapping: "$topic"
316-
- name: "timestamp"
312+
mapping: Payload.temperature.Value
313+
- name: Pressure
314+
format: float32
315+
optional: true
316+
mapping: "Payload.Tag 10.Value"
317+
- name: Timestamp
317318
format: timestamp
318319
optional: false
319-
mapping: "$received_time"
320+
mapping: $received_time
320321
```
321322

322-
Escaped JSON like `{"data": "{\"orderId\": 181, \"item\": \"item181\"}"}` isn't supported and causes the connector to throw a "convertor found a null value" error. An example message for the `orders` topic that works with this schema:
323+
Stringified JSON like `"{\"SequenceNumber\": 4697, \"Timestamp\": \"2024-04-02T22:36:03.1827681Z\", \"DataSetWriterName\": \"thermostat-de\", \"MessageType\": \"ua-deltaframe\", \"Payload\": {\"temperature\": {\"SourceTimestamp\": \"2024-04-02T22:36:02.6949717Z\", \"Value\": 5506}, \"Tag 10\": {\"SourceTimestamp\": \"2024-04-02T22:36:02.6949888Z\", \"Value\": 5506}}}"` isn't supported and causes the connector to throw a *convertor found a null value* error. An example message for the `dlc` topic that works with this schema:
323324
324325
```json
325326
{
326-
"data": {
327-
"orderId": 181,
328-
"item": "item181"
327+
"SequenceNumber": 4697,
328+
"Timestamp": "2024-04-02T22:36:03.1827681Z",
329+
"DataSetWriterName": "thermostat-de",
330+
"MessageType": "ua-deltaframe",
331+
"Payload": {
332+
"temperature": {
333+
"SourceTimestamp": "2024-04-02T22:36:02.6949717Z",
334+
"Value": 5506
335+
},
336+
"Tag 10": {
337+
"SourceTimestamp": "2024-04-02T22:36:02.6949888Z",
338+
"Value": 5506
339+
}
329340
}
330341
}
331342
```
332343
333344
Which maps to:
334345
335-
| orderId | item | clientId | mqttTopic | timestamp |
336-
| ------- | ------- | -------- | --------- | ------------------------------ |
337-
| 181 | item181 | id | orders | 2023-07-28T12:45:59.324310806Z |
346+
| externalAssetId | assetName | CurrentTemperature | Pressure | mqttTopic | timestamp |
347+
| ------------------------------------ | --------------- | ------------------ | -------- | ----------------------------- | ------------------------------ |
348+
| 59ad3b8b-c840-43b5-b79d-7804c6f42172 | thermostat-de | 5506 | 5506 | dlc | 2024-04-02T22:36:03.1827681Z |
338349
339350
> [!IMPORTANT]
340351
> If the data schema is updated, for example a data type is changed or a name is changed, transformation of incoming data might stop working. You need to change the data table name if a schema change occurs.

0 commit comments

Comments
 (0)