articles/iot-operations/connect-to-cloud/howto-configure-data-lake.md
ms.author: patricka
ms.topic: how-to
ms.custom:
  - ignite-2023
ms.date: 04/02/2024

#CustomerIntent: As an operator, I want to understand how to configure Azure IoT MQ so that I can send data from Azure IoT MQ to Data Lake Storage.
---
You can use the data lake connector to send data from the Azure IoT MQ Preview broker to Data Lake Storage:

| Feature | Status |
| --- | --- |
| Delta format | Supported |
| Parquet format | Supported |
| JSON message payload | Supported |
| Create new container if it doesn't exist | Supported |
| Signed types support | Supported |
| Unsigned types support | Not supported |
1. Select **Contributor** as the role, then select **Add**.

1. Create a [DataLakeConnector](#datalakeconnector) resource that defines the configuration and endpoint settings for the connector. You can use the YAML provided as an example, but make sure to change the following fields:

   - `target.fabricOneLake.endpoint`: The endpoint of the Microsoft Fabric OneLake account. You can get the endpoint URL from the Microsoft Fabric lakehouse under **Files** > **Properties**. The URL should look like `https://onelake.dfs.fabric.microsoft.com`.
   - `target.fabricOneLake.names`: The names of the workspace and the lakehouse. Use either this field or `guids`. Don't use both.
     - `workspaceName`: The name of the workspace.
     - `lakehouseName`: The name of the lakehouse.
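Putting those fields together, a *DataLakeConnector* resource might look like the following sketch. The `apiVersion`, resource name, and namespace are assumptions for illustration, and the workspace and lakehouse names are placeholders; the field names follow the spec reference later in this article:

```yaml
apiVersion: mq.iotoperations.azure.com/v1beta1   # assumed; match your installed CRD version
kind: DataLakeConnector
metadata:
  name: my-datalake-connector        # placeholder name
  namespace: azure-iot-operations    # assumed namespace
spec:
  target:
    fabricOneLake:
      endpoint: https://onelake.dfs.fabric.microsoft.com   # OneLake global endpoint, no trailing slash
      names:                          # use either names or guids, not both
        workspaceName: my-workspace   # placeholder
        lakehouseName: my-lakehouse   # placeholder
      authentication:
        systemAssignedManagedIdentity:
          audience: https://storage.azure.com
```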
   - `dataLakeConnectorRef`: The name of the DataLakeConnector resource that you created earlier.
   - `clientId`: A unique identifier for your MQTT client.
   - `mqttSourceTopic`: The name of the MQTT topic that you want data to come from.
   - `table.tableName`: The name of the table that you want to append to in the lakehouse. The table is created automatically if it doesn't exist.
   - `table.schema`: The schema of the Delta table, which should match the format and fields of the JSON messages that you send to the MQTT topic.
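Pulling the fields above together, a *DataLakeConnectorTopicMap* might look like the following sketch. The `apiVersion`, the `mapping` nesting, and the schema entry shape (`name`/`format`/`optional`/`mapping`) are assumptions for illustration; all names are placeholders:

```yaml
apiVersion: mq.iotoperations.azure.com/v1beta1   # assumed; match your installed CRD version
kind: DataLakeConnectorTopicMap
metadata:
  name: datalake-topicmap            # placeholder name
  namespace: azure-iot-operations    # assumed namespace
spec:
  dataLakeConnectorRef: my-datalake-connector   # the DataLakeConnector created earlier
  mapping:                           # nesting is an assumption
    clientId: datalake-topicmap-client   # unique MQTT client identifier
    mqttSourceTopic: dlc                 # MQTT topic the data comes from
    table:
      tableName: thermostat          # created automatically if it doesn't exist
      schema:                        # must match the JSON message fields
        - name: SequenceNumber       # entry shape is an assumption
          format: int64
          optional: false
          mapping: $.SequenceNumber
        - name: Timestamp
          format: utf8
          optional: false
          mapping: $.Timestamp
```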
1. Apply the DataLakeConnector and DataLakeConnectorTopicMap resources to your Kubernetes cluster using `kubectl apply -f datalake-connector.yaml`.
The spec field of a *DataLakeConnector* resource contains the following subfields:

- `accessTokenSecretName`: The name of the Kubernetes secret for using shared access token authentication for the Data Lake Storage account. This field is required if the type is `accessToken`.
- `systemAssignedManagedIdentity`: For using system-assigned managed identity for authentication. It has one subfield:
  - `audience`: A string in the form of `https://<my-account-name>.blob.core.windows.net` for the managed identity token audience scoped to the account level, or `https://storage.azure.com` for any storage account.
- `fabricOneLake`: Specifies the configuration and properties of the Microsoft Fabric OneLake. It has the following subfields:
  - `endpoint`: The URL of the Microsoft Fabric OneLake endpoint. It's usually `https://onelake.dfs.fabric.microsoft.com` because that's the OneLake global endpoint. If you're using a regional endpoint, it's in the form of `https://<region>-onelake.dfs.fabric.microsoft.com`. Don't include any trailing slash `/`. To learn more, see [Connecting to Microsoft OneLake](/fabric/onelake/onelake-access-api).
  - `names`: Specifies the names of the workspace and the lakehouse. Use either this field or `guids`. Don't use both. It has the following subfields:
    - `workspaceName`: The name of the workspace.
    - `lakehouseName`: The name of the lakehouse.
  - `guids`: Specifies the GUIDs of the workspace and the lakehouse. Use either this field or `names`. Don't use both. It has the following subfields:
    - `workspaceGuid`: The GUID of the workspace.
    - `lakehouseGuid`: The GUID of the lakehouse.
  - `fabricPath`: The location of the data in the Fabric workspace. It can be either `tables` or `files`. If it's `tables`, the data is stored in the Fabric OneLake as tables. If it's `files`, the data is stored in the Fabric OneLake as files and the `databaseFormat` must be `parquet`.
- `authentication`: The authentication field specifies the type and credentials for accessing the Microsoft Fabric OneLake. It can only be `systemAssignedManagedIdentity` for now. It has one subfield:
  - `systemAssignedManagedIdentity`: For using system-assigned managed identity for authentication. It has one subfield:
    - `audience`: A string for the managed identity token audience, which must be `https://storage.azure.com`.
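As a sketch of the alternative addressing mode described above, a `target` using `guids` with file storage might look like this fragment. The GUID values are placeholders, and the exact placement of `databaseFormat` within the resource is an assumption:

```yaml
target:
  fabricOneLake:
    endpoint: https://onelake.dfs.fabric.microsoft.com   # no trailing slash
    guids:                          # use either guids or names, not both
      workspaceGuid: 00000000-0000-0000-0000-000000000000   # placeholder
      lakehouseGuid: 00000000-0000-0000-0000-000000000000   # placeholder
    fabricPath: files               # files requires parquet
    databaseFormat: parquet         # placement is an assumption
    authentication:
      systemAssignedManagedIdentity:
        audience: https://storage.azure.com
```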
Stringified JSON like `"{\"SequenceNumber\": 4697, \"Timestamp\": \"2024-04-02T22:36:03.1827681Z\", \"DataSetWriterName\": \"thermostat-de\", \"MessageType\": \"ua-deltaframe\", \"Payload\": {\"temperature\": {\"SourceTimestamp\": \"2024-04-02T22:36:02.6949717Z\", \"Value\": 5506}, \"Tag 10\": {\"SourceTimestamp\": \"2024-04-02T22:36:02.6949888Z\", \"Value\": 5506}}}"` isn't supported and causes the connector to throw a *convertor found a null value* error. An example message for the `dlc` topic that works with this schema:
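A message of this shape, sent as a plain JSON object rather than as a string, would match the schema described above. The field values here mirror the stringified example for illustration:

```json
{
  "SequenceNumber": 4697,
  "Timestamp": "2024-04-02T22:36:03.1827681Z",
  "DataSetWriterName": "thermostat-de",
  "MessageType": "ua-deltaframe",
  "Payload": {
    "temperature": {
      "SourceTimestamp": "2024-04-02T22:36:02.6949717Z",
      "Value": 5506
    },
    "Tag 10": {
      "SourceTimestamp": "2024-04-02T22:36:02.6949888Z",
      "Value": 5506
    }
  }
}
```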
> If the data schema is updated, for example a data type or a name is changed, transformation of incoming data might stop working. You need to change the data table name if a schema change occurs.