Commit 30fb9d5

Added details under specify schema to deserialize data
1 parent 4178d14 commit 30fb9d5

File tree

1 file changed

+91
-1
lines changed

articles/iot-operations/connect-to-cloud/howto-create-dataflow.md

Lines changed: 91 additions & 1 deletion
@@ -60,7 +60,7 @@ To create a dataflow in the operations experience portal, select **Dataflow** >
6060

6161
# [Bicep](#tab/bicep)
6262

63-
The overall structure of a dataflow configuration is as follows:
63+
The overall structure of a dataflow configuration for Bicep is as follows:
6464

6565
```bicep
6666
resource dataflow 'Microsoft.IoTOperations/instances/dataflowProfiles/dataflows@2024-08-15-preview' = {
@@ -149,6 +149,8 @@ You can use an [asset](../discover-manage-assets/overview-manage-assets.md) as t
149149

150150
# [Bicep](#tab/bicep)
151151

152+
Configuring an asset as a source is only available in the operations experience portal.
153+
152154
# [Kubernetes](#tab/kubernetes)
153155

154156
Configuring an asset as a source is only available in the operations experience portal.
@@ -169,6 +171,94 @@ Configuring an asset as a source is only available in the operations experience
169171

170172
# [Bicep](#tab/bicep)
171173

174+
To configure a source using an MQTT endpoint, use the following configuration:
175+
176+
```bicep
177+
{
178+
operationType: 'Source'
179+
sourceSettings: {
180+
endpointRef: defaultDataflowEndpoint.name
181+
dataSources: array('azure-iot-operations/data/thermostat')
182+
}
183+
}
184+
```
185+
186+
`dataSources`: An array of MQTT topics that the dataflow reads from. In this example, `azure-iot-operations/data/thermostat` is the topic where thermostat data is published.
187+
188+
Data sources let you specify multiple MQTT or Kafka topics without modifying the endpoint configuration, so the same endpoint can be reused across multiple dataflows even if the topics vary. To learn more, see [Reuse dataflow endpoints](./howto-configure-dataflow-endpoint.md#reuse-endpoints).
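
As an illustration of listing more than one topic, here is a minimal sketch of the same source operation with two entries in `dataSources`. It is a fragment in the same style as the example above and assumes `defaultDataflowEndpoint` is declared elsewhere in the file. The second topic name is hypothetical.

```bicep
{
  operationType: 'Source'
  sourceSettings: {
    endpointRef: defaultDataflowEndpoint.name
    // The second topic is a hypothetical example; replace both with the topics your devices publish to.
    dataSources: [
      'azure-iot-operations/data/thermostat'
      'azure-iot-operations/data/humidifier'
    ]
  }
}
```

Because the topics live on the dataflow rather than on the endpoint, another dataflow can reference the same endpoint with a different topic list.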
189+
190+
#### Specify schema to deserialize data
191+
192+
If the source data has optional fields or fields with different types, specify a deserialization schema to ensure consistency. For example, the data might have fields that aren't present in all messages. Without the schema, the transformation can't handle these fields because they would have empty values. With the schema, you can specify default values or ignore the fields.
193+
194+
The following configuration defines a schema in your Bicep file for deserializing asset data. The schema defines the fields `asset_id`, `asset_name`, `location`, `manufacturer`, `production_date`, `serial_number`, and `temperature`. Each field has a data type (`string`, or `double` for `temperature`) and is marked as non-nullable, so every incoming message is expected to contain all of these fields with valid data.
195+
196+
```bicep
197+
var opcuaSchemaContent = '''
198+
{
199+
"$schema": "Delta/1.0",
200+
"type": "object",
201+
"properties": {
202+
"type": "struct",
203+
"fields": [
204+
{ "name": "asset_id", "type": "string", "nullable": false, "metadata": {} },
205+
{ "name": "asset_name", "type": "string", "nullable": false, "metadata": {} },
206+
{ "name": "location", "type": "string", "nullable": false, "metadata": {} },
207+
{ "name": "manufacturer", "type": "string", "nullable": false, "metadata": {} },
208+
{ "name": "production_date", "type": "string", "nullable": false, "metadata": {} },
209+
{ "name": "serial_number", "type": "string", "nullable": false, "metadata": {} },
210+
{ "name": "temperature", "type": "double", "nullable": false, "metadata": {} }
211+
]
212+
}
213+
}
214+
'''
215+
```
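
For source data with an optional field, a schema could mark that field as nullable. The following is only a sketch: it assumes the `Delta/1.0` format accepts `"nullable": true` for fields that might be missing from some messages. The variable name `optionalFieldSchema` and the `humidity` field are hypothetical.

```bicep
// Hypothetical variant of the schema above with one optional field.
// Assumption: "nullable": true marks a field that can be absent or empty.
var optionalFieldSchema = '''
{
  "$schema": "Delta/1.0",
  "type": "object",
  "properties": {
    "type": "struct",
    "fields": [
      { "name": "asset_id", "type": "string", "nullable": false, "metadata": {} },
      { "name": "temperature", "type": "double", "nullable": false, "metadata": {} },
      { "name": "humidity", "type": "double", "nullable": true, "metadata": {} }
    ]
  }
}
'''
```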
216+
217+
To register the schema with the Azure Schema Registry, use the following Bicep configuration. This configuration creates a schema definition and assigns it a version within the schema registry, allowing it to be referenced later in your data transformations.
218+
219+
```bicep
220+
param opcuaSchemaName string = 'opcua-output-delta'
221+
param opcuaSchemaVer string = '1'
222+
223+
resource opcSchema 'Microsoft.DeviceRegistry/schemaRegistries/schemas@2024-09-01-preview' = {
224+
parent: schemaRegistry
225+
name: opcuaSchemaName
226+
properties: {
227+
displayName: 'OPC UA Delta Schema'
228+
description: 'This is an OPC UA delta schema'
229+
format: 'Delta/1.0'
230+
schemaType: 'MessageSchema'
231+
}
232+
}
233+
234+
resource opcuaSchemaInstance 'Microsoft.DeviceRegistry/schemaRegistries/schemas/schemaVersions@2024-09-01-preview' = {
235+
parent: opcSchema
236+
name: opcuaSchemaVer
237+
properties: {
238+
description: 'Schema version'
239+
schemaContent: opcuaSchemaContent
240+
}
241+
}
242+
```
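
The registration example sets `parent: schemaRegistry` without declaring that symbol. One way to bring it into scope, assuming the schema registry already exists in the target resource group, is an `existing` resource declaration. The parameter name `schemaRegistryName` is hypothetical, and the API version is assumed to match the one used for the child resources.

```bicep
// Hypothetical parameter: the name of a schema registry that is already deployed.
param schemaRegistryName string

// References the existing registry so that it can be used as the parent of the schema resources.
resource schemaRegistry 'Microsoft.DeviceRegistry/schemaRegistries@2024-09-01-preview' existing = {
  name: schemaRegistryName
}
```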
243+
244+
Once the schema is registered, it can be referenced in transformations to ensure that the source data is correctly deserialized. In the following configuration, `schemaRef` points to the specific schema version to use, and `serializationFormat` defines how the data is serialized during the transformation.
245+
246+
```bicep
247+
{
248+
operationType: 'BuiltInTransformation'
249+
builtInTransformationSettings: {
250+
// ..
251+
schemaRef: 'aio-sr://${opcuaSchemaName}:${opcuaSchemaVer}'
252+
serializationFormat: 'Parquet' // can also be 'Delta'
253+
}
254+
}
255+
```
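
To make the `schemaRef` format concrete, the following sketch builds the reference string from the same parameters as the registration example and exposes it as an output. The variable `opcuaSchemaRef` and the output `resolvedSchemaRef` are hypothetical names used only for illustration.

```bicep
// Same parameter names and defaults as the registration example.
param opcuaSchemaName string = 'opcua-output-delta'
param opcuaSchemaVer string = '1'

// Hypothetical helper variable: builds the reference once so that several transformations can share it.
var opcuaSchemaRef = 'aio-sr://${opcuaSchemaName}:${opcuaSchemaVer}'

// With the default parameter values, this evaluates to 'aio-sr://opcua-output-delta:1'.
output resolvedSchemaRef string = opcuaSchemaRef
```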
256+
257+
> [!NOTE]
258+
> The only supported serialization formats are Delta and Parquet. The schema is optional.
259+
260+
For more information about schema registry, see [Understand message schemas](concept-schema-registry.md).
261+
172262
# [Kubernetes](#tab/kubernetes)
173263

174264
For example, to configure a source using an MQTT endpoint and two MQTT topic filters, use the following configuration:
