Commit e630d2c

Mention schema generator

1 parent 9be7a73 commit e630d2c

4 files changed, +34 -14 lines changed

articles/iot-operations/connect-to-cloud/concept-schema-registry.md

Lines changed: 6 additions & 0 deletions

````diff
@@ -81,6 +81,12 @@ Delta:
 }
 ```
 
+### Generate a schema
+
+To generate the schema from a sample data file, use the [Azure IoT Operations Schema Generator](https://azure-samples.github.io/explore-iot-operations/).
+
+For a tutorial that uses the schema generator, see [Tutorial: Send data from an OPC UA server to Azure Data Lake Storage Gen 2](./tutorial-opcua-to-data-lake.md).
+
 ## How dataflows use message schemas
 
 Message schemas are used in all three phases of a dataflow: defining the source input, applying data transformations, and creating the destination output.
````
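The doc additions above point readers at the Schema Generator for producing a schema from a sample data file. A minimal sketch of what such type inference involves, assuming invented helper names (`delta_type`, `infer_struct`) — an illustration only, not the Schema Generator's actual implementation:

```python
import json

def delta_type(value):
    """Map a sample JSON value to a Delta Lake type name (simplified)."""
    if isinstance(value, bool):        # bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return infer_struct(value)     # nested objects become struct types
    return "string"

def infer_struct(obj):
    """Build a Delta-style struct schema from one sample JSON object."""
    return {
        "type": "struct",
        "fields": [
            {"name": name, "type": delta_type(value), "nullable": False, "metadata": {}}
            for name, value in obj.items()
        ],
    }

# One sample message, shaped like the oven asset data in the tutorial.
sample = {"Temperature": {"SourceTimestamp": "2024-01-01T00:00:00Z", "Value": 250}}
print(json.dumps(infer_struct(sample), indent=2))
```

A real generator also has to recognize timestamps, decimals, and arrays, and decide nullability from more than one sample; this sketch hard-codes `nullable: false`, matching the corrected tutorial schema.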

articles/iot-operations/connect-to-cloud/howto-configure-dataflow-endpoint.md

Lines changed: 5 additions & 0 deletions

````diff
@@ -28,6 +28,11 @@ Use the following table to choose the endpoint type to configure:
 | [Azure Data Explorer](howto-configure-adx-endpoint.md) | For uploading data to Azure Data Explorer databases. | No | Yes |
 | [Local storage](howto-configure-local-storage-endpoint.md) | For sending data to a locally available persistent volume, through which you can upload data via Azure Container Storage enabled by Azure Arc edge volumes. | No | Yes |
 
+> [!IMPORTANT]
+> Storage endpoints require a [schema for serialization](./concept-schema-registry.md). To use a dataflow with Microsoft Fabric OneLake, Azure Data Lake Storage, Azure Data Explorer, or local storage, you must [specify a schema reference](./howto-create-dataflow.md#serialize-data-according-to-a-schema).
+>
+> To generate the schema from a sample data file, use the [Azure IoT Operations Schema Generator](https://azure-samples.github.io/explore-iot-operations/).
+
 ## Dataflows must use local MQTT broker endpoint
 
 When you create a dataflow, you specify the source and destination endpoints. The dataflow moves data from the source endpoint to the destination endpoint. You can use the same endpoint for multiple dataflows, and you can use the same endpoint as both the source and destination in a dataflow.
````

articles/iot-operations/connect-to-cloud/howto-create-dataflow.md

Lines changed: 7 additions & 1 deletion

````diff
@@ -436,6 +436,9 @@ When using MQTT or Kafka as the source, you can specify a [schema](concept-schem
 
 If the source is an asset, the schema is automatically inferred from the asset definition.
 
+> [!TIP]
+> To generate the schema from a sample data file, use the [Azure IoT Operations Schema Generator](https://azure-samples.github.io/explore-iot-operations/).
+
 To configure the schema used to deserialize the incoming messages from a source:
 
 # [Portal](#tab/portal)
@@ -784,6 +787,9 @@ builtInTransformationSettings:
 
 If you want to serialize the data before sending it to the destination, you need to specify a schema and serialization format. Otherwise, the data is serialized in JSON with the types inferred. Storage endpoints like Microsoft Fabric or Azure Data Lake require a schema to ensure data consistency. Supported serialization formats are Parquet and Delta.
 
+> [!TIP]
+> To generate the schema from a sample data file, use the [Azure IoT Operations Schema Generator](https://azure-samples.github.io/explore-iot-operations/).
+
 # [Portal](#tab/portal)
 
 For operations experience, you specify the schema and serialization format in the dataflow endpoint details. The endpoints that support serialization formats are Microsoft Fabric OneLake, Azure Data Lake Storage Gen 2, and Azure Data Explorer. For example, to serialize the data in Delta format, you need to upload a schema to the schema registry and reference it in the dataflow destination endpoint configuration.
@@ -822,7 +828,7 @@ To configure a destination for the dataflow, specify the endpoint reference and
 To send data to a destination other than the local MQTT broker, create a dataflow endpoint. To learn how, see [Configure dataflow endpoints](howto-configure-dataflow-endpoint.md). If the destination isn't the local MQTT broker, it must be used as a source. To learn more, see [Dataflows must use local MQTT broker endpoint](./howto-configure-dataflow-endpoint.md#dataflows-must-use-local-mqtt-broker-endpoint).
 
 > [!IMPORTANT]
-> Storage endpoints require a schema reference. If you've created storage destination endpoints for Microsoft Fabric OneLake, ADLS Gen 2, Azure Data Explorer and Local Storage, you must specify schema reference.
+> Storage endpoints require a [schema for serialization](./concept-schema-registry.md). To use a dataflow with Microsoft Fabric OneLake, Azure Data Lake Storage, Azure Data Explorer, or local storage, you must [specify a schema reference](#serialize-data-according-to-a-schema).
 
 # [Portal](#tab/portal)
 
````

articles/iot-operations/connect-to-cloud/tutorial-opcua-to-data-lake.md

Lines changed: 16 additions & 13 deletions

````diff
@@ -105,7 +105,10 @@ In the quickstart, the data that comes from the oven asset looks like:
 
 The required schema format for Delta Lake is a JSON object that follows the Delta Lake schema serialization format. The schema should define the structure of the data, including the types and properties of each field. For more details on the schema format, see [Delta Lake schema serialization format documentation](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#schema-serialization-format).
 
-To create a Delta schema that represents the data from the oven asset, create a JSON file with the following content:
+> [!TIP]
+> To generate the schema from a sample data file, use the [Azure IoT Operations Schema Generator](https://azure-samples.github.io/explore-iot-operations/).
+
+For this tutorial, the schema for the data looks like this:
 
 ```json
 {
@@ -121,19 +124,19 @@ To create a Delta schema that represents the data from the oven asset, create a
         "fields": [
           {
             "name": "SourceTimestamp",
-            "type": "string",
-            "nullable": true,
+            "type": "timestamp",
+            "nullable": false,
             "metadata": {}
           },
           {
             "name": "Value",
             "type": "integer",
-            "nullable": true,
+            "nullable": false,
             "metadata": {}
           }
         ]
       },
-      "nullable": true,
+      "nullable": false,
       "metadata": {}
     },
     {
@@ -143,19 +146,19 @@ To create a Delta schema that represents the data from the oven asset, create a
         "fields": [
           {
             "name": "SourceTimestamp",
-            "type": "string",
-            "nullable": true,
+            "type": "timestamp",
+            "nullable": false,
             "metadata": {}
           },
           {
             "name": "Value",
             "type": "integer",
-            "nullable": true,
+            "nullable": false,
             "metadata": {}
          }
         ]
       },
-      "nullable": true,
+      "nullable": false,
      "metadata": {}
     },
     {
@@ -165,19 +168,19 @@ To create a Delta schema that represents the data from the oven asset, create a
         "fields": [
           {
             "name": "SourceTimestamp",
-            "type": "string",
-            "nullable": true,
+            "type": "timestamp",
+            "nullable": false,
             "metadata": {}
           },
           {
             "name": "Value",
             "type": "integer",
-            "nullable": true,
+            "nullable": false,
             "metadata": {}
           }
         ]
       },
-      "nullable": true,
+      "nullable": false,
       "metadata": {}
     }
   ]
````
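The schema changes above tighten the contract: `SourceTimestamp` becomes a `timestamp` instead of a `string`, and every field becomes non-nullable. A minimal sketch of what that stricter schema implies for incoming records, using a hypothetical `validate` helper (invented for this sketch, not part of Azure IoT Operations):

```python
from datetime import datetime

# Fragment of the corrected schema for one oven field (mirrors the diff above).
field_schema = {
    "name": "Temperature",
    "type": {
        "type": "struct",
        "fields": [
            {"name": "SourceTimestamp", "type": "timestamp", "nullable": False, "metadata": {}},
            {"name": "Value", "type": "integer", "nullable": False, "metadata": {}},
        ],
    },
    "nullable": False,
    "metadata": {},
}

def type_ok(value, type_name):
    """Tiny type check covering only the primitive types used here."""
    if type_name == "timestamp":
        # ISO 8601 with a trailing Z; fromisoformat raises on malformed input.
        datetime.fromisoformat(str(value).replace("Z", "+00:00"))
        return True
    if type_name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    return True

def validate(record, schema):
    """Reject nulls in non-nullable fields and values of the wrong type."""
    for field in schema["type"]["fields"]:
        value = record.get(field["name"])
        if value is None:
            if not field["nullable"]:
                return False
            continue
        if not type_ok(value, field["type"]):
            return False
    return True

print(validate({"SourceTimestamp": "2024-01-01T00:00:00Z", "Value": 250}, field_schema))  # prints: True
```

With `nullable: false`, a record that omits `SourceTimestamp` or carries a non-integer `Value` is rejected rather than silently written with a null, which is the data-consistency guarantee the storage endpoints need.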
