Commit 0565cdd

Merge pull request #42478 from ClareMSYanGit/cyan-master-working-iot
Added new article
2 parents 90352f7 + 7052471 commit 0565cdd

13 files changed, +190 -0 lines changed

articles/iot-hub/TOC.md

Lines changed: 1 addition & 0 deletions
@@ -156,6 +156,7 @@
 #### [Java](iot-hub-java-java-process-d2c.md)
 #### [Node.js](iot-hub-node-node-process-d2c.md)
 #### [Python](iot-hub-python-python-process-d2c.md)
+### [Query Avro data from a hub route](iot-hub-query-avro-data.md)
 ### Send cloud-to-device messages
 #### [.NET](iot-hub-csharp-csharp-c2d.md)
 #### [Java](iot-hub-java-java-c2d.md)
Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
1+
---
2+
title: Query Avro data using Azure Data Lake Analytics | Microsoft Docs
3+
description: Use message body properties to route device telemetry to blob storage and query the Avro format data written to blob storage.
4+
services: iot-hub
5+
documentationcenter:
6+
author: ksaye
7+
manager: obloch
8+
ms.service: iot-hub
9+
ms.topic: article
10+
ms.date: 05/29/2018
11+
ms.author: Kevin.Saye
12+
13+
---
14+
15+
# Query Avro data using Azure Data Lake Analytics
16+
17+
This article discusses how to query the Avro data that Azure IoT Hub writes when it routes messages to Azure services. As announced in the blog post [Azure IoT Hub message routing: now with routing on message body], IoT Hub supports routing on either message properties or the message body. See also [Routing on message bodies][Routing on message bodies].
18+
19+
The challenge is that when Azure IoT Hub routes messages to blob storage, it writes the content in Avro format, which contains both the message body and the message properties. IoT Hub supports only the Avro data format when writing to blob storage, and it does not use this format for any other endpoint. See [When using Azure Storage containers][When using Azure storage containers]. Although the Avro format is great for preserving data and messages, it is difficult to query. By comparison, JSON or CSV data is much easier to query.
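To make the problem concrete, here is a minimal Python sketch of what one decoded record might look like. The field names follow the Avro schema used in the U-SQL script later in this article; the values, the `record` variable, and the `decode_body` helper are hypothetical and exist only for illustration. The point is that the body arrives as raw bytes, so it has to be decoded before anything inside it can be queried.

```python
import json

# Hypothetical record, shaped like the Avro schema used later in this article.
# All values are made up for illustration.
record = {
    "EnqueuedTimeUtc": "2018-05-29T12:00:00.0000000Z",
    "Properties": {"level": "normal"},
    "SystemProperties": {"contentType": "application/json"},
    "Body": b'{"event": "heartbeat", "status": "ok"}',
}


def decode_body(rec):
    """Return the message body as a dict, or None when the body is null."""
    body = rec["Body"]
    if body is None:  # the schema declares Body as ["null", "bytes"]
        return None
    return json.loads(body.decode("utf-8"))


message = decode_body(record)
print(record["EnqueuedTimeUtc"], message["status"])
```

This sketch assumes the device sent UTF-8 encoded JSON; a body in another encoding or format would need different handling.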
20+
21+
To solve this, you can use many of the big data patterns for transforming and scaling data to handle non-relational big data needs and formats. One service that follows a “pay per query” pattern is Azure Data Lake Analytics (ADLA), which is the focus of this article. Though you could run the query in Hadoop or other solutions, ADLA is often better suited to this “pay per query” approach. U-SQL has an “extractor” for Avro. See [U-SQL Avro Example].
22+
23+
## Query and export Avro data to a CSV file
24+
This section walks you through querying Avro data and exporting it to a CSV file in Azure Blob Storage, though you could just as easily place the data in other repositories or data stores.
25+
26+
1. Set up Azure IoT Hub to route data to an Azure Blob Storage endpoint using a property in the message body to select messages.
27+
28+
![Screen capture for step 1a][img-query-avro-data-1a]
29+
30+
![Screen capture for step 1b][img-query-avro-data-1b]
31+
32+
2. Ensure your device has the encoding, the content type, and the needed data in either the properties or the message body as referenced in the product documentation. When viewed in Device Explorer (see below), you can verify that these attributes are set correctly.
33+
34+
![Screen capture for step 2][img-query-avro-data-2]
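As a rough illustration of what step 2 checks, the following Python sketch models a device message as a plain dictionary. IoT Hub can route on the message body only when the content type is `application/json` and the content encoding is UTF-8, UTF-16, or UTF-32. The dictionary layout and the `build_message` and `can_route_on_body` helpers are hypothetical; they are not part of any IoT Hub SDK.

```python
import json

# Encodings that allow IoT Hub to evaluate a query against the message body.
ALLOWED_ENCODINGS = {"utf-8", "utf-16", "utf-32"}


def build_message(payload):
    """Model a device message that declares JSON content in UTF-8 (hypothetical layout)."""
    return {
        "content_type": "application/json",
        "content_encoding": "utf-8",
        "body": json.dumps(payload).encode("utf-8"),
    }


def can_route_on_body(message):
    """Return True when the declared type and encoding permit body routing."""
    return (
        message["content_type"] == "application/json"
        and message["content_encoding"].lower() in ALLOWED_ENCODINGS
    )


msg = build_message({"status": "ok", "host": "device-01"})
print(can_route_on_body(msg))
```

With a real device SDK, you would set the equivalent content type and content encoding properties on the SDK's own message object before sending.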
35+
36+
3. Set up an Azure Data Lake Store (ADLS) and an Azure Data Lake Analytics instance. While Azure IoT Hub does not route to an Azure Data Lake Store, ADLA requires one.
37+
38+
![Screen capture for step 3][img-query-avro-data-3]
39+
40+
4. In ADLA, configure the Azure Blob Storage as an additional store, the same Blob Storage that Azure IoT Hub routes data to.
41+
42+
![Screen capture for step 4][img-query-avro-data-4]
43+
44+
5. As discussed in [U-SQL Avro Example], four DLLs are needed: Avro.dll, Microsoft.Analytics.Samples.Formats.dll, Newtonsoft.Json.dll, and log4net.dll. Upload these files to a location in your ADLS.
45+
46+
![Screen capture for step 5][img-query-avro-data-5]
47+
48+
6. In Visual Studio, create a U-SQL project.
49+
50+
![Screen capture for step 6][img-query-avro-data-6]
51+
52+
7. Copy the content of the following script and paste it into the newly created file. Modify the 3 highlighted sections: your ADLA account, the associated DLLs' paths, and the correct path for your Storage Account.
53+
54+
![Screen capture for step 7a][img-query-avro-data-7a]
55+
56+
The actual U-SQL script for simple output to CSV:
57+
58+
```sql
DROP ASSEMBLY IF EXISTS [Avro];
CREATE ASSEMBLY [Avro] FROM @"/Assemblies/Avro/Avro.dll";
DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats];
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Avro/Microsoft.Analytics.Samples.Formats.dll";
DROP ASSEMBLY IF EXISTS [Newtonsoft.Json];
CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/Assemblies/Avro/Newtonsoft.Json.dll";
DROP ASSEMBLY IF EXISTS [log4net];
CREATE ASSEMBLY [log4net] FROM @"/Assemblies/Avro/log4net.dll";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [log4net];
REFERENCE ASSEMBLY [Avro];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

// Blob container storage account filenames, with any path
DECLARE @input_file string = @"wasb://hottubrawdata@kevinsayazstorage/kevinsayIoT/{*}/{*}/{*}/{*}/{*}/{*}";
DECLARE @output_file string = @"/output/output.csv";

@rs =
    EXTRACT
        EnqueuedTimeUtc string,
        Body byte[]
    FROM @input_file

    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
    {
        ""type"":""record"",
        ""name"":""Message"",
        ""namespace"":""Microsoft.Azure.Devices"",
        ""fields"":[{
                ""name"":""EnqueuedTimeUtc"",
                ""type"":""string""
            },
            {
                ""name"":""Properties"",
                ""type"":{
                    ""type"":""map"",
                    ""values"":""string""
                }
            },
            {
                ""name"":""SystemProperties"",
                ""type"":{
                    ""type"":""map"",
                    ""values"":""string""
                }
            },
            {
                ""name"":""Body"",
                ""type"":[""null"",""bytes""]
            }
        ]
    }");

@cnt =
    SELECT EnqueuedTimeUtc AS time, Encoding.UTF8.GetString(Body) AS jsonmessage
    FROM @rs;

OUTPUT @cnt TO @output_file USING Outputters.Text();
```
119+
120+
Running the script shown above, limited to 10 Analytic Units, took ADLA 5 minutes. It processed 177 files and summarized the output in a CSV file.
121+
122+
![Screen capture for step 7b][img-query-avro-data-7b]
123+
124+
Viewing the output, you can see that the Avro content has been converted to a CSV file. Continue to step 8 if you want to parse the JSON.
125+
126+
![Screen capture for step 7c][img-query-avro-data-7c]
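The input path in the script uses six `{*}` wildcards after the hub name. A plausible reading, assuming IoT Hub's default blob naming of `{iothub}/{partition}/{YYYY}/{MM}/{DD}/{HH}/{mm}`, is that each wildcard stands for one path segment (partition, year, month, day, hour, minute). The Python sketch below is a simplified model of that idea, not U-SQL's actual file set matcher; the `matches` helper and the sample blob paths are hypothetical.

```python
def matches(pattern, blob_path):
    """Segment-by-segment match where '{*}' accepts any single path segment."""
    pattern_parts = pattern.split("/")
    blob_parts = blob_path.split("/")
    if len(pattern_parts) != len(blob_parts):
        return False
    return all(p == "{*}" or p == b for p, b in zip(pattern_parts, blob_parts))


# Path portion of the input pattern from the script, after the container and account.
PATTERN = "kevinsayIoT/{*}/{*}/{*}/{*}/{*}/{*}"

print(matches(PATTERN, "kevinsayIoT/0/2018/05/29/12/00"))  # full-depth blob path
print(matches(PATTERN, "kevinsayIoT/0/2018/05/29"))        # too few segments
```

If your route uses a different blob naming format, the number of wildcards in `@input_file` would need to change to match.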
127+
128+
129+
8. Most IoT messages are in JSON format. By adding the following lines, you can parse the message as JSON so that you can add WHERE clauses and output only the data you need.
130+
131+
```sql
@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body)) AS message FROM @rs;

/*
@cnt =
    SELECT EnqueuedTimeUtc AS time, Encoding.UTF8.GetString(Body) AS jsonmessage
    FROM @rs;

OUTPUT @cnt TO @output_file USING Outputters.Text();
*/

@cnt =
    SELECT message["message"] AS iotmessage,
           message["event"] AS msgevent,
           message["object"] AS msgobject,
           message["status"] AS msgstatus,
           message["host"] AS msghost
    FROM @jsonify;

OUTPUT @cnt TO @output_file USING Outputters.Text();
```
152+
153+
9. Viewing the output, you now see a column for each item in the SELECT clause.
154+
155+
![Screen capture for step 8][img-query-avro-data-8]
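For readers who want to see the logic of step 8 outside U-SQL, the following Python sketch does roughly the same thing on in-memory data: it decodes each body as UTF-8 JSON, projects the same five fields that the script selects, and applies a filter comparable to a WHERE clause. The sample bodies and the `project` and `select_where` helpers are hypothetical, and this is not how `JsonTuple` is implemented.

```python
import json

# The five top-level JSON properties that the U-SQL script selects.
FIELDS = ["message", "event", "object", "status", "host"]


def project(body):
    """Decode a UTF-8 JSON body and keep only the selected fields (None if absent)."""
    parsed = json.loads(body.decode("utf-8"))
    return {name: parsed.get(name) for name in FIELDS}


def select_where(bodies, status):
    """Project every body, then keep rows whose status equals the given value."""
    rows = (project(body) for body in bodies)
    return [row for row in rows if row["status"] == status]


# Hypothetical message bodies, as they might appear in the Body column.
bodies = [
    b'{"message": "pump on", "event": "start", "object": "pump", "status": "ok", "host": "tub-1"}',
    b'{"message": "overheat", "event": "alarm", "object": "heater", "status": "error", "host": "tub-1"}',
]

for row in select_where(bodies, "error"):
    print(row["event"], row["object"])
```

Filtering before writing the output keeps the resulting file limited to the rows you actually need.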
156+
157+
## Next steps
158+
In this tutorial, you learned how to query the Avro data that Azure IoT Hub writes when it routes messages to Azure services.
159+
160+
To see examples of complete end-to-end solutions that use IoT Hub, see [Azure IoT Remote Monitoring solution accelerator][lnk-iot-sa-land].
161+
162+
To learn more about developing solutions with IoT Hub, see the [IoT Hub developer guide].
163+
164+
To learn more about message routing in IoT Hub, see [Send and receive messages with IoT Hub][lnk-devguide-messaging].
165+
166+
<!-- Images -->
167+
[img-query-avro-data-1a]: ./media/iot-hub-query-avro-data/query-avro-data-1a.png
168+
[img-query-avro-data-1b]: ./media/iot-hub-query-avro-data/query-avro-data-1b.png
169+
[img-query-avro-data-2]: ./media/iot-hub-query-avro-data/query-avro-data-2.png
170+
[img-query-avro-data-3]: ./media/iot-hub-query-avro-data/query-avro-data-3.png
171+
[img-query-avro-data-4]: ./media/iot-hub-query-avro-data/query-avro-data-4.png
172+
[img-query-avro-data-5]: ./media/iot-hub-query-avro-data/query-avro-data-5.png
173+
[img-query-avro-data-6]: ./media/iot-hub-query-avro-data/query-avro-data-6.png
174+
[img-query-avro-data-7a]: ./media/iot-hub-query-avro-data/query-avro-data-7a.png
175+
[img-query-avro-data-7b]: ./media/iot-hub-query-avro-data/query-avro-data-7b.png
176+
[img-query-avro-data-7c]: ./media/iot-hub-query-avro-data/query-avro-data-7c.png
177+
[img-query-avro-data-8]: ./media/iot-hub-query-avro-data/query-avro-data-8.png
178+
179+
<!-- Links -->
180+
[Azure IoT Hub message routing: now with routing on message body]: https://azure.microsoft.com/blog/iot-hub-message-routing-now-with-routing-on-message-body/
181+
182+
[Routing on message bodies]: iot-hub-devguide-query-language.md#routing-on-message-bodies
183+
[When using Azure storage containers]: iot-hub-devguide-endpoints.md#when-using-azure-storage-containers
184+
185+
[U-SQL Avro Example]: https://github.com/Azure/usql/tree/master/Examples/AvroExamples
186+
187+
[lnk-iot-sa-land]: ../iot-accelerators/index.md
188+
[IoT Hub developer guide]: iot-hub-devguide.md
189+
[lnk-devguide-messaging]: iot-hub-devguide-messaging.md