| 1 | +--- |
| 2 | +title: Query Avro data using Azure Data Lake Analytics | Microsoft Docs |
| 3 | +description: Use message body properties to route device telemetry to blob storage and query the Avro format data written to blob storage. |
| 4 | +services: iot-hub |
| 5 | +documentationcenter: |
| 6 | +author: ksaye |
| 7 | +manager: obloch |
| 8 | +ms.service: iot-hub |
| 9 | +ms.topic: article |
| 10 | +ms.date: 05/29/2018 |
| 11 | +ms.author: Kevin.Saye |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +# Query Avro data using Azure Data Lake Analytics |
| 16 | + |
| 17 | +This article shows how to query Avro data so that you can efficiently route messages from Azure IoT Hub to Azure services. As announced in the blog post [Azure IoT Hub message routing: now with routing on message body], IoT Hub supports routing on either message properties or the message body. See also [Routing on message bodies][Routing on message bodies]. |
| 18 | + |
| 19 | +The challenge is that when Azure IoT Hub routes messages to blob storage, it writes the content in Avro format, which contains both the message body and the message properties. IoT Hub supports only the Avro format when writing data to blob storage, and it does not use this format for any other endpoint. See [When using Azure Storage containers][When using Azure storage containers]. Although the Avro format is well suited to preserving data and messages, it is hard to query. By comparison, JSON and CSV are much easier to query. |
| 20 | + |
| 21 | +To address this, you can use one of the many big data patterns for transforming and scaling non-relational data. One of them, a “pay per query” pattern, is Azure Data Lake Analytics (ADLA), which is the focus of this article. Although you could run the query in Hadoop or other solutions, ADLA is often better suited to the “pay per query” approach. U-SQL has an “extractor” for Avro. See [U-SQL Avro Example]. |
| 22 | + |
| 23 | +## Query and export Avro data to a CSV file |
| 24 | +This section walks you through querying Avro data and exporting it to a CSV file in Azure Blob Storage, though you could just as easily place the data in other repositories or data stores. |
| 25 | + |
| 26 | +1. Set up Azure IoT Hub to route data to an Azure Blob Storage endpoint using a property in the message body to select messages. |
| 27 | + |
| 28 | + ![Screen capture for step 1a][img-query-avro-data-1a] |
| 29 | + |
| 30 | + ![Screen capture for step 1b][img-query-avro-data-1b] |
| 31 | + |
| 32 | +2. Ensure that your device sets the encoding, the content type, and the needed data in either the properties or the message body, as described in the product documentation. You can verify that these attributes are set correctly by viewing the messages in Device Explorer (see below). |
| 33 | + |
| 34 | + ![Screen capture for step 2][img-query-avro-data-2] |
| 35 | + |
| 36 | +3. Set up an Azure Data Lake Store (ADLS) and an Azure Data Lake Analytics instance. Although Azure IoT Hub does not route to an Azure Data Lake Store, ADLA requires one. |
| 37 | + |
| 38 | + ![Screen capture for step 3][img-query-avro-data-3] |
| 39 | + |
| 40 | +4. In ADLA, configure the Azure Blob Storage account that Azure IoT Hub routes data to as an additional store. |
| 41 | + |
| 42 | + ![Screen capture for step 4][img-query-avro-data-4] |
| 43 | + |
| 44 | +5. As discussed in [U-SQL Avro Example], you need four DLLs. Upload these files to a location in your ADLS. |
| 45 | + |
| 46 | + ![Screen capture for step 5][img-query-avro-data-5] |
| 47 | + |
| 48 | +6. In Visual Studio, create a U-SQL project. |
| 49 | + |
| 50 | + ![Screen capture for step 6][img-query-avro-data-6] |
| 51 | + |
| 52 | +7. Copy the following script and paste it into the newly created file. Modify the three highlighted sections: your ADLA account, the paths of the associated DLLs, and the path to your storage account. |
| 53 | + |
| 54 | + ![Screen capture for step 7a][img-query-avro-data-7a] |
| 55 | + |
| 56 | + The actual U-SQL script for simple output to CSV: |
| 57 | + |
| 58 | + ```sql |
| 59 | + DROP ASSEMBLY IF EXISTS [Avro]; |
| 60 | + CREATE ASSEMBLY [Avro] FROM @"/Assemblies/Avro/Avro.dll"; |
| 61 | + DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats]; |
| 62 | + CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Avro/Microsoft.Analytics.Samples.Formats.dll"; |
| 63 | + DROP ASSEMBLY IF EXISTS [Newtonsoft.Json]; |
| 64 | + CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/Assemblies/Avro/Newtonsoft.Json.dll"; |
| 65 | + DROP ASSEMBLY IF EXISTS [log4net]; |
| 66 | + CREATE ASSEMBLY [log4net] FROM @"/Assemblies/Avro/log4net.dll"; |
| 67 | + |
| 68 | + REFERENCE ASSEMBLY [Newtonsoft.Json]; |
| 69 | + REFERENCE ASSEMBLY [log4net]; |
| 70 | + REFERENCE ASSEMBLY [Avro]; |
| 71 | + REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; |
| 72 | + |
| 73 | + // Input files: wasb://<container>@<storage account>/<path>, where each {*} matches any folder or file name |
| 74 | + DECLARE @input_file string = @"wasb://hottubrawdata@kevinsayazstorage/kevinsayIoT/{*}/{*}/{*}/{*}/{*}/{*}"; |
| 75 | + DECLARE @output_file string = @"/output/output.csv"; |
| 76 | + |
| 77 | + @rs = |
| 78 | + EXTRACT |
| 79 | + EnqueuedTimeUtc string, |
| 80 | + Body byte[] |
| 81 | + FROM @input_file |
| 82 | + |
| 83 | + USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@" |
| 84 | + { |
| 85 | + ""type"":""record"", |
| 86 | + ""name"":""Message"", |
| 87 | + ""namespace"":""Microsoft.Azure.Devices"", |
| 88 | + ""fields"":[{ |
| 89 | + ""name"":""EnqueuedTimeUtc"", |
| 90 | + ""type"":""string"" |
| 91 | + }, |
| 92 | + { |
| 93 | + ""name"":""Properties"", |
| 94 | + ""type"":{ |
| 95 | + ""type"":""map"", |
| 96 | + ""values"":""string"" |
| 97 | + } |
| 98 | + }, |
| 99 | + { |
| 100 | + ""name"":""SystemProperties"", |
| 101 | + ""type"":{ |
| 102 | + ""type"":""map"", |
| 103 | + ""values"":""string"" |
| 104 | + } |
| 105 | + }, |
| 106 | + { |
| 107 | + ""name"":""Body"", |
| 108 | + ""type"":[""null"",""bytes""] |
| 109 | + } |
| 110 | + ] |
| 111 | + }"); |
| 112 | + |
| 113 | + @cnt = |
| 114 | + SELECT EnqueuedTimeUtc AS time, Encoding.UTF8.GetString(Body) AS jsonmessage |
| 115 | + FROM @rs; |
| 116 | + |
| 117 | + OUTPUT @cnt TO @output_file USING Outputters.Text(); |
| 118 | + ``` |
| 119 | + |
| 120 | + When the script ran with a limit of 10 Analytics Units, ADLA took 5 minutes to process 177 files and summarize the output to a CSV file, as shown below. |
| 121 | + |
| 122 | + ![Screen capture for step 7b][img-query-avro-data-7b] |
| 123 | + |
| 124 | + Viewing the output, you can see that the Avro content has been converted to a CSV file. Continue to step 8 if you want to parse the JSON. |
| 125 | + |
| 126 | + ![Screen capture for step 7c][img-query-avro-data-7c] |
| 127 | + |
| 128 | + |
| 129 | +8. Most IoT messages are in JSON format. By adding the following lines, you can parse the message body as JSON, which lets you add WHERE clauses and output only the data you need. |
| 130 | + |
| 131 | + ```sql |
| 132 | + @jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body)) AS message FROM @rs; |
| 133 | + |
| 134 | + /* |
| 135 | + @cnt = |
| 136 | + SELECT EnqueuedTimeUtc AS time, Encoding.UTF8.GetString(Body) AS jsonmessage |
| 137 | + FROM @rs; |
| 138 | + |
| 139 | + OUTPUT @cnt TO @output_file USING Outputters.Text(); |
| 140 | + */ |
| 141 | + |
| 142 | + @cnt = |
| 143 | + SELECT message["message"] AS iotmessage, |
| 144 | + message["event"] AS msgevent, |
| 145 | + message["object"] AS msgobject, |
| 146 | + message["status"] AS msgstatus, |
| 147 | + message["host"] AS msghost |
| 148 | + FROM @jsonify; |
| 149 | + |
| 150 | + OUTPUT @cnt TO @output_file USING Outputters.Text(); |
| 151 | + ``` |
| 152 | + |
| 153 | +9. When you view the output, you now see a column for each item in the SELECT statement. |
| 154 | + |
| 155 | + ![Screen capture for step 8][img-query-avro-data-8] |
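The decode, parse, and filter logic of steps 7 and 8 can also be sketched locally. The following Python example is only an illustration and is not part of the ADLA job: it stands in for the U-SQL statements, and the sample records, field values, and the `status == "error"` filter are hypothetical. It decodes each message body as UTF-8, parses it as JSON (roughly the role `JsonTuple` plays in step 8), applies a condition equivalent to a WHERE clause, and writes the selected fields as CSV.

```python
import csv
import io
import json

# Hypothetical records, shaped like the rows extracted in step 7:
# an enqueued-time string plus the raw message body as bytes.
records = [
    {
        "EnqueuedTimeUtc": "2018-05-29T10:00:00Z",
        "Body": b'{"message": "lid open", "event": "alert", '
                b'"object": "hottub", "status": "error", "host": "device-01"}',
    },
    {
        "EnqueuedTimeUtc": "2018-05-29T10:01:00Z",
        "Body": b'{"message": "heartbeat", "event": "ping", '
                b'"object": "hottub", "status": "ok", "host": "device-01"}',
    },
]

# The JSON keys selected in step 8.
FIELDS = ["message", "event", "object", "status", "host"]


def parse_body(body: bytes) -> dict:
    """Decode the body as UTF-8 and parse it as a JSON object."""
    return json.loads(body.decode("utf-8"))


def to_csv(rows, wanted_status: str) -> str:
    """Keep rows whose status matches, then emit the selected fields as CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        msg = parse_body(row["Body"])
        if msg.get("status") == wanted_status:  # plays the role of a WHERE clause
            writer.writerow([msg.get(field, "") for field in FIELDS])
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(records, "error"), end="")
```

Filtering before writing keeps the output limited to the rows you need, which is the same reason for adding a WHERE clause to the U-SQL script.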
| 156 | + |
| 157 | +## Next steps |
| 158 | +In this tutorial, you learned how to query Avro data so that you can efficiently route messages from Azure IoT Hub to Azure services. |
| 159 | + |
| 160 | +To see examples of complete end-to-end solutions that use IoT Hub, see [Azure IoT Remote Monitoring solution accelerator][lnk-iot-sa-land]. |
| 161 | + |
| 162 | +To learn more about developing solutions with IoT Hub, see the [IoT Hub developer guide]. |
| 163 | + |
| 164 | +To learn more about message routing in IoT Hub, see [Send and receive messages with IoT Hub][lnk-devguide-messaging]. |
| 165 | + |
| 166 | +<!-- Images --> |
| 167 | +[img-query-avro-data-1a]: ./media/iot-hub-query-avro-data/query-avro-data-1a.png |
| 168 | +[img-query-avro-data-1b]: ./media/iot-hub-query-avro-data/query-avro-data-1b.png |
| 169 | +[img-query-avro-data-2]: ./media/iot-hub-query-avro-data/query-avro-data-2.png |
| 170 | +[img-query-avro-data-3]: ./media/iot-hub-query-avro-data/query-avro-data-3.png |
| 171 | +[img-query-avro-data-4]: ./media/iot-hub-query-avro-data/query-avro-data-4.png |
| 172 | +[img-query-avro-data-5]: ./media/iot-hub-query-avro-data/query-avro-data-5.png |
| 173 | +[img-query-avro-data-6]: ./media/iot-hub-query-avro-data/query-avro-data-6.png |
| 174 | +[img-query-avro-data-7a]: ./media/iot-hub-query-avro-data/query-avro-data-7a.png |
| 175 | +[img-query-avro-data-7b]: ./media/iot-hub-query-avro-data/query-avro-data-7b.png |
| 176 | +[img-query-avro-data-7c]: ./media/iot-hub-query-avro-data/query-avro-data-7c.png |
| 177 | +[img-query-avro-data-8]: ./media/iot-hub-query-avro-data/query-avro-data-8.png |
| 178 | + |
| 179 | +<!-- Links --> |
| 180 | +[Azure IoT Hub message routing: now with routing on message body]: https://azure.microsoft.com/blog/iot-hub-message-routing-now-with-routing-on-message-body/ |
| 181 | + |
| 182 | +[Routing on message bodies]: iot-hub-devguide-query-language.md#routing-on-message-bodies |
| 183 | +[When using Azure storage containers]: iot-hub-devguide-endpoints.md#when-using-azure-storage-containers |
| 184 | + |
| 185 | +[U-SQL Avro Example]: https://github.com/Azure/usql/tree/master/Examples/AvroExamples |
| 186 | + |
| 187 | +[lnk-iot-sa-land]: ../iot-accelerators/index.md |
| 188 | +[IoT Hub developer guide]: iot-hub-devguide.md |
| 189 | +[lnk-devguide-messaging]: iot-hub-devguide-messaging.md |