
Commit 7e0afdb

fix

2 parents: b5e4382 + fc0950b

8 files changed (+11, -28 lines)

7 binary files not shown.

articles/hdinsight/hadoop/using-json-in-hive.md

Lines changed: 11 additions & 28 deletions
@@ -1,19 +1,17 @@
 ---
-title: Analyze and process JSON documents with Apache Hive - Azure HDInsight
-description: Learn how to use JSON documents and analyze them by using Apache Hive in Azure HDInsight
+title: Analyze and process JSON documents with Apache Hive in Azure HDInsight
+description: Learn how to use JSON documents and analyze them by using Apache Hive in Azure HDInsight.
 author: hrasheed-msft
+ms.author: hrasheed
 ms.reviewer: jasonh
-
 ms.service: hdinsight
-ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 02/27/2019
-ms.author: hrasheed
-
+ms.date: 06/03/2019
 ---
+
 # Process and analyze JSON documents by using Apache Hive in Azure HDInsight
 
-Learn how to process and analyze JavaScript Object Notation (JSON) files by using Apache Hive in Azure HDInsight. This tutorial uses the following JSON document:
+Learn how to process and analyze JavaScript Object Notation (JSON) files by using Apache Hive in Azure HDInsight. This article uses the following JSON document:
 
 ```json
 {
@@ -52,7 +50,7 @@ Learn how to process and analyze JavaScript Object Notation (JSON) files by usin
 }
 ```
 
-The file can be found at **wasb://processjson\@hditutorialdata.blob.core.windows.net/**. For more information on how to use Azure Blob storage with HDInsight, see [Use HDFS-compatible Azure Blob storage with Apache Hadoop in HDInsight](../hdinsight-hadoop-use-blob-storage.md). You can copy the file to the default container of your cluster.
+The file can be found at `wasb://processjson@hditutorialdata.blob.core.windows.net/`. For more information on how to use Azure Blob storage with HDInsight, see [Use HDFS-compatible Azure Blob storage with Apache Hadoop in HDInsight](../hdinsight-hadoop-use-blob-storage.md). You can copy the file to the default container of your cluster.
 
 In this tutorial, you use the Apache Hive console. For instructions on how to open the Hive console, see [Use Apache Ambari Hive View with Apache Hadoop in HDInsight](apache-hadoop-use-hive-ambari-view.md).
 
@@ -78,7 +76,7 @@ SELECT CONCAT_WS(' ',COLLECT_LIST(textcol)) AS singlelineJSON
 SELECT * FROM StudentsOneLine
 ```
 
-The raw JSON file is located at **wasb://processjson\@hditutorialdata.blob.core.windows.net/**. The **StudentsRaw** Hive table points to the raw JSON document that is not flattened.
+The raw JSON file is located at `wasb://processjson@hditutorialdata.blob.core.windows.net/`. The **StudentsRaw** Hive table points to the raw JSON document that is not flattened.
 
 The **StudentsOneLine** Hive table stores the data in the HDInsight default file system under the **/json/students/** path.
 
@@ -88,7 +86,7 @@ The **SELECT** statement only returns one row.
 
 Here is the output of the **SELECT** statement:
 
-![Flattening the JSON document][image-hdi-hivejson-flatten]
+![Flattening the JSON document](./media/using-json-in-hive/flatten.png)
 
 ## Analyze JSON documents in Hive
 Hive provides three different mechanisms to run queries on JSON documents, or you can write your own:
@@ -112,7 +110,7 @@ FROM StudentsOneLine;
 
 Here is the output when you run this query in the console window:
 
-![get_json_object UDF][image-hdi-hivejson-getjsonobject]
+![get_json_object UDF](./media/using-json-in-hive/getjsonobject.png)
 
 There are limitations of the get_json_object UDF:
 
@@ -133,7 +131,7 @@ LATERAL VIEW JSON_TUPLE(jt.json_body, 'StudentId', 'Grade') q1
 
 The output of this script in the Hive console:
 
-![json_tuple UDF][image-hdi-hivejson-jsontuple]
+![json_tuple UDF](./media/using-json-in-hive/jsontuple.png)
 
 The json_tuple UDF uses the [lateral view](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView) syntax in Hive, which enables json\_tuple to create a virtual table by applying the UDT function to each row of the original table. Complex JSONs become too unwieldy because of the repeated use of **LATERAL VIEW**. Furthermore, **JSON_TUPLE** cannot handle nested JSONs.
 
@@ -150,18 +148,3 @@ For related articles, see:
 * [Use Apache Hive and HiveQL with Apache Hadoop in HDInsight to analyze a sample Apache log4j file](../hdinsight-use-hive.md)
 * [Analyze flight delay data by using Apache Hive in HDInsight](../hdinsight-analyze-flight-delay-data-linux.md)
 * [Analyze Twitter data by using Apache Hive in HDInsight](../hdinsight-analyze-twitter-data-linux.md)
-
-[hdinsight-python]:python-udf-hdinsight.md
-
-[image-hdi-hivejson-flatten]: ./media/using-json-in-hive/flatten.png
-[image-hdi-hivejson-getjsonobject]: ./media/using-json-in-hive/getjsonobject.png
-[image-hdi-hivejson-jsontuple]: ./media/using-json-in-hive/jsontuple.png
-[image-hdi-hivejson-jdk]: ./media/hdinsight-using-json-in-hive/jdk.png
-[image-hdi-hivejson-maven]: ./media/hdinsight-using-json-in-hive/maven.png
-[image-hdi-hivejson-serde]: ./media/hdinsight-using-json-in-hive/serde.png
-[image-hdi-hivejson-addjar]: ./media/hdinsight-using-json-in-hive/addjar.png
-[image-hdi-hivejson-serde_query1]: ./media/hdinsight-using-json-in-hive/serde_query1.png
-[image-hdi-hivejson-serde_query2]: ./media/hdinsight-using-json-in-hive/serde_query2.png
-[image-hdi-hivejson-serde_query3]: ./media/hdinsight-using-json-in-hive/serde_query3.png
-[image-hdi-hivejson-serde_result]: ./media/hdinsight-using-json-in-hive/serde_result.png
-

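Most of the markdown change in this diff replaces reference-style image links (`![alt][label]`) with inline links (`![alt](path)`) and deletes the definition block at the end of the file. The Python sketch below shows one way that kind of cleanup could be automated. `inline_image_refs` is a hypothetical helper, not tooling used by this commit. It assumes each definition fits on a single line with no title, treats labels case-insensitively, and keeps any definition that an ordinary `[text][label]` link still uses.

```python
import re

# A reference-style image, for example ![json_tuple UDF][image-hdi-hivejson-jsontuple]
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\[([^\]]+)\]")
# A one-line link reference definition, for example [label]: ./media/file.png
LINK_DEF = re.compile(r"^\[([^\]]+)\]:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
# Any remaining reference-style usage, for example [text][label]
REF_USE = re.compile(r"\]\[([^\]]+)\]")


def inline_image_refs(markdown: str) -> str:
    """Hypothetical helper: inline reference-style images, drop unused definitions."""
    # Map each label (lowercased) to its target path.
    targets = {m.group(1).lower(): m.group(2) for m in LINK_DEF.finditer(markdown)}

    def to_inline(match: re.Match) -> str:
        alt, label = match.group(1), match.group(2)
        target = targets.get(label.lower())
        # Leave images with an unknown label untouched instead of guessing.
        return f"![{alt}]({target})" if target else match.group(0)

    body = IMAGE_REF.sub(to_inline, markdown)

    # Labels that ordinary [text][label] links still rely on.
    still_used = {m.group(1).lower() for m in REF_USE.finditer(body)}

    def drop_if_unused(match: re.Match) -> str:
        return match.group(0) if match.group(1).lower() in still_used else ""

    body = LINK_DEF.sub(drop_if_unused, body)

    # Collapse the blank-line runs left behind and end with one newline.
    return re.sub(r"\n{3,}", "\n\n", body).rstrip("\n") + "\n"


if __name__ == "__main__":
    old = (
        "Here is the output of the **SELECT** statement:\n\n"
        "![Flattening the JSON document][image-hdi-hivejson-flatten]\n\n"
        "[hdinsight-python]:python-udf-hdinsight.md\n\n"
        "[image-hdi-hivejson-flatten]: ./media/using-json-in-hive/flatten.png\n"
    )
    print(inline_image_refs(old))
```

Dropping only definitions that nothing references anymore avoids breaking other reference links. The final blank-line collapse is blunt: it applies to the whole file, including any blank runs inside fenced code blocks, so a real cleanup would need to skip those.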