articles/hdinsight/hadoop/using-json-in-hive.md
Lines changed: 11 additions & 28 deletions
@@ -1,19 +1,17 @@
1
1
---
2
-
title: Analyze and process JSON documents with Apache Hive - Azure HDInsight
3
-
description: Learn how to use JSON documents and analyze them by using Apache Hive in Azure HDInsight
2
+
title: Analyze and process JSON documents with Apache Hive in Azure HDInsight
3
+
description: Learn how to use JSON documents and analyze them by using Apache Hive in Azure HDInsight.
4
4
author: hrasheed-msft
5
+
ms.author: hrasheed
5
6
ms.reviewer: jasonh
6
-
7
7
ms.service: hdinsight
8
-
ms.custom: hdinsightactive
9
8
ms.topic: conceptual
10
-
ms.date: 02/27/2019
11
-
ms.author: hrasheed
12
-
9
+
ms.date: 06/03/2019
13
10
---
11
+
14
12
# Process and analyze JSON documents by using Apache Hive in Azure HDInsight
15
13
16
-
Learn how to process and analyze JavaScript Object Notation (JSON) files by using Apache Hive in Azure HDInsight. This tutorial uses the following JSON document:
14
+
Learn how to process and analyze JavaScript Object Notation (JSON) files by using Apache Hive in Azure HDInsight. This article uses the following JSON document:
17
15
18
16
```json
19
17
{
@@ -52,7 +50,7 @@ Learn how to process and analyze JavaScript Object Notation (JSON) files by usin
52
50
}
53
51
```
54
52
55
-
The file can be found at **wasb://processjson\@hditutorialdata.blob.core.windows.net/**. For more information on how to use Azure Blob storage with HDInsight, see [Use HDFS-compatible Azure Blob storage with Apache Hadoop in HDInsight](../hdinsight-hadoop-use-blob-storage.md). You can copy the file to the default container of your cluster.
53
+
The file can be found at `wasb://processjson@hditutorialdata.blob.core.windows.net/`. For more information on how to use Azure Blob storage with HDInsight, see [Use HDFS-compatible Azure Blob storage with Apache Hadoop in HDInsight](../hdinsight-hadoop-use-blob-storage.md). You can copy the file to the default container of your cluster.
56
54
57
55
In this tutorial, you use the Apache Hive console. For instructions on how to open the Hive console, see [Use Apache Ambari Hive View with Apache Hadoop in HDInsight](apache-hadoop-use-hive-ambari-view.md).
58
56
@@ -78,7 +76,7 @@ SELECT CONCAT_WS(' ',COLLECT_LIST(textcol)) AS singlelineJSON
78
76
SELECT * FROM StudentsOneLine
79
77
```
80
78
81
-
The raw JSON file is located at **wasb://processjson\@hditutorialdata.blob.core.windows.net/**. The **StudentsRaw** Hive table points to the raw JSON document that is not flattened.
79
+
The raw JSON file is located at `wasb://processjson@hditutorialdata.blob.core.windows.net/`. The **StudentsRaw** Hive table points to the raw JSON document that is not flattened.
82
80
83
81
The **StudentsOneLine** Hive table stores the data in the HDInsight default file system under the **/json/students/** path.
84
82
@@ -88,7 +86,7 @@ The **SELECT** statement only returns one row.
88
86
89
87
Here is the output of the **SELECT** statement:
90
88
91
-
![Flattening the JSON document][image-hdi-hivejson-flatten]
89
+

92
90
93
91
## Analyze JSON documents in Hive
94
92
Hive provides three different mechanisms to run queries on JSON documents, or you can write your own:
@@ -112,7 +110,7 @@ FROM StudentsOneLine;
112
110
113
111
Here is the output when you run this query in the console window:
The json_tuple UDF uses the [lateral view](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView) syntax in Hive, which enables json\_tuple to create a virtual table by applying the UDT function to each row of the original table. Complex JSONs become too unwieldy because of the repeated use of **LATERAL VIEW**. Furthermore, **JSON_TUPLE** cannot handle nested JSONs.
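
As a rough illustration of the limitation described above, the following HiveQL sketch contrasts `json_tuple` with `get_json_object`. It assumes that **StudentsOneLine** holds each flattened document in a string column named `json_body`, that the document has top-level `StudentId` and `Grade` keys, and that it has a nested `StudentDetails` array whose entries carry a `FirstName` field. None of these names are confirmed by this diff, so treat them as placeholders.

```sql
-- Assumed schema (not shown in this diff): StudentsOneLine(json_body STRING),
-- one flattened JSON document per row.

-- json_tuple reads several top-level keys in a single call.
-- LATERAL VIEW joins the generated columns back to each source row.
SELECT jt.StudentId, jt.Grade
FROM StudentsOneLine s
LATERAL VIEW JSON_TUPLE(s.json_body, 'StudentId', 'Grade') jt
  AS StudentId, Grade;

-- json_tuple accepts key names, not paths, so a nested value is read here
-- with get_json_object and a JSONPath expression (hypothetical field names).
SELECT GET_JSON_OBJECT(s.json_body, '$.StudentDetails[0].FirstName') AS FirstName
FROM StudentsOneLine s;
```

The first query parses each document once for both keys, which is the usual reason to prefer `json_tuple` when only top-level fields are needed. The second query trades that efficiency for the ability to address a nested element by path.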
139
137
@@ -150,18 +148,3 @@ For related articles, see:
150
148
* [Use Apache Hive and HiveQL with Apache Hadoop in HDInsight to analyze a sample Apache log4j file](../hdinsight-use-hive.md)
151
149
* [Analyze flight delay data by using Apache Hive in HDInsight](../hdinsight-analyze-flight-delay-data-linux.md)
152
150
* [Analyze Twitter data by using Apache Hive in HDInsight](../hdinsight-analyze-twitter-data-linux.md)