articles/hdinsight/hadoop/apache-hadoop-use-hive-visual-studio.md
18 additions & 20 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 10/18/2019
+ms.date: 11/06/2019
 ms.author: hrasheed
 ---
@@ -19,8 +19,6 @@ Learn how to use the Data Lake tools for Visual Studio to query Apache Hive. The

 * An Apache Hadoop cluster on HDInsight. For information about creating this item, see [Create Apache Hadoop cluster in Azure HDInsight using Resource Manager template](./apache-hadoop-linux-tutorial-get-started.md).

-* Co-administrator access to your Azure subscription. To change administrators for a subscription, see [Add or change Azure subscription administrators](../../billing/billing-add-change-azure-subscription-administrator.md).
-
 * [Visual Studio](https://visualstudio.microsoft.com/vs/). The steps in this article use Visual Studio 2019.

 * HDInsight tools for Visual Studio or Azure Data Lake tools for Visual Studio. For information on installing and configuring the tools, see [Install Data Lake Tools for Visual Studio](apache-hadoop-visual-studio-tools-get-started.md#install-data-lake-tools-for-visual-studio).
@@ -62,7 +60,7 @@ Ad-hoc queries can be executed in either **Batch** or **Interactive** mode.

 8. If you selected the advanced submit option, configure **Job Name**, **Arguments**, **Additional Configurations**, and **Status Directory** in the **Submit Script** dialog box. Then select **Submit**.
@@ -89,22 +87,22 @@ To run a Hive query by creating a Hive application, follow these steps:

 These statements do the following actions:

-* `DROP TABLE`: Deletes the table if it exists.
-
-* `CREATE EXTERNAL TABLE`: Creates a new 'external' table in Hive. External tables only store the table definition in Hive. (The data is left in the original location.)
+* `DROP TABLE`: Deletes the table if it exists.

-    > [!NOTE]
-    > External tables should be used when you expect the underlying data to be updated by an external source, such as a MapReduce job or an Azure service.
-    >
-    > Dropping an external table does **not** delete the data, only the table definition.
+* `CREATE EXTERNAL TABLE`: Creates a new 'external' table in Hive. External tables only store the table definition in Hive. (The data is left in the original location.)

-* `ROW FORMAT`: Tells Hive how the data is formatted. In this case, the fields in each log are separated by a space.
+    > [!NOTE]
+    > External tables should be used when you expect the underlying data to be updated by an external source, such as a MapReduce job or an Azure service.
+    >
+    > Dropping an external table does **not** delete the data, only the table definition.
+
+* `ROW FORMAT`: Tells Hive how the data is formatted. In this case, the fields in each log are separated by a space.

-* `STORED AS TEXTFILE LOCATION`: Tells Hive that the data is stored in the example/data directory, and that it's stored as text.
+* `STORED AS TEXTFILE LOCATION`: Tells Hive that the data is stored in the *example/data* directory, and that it's stored as text.

-* `SELECT`: Selects a count of all rows where column `t4` contains the value `[ERROR]`. This statement returns a value of `3`, because there are three rows that contain this value.
+* `SELECT`: Selects a count of all rows where column `t4` contains the value `[ERROR]`. This statement returns a value of `3`, because three rows contain this value.

-* `INPUT__FILE__NAME LIKE '%.log'`: Tells Hive to only return data from files ending in .log. This clause restricts the search to the *sample.log* file that contains the data.
+* `INPUT__FILE__NAME LIKE '%.log'`: Tells Hive to only return data from files ending in .log. This clause restricts the search to the *sample.log* file that contains the data.

 6. From the query file toolbar (which has a similar appearance to the ad-hoc query toolbar), select the HDInsight cluster that you want to use for this query. Then change **Interactive** to **Batch** (if necessary) and select **Submit** to run the statements as a Hive job.
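For reference, here is a minimal HiveQL sketch of the kind of query these bullets describe. The seven-column layout `t1`..`t7`, the exact storage path, and the shape of the `SELECT` are assumptions; only the `log4jLogs` table name, the space delimiter, the *example/data* location, column `t4`, and the `[ERROR]` value come from the text above.

```hql
-- Sketch only: column names t1..t7 and the storage path are assumed.
DROP TABLE log4jLogs;
CREATE EXTERNAL TABLE log4jLogs (
    t1 string, t2 string, t3 string, t4 string,
    t5 string, t6 string, t7 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE LOCATION '/example/data/';
-- Count rows where t4 is [ERROR], limited to files ending in .log.
SELECT t4 AS loglevel, COUNT(*) AS cnt
FROM log4jLogs
WHERE t4 = '[ERROR]' AND INPUT__FILE__NAME LIKE '%.log'
GROUP BY t4;
```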
@@ -131,17 +129,17 @@ The following example relies on the `log4jLogs` table created in the previous pr

 These statements do the following actions:

 * `CREATE TABLE IF NOT EXISTS`: Creates a table if it doesn't already exist. Because the `EXTERNAL` keyword isn't used, this statement creates an internal table. Internal tables are stored in the Hive data warehouse and are managed by Hive.
-
-    > [!NOTE]
-    > Unlike `EXTERNAL` tables, dropping an internal table also deletes the underlying data.

-* `STORED AS ORC`: Stores the data in optimized row columnar (ORC) format. ORC is a highly optimized and efficient format for storing Hive data.
+    > [!NOTE]
+    > Unlike `EXTERNAL` tables, dropping an internal table also deletes the underlying data.
+
+* `STORED AS ORC`: Stores the data in *optimized row columnar* (ORC) format. ORC is a highly optimized and efficient format for storing Hive data.

 * `INSERT OVERWRITE ... SELECT`: Selects rows from the `log4jLogs` table that contain `[ERROR]`, then inserts the data into the `errorLogs` table.

 3. Change **Interactive** to **Batch** if necessary, then select **Submit**.

-4. To verify that the job created the table, use **Server Explorer** and expand **Azure** > **HDInsight** > your HDInsight cluster > **Hive Databases** > **default**. The **errorLogs** table and the **log4jLogs** table are listed.
+4. To verify that the job created the table, go to **Server Explorer** and expand **Azure** > **HDInsight**. Expand your HDInsight cluster, and then expand **Hive Databases** > **default**. The **errorLogs** table and the **log4jLogs** table are listed.
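For reference, a minimal HiveQL sketch matching these bullets. The column list is an assumption carried over from the earlier sketch; the `errorLogs` and `log4jLogs` table names, `STORED AS ORC`, and the `INSERT OVERWRITE ... SELECT` filter on `[ERROR]` come from the text above.

```hql
-- Sketch only: column names t1..t7 are assumed; only t4 is named in the article.
CREATE TABLE IF NOT EXISTS errorLogs (
    t1 string, t2 string, t3 string, t4 string,
    t5 string, t6 string, t7 string)
STORED AS ORC;
-- Copy the [ERROR] rows from the external table into the internal ORC table.
INSERT OVERWRITE TABLE errorLogs
SELECT t1, t2, t3, t4, t5, t6, t7
FROM log4jLogs
WHERE t4 = '[ERROR]';
```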