Commit 56b4356

Updated the *Run Apache Hive queries* article to remove the co-admin prereq.
Also refreshed 3 out of the 4 screenshots and made a few other minor changes.
1 parent 52cae07 commit 56b4356

4 files changed: +18 -20 lines changed

articles/hdinsight/hadoop/apache-hadoop-use-hive-visual-studio.md

Lines changed: 18 additions & 20 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 10/18/2019
+ms.date: 11/06/2019
 ms.author: hrasheed
 ---
 
@@ -19,8 +19,6 @@ Learn how to use the Data Lake tools for Visual Studio to query Apache Hive. The
 
 * An Apache Hadoop cluster on HDInsight. For information about creating this item, see [Create Apache Hadoop cluster in Azure HDInsight using Resource Manager template](./apache-hadoop-linux-tutorial-get-started.md).
 
-* Co-administrator access to your Azure subscription. To change administrators for a subscription, see [Add or change Azure subscription administrators](../../billing/billing-add-change-azure-subscription-administrator.md).
-
 * [Visual Studio](https://visualstudio.microsoft.com/vs/). The steps in this article use Visual Studio 2019.
 
 * HDInsight tools for Visual Studio or Azure Data Lake tools for Visual Studio. For information on installing and configuring the tools, see [Install Data Lake Tools for Visual Studio](apache-hadoop-visual-studio-tools-get-started.md#install-data-lake-tools-for-visual-studio).
@@ -62,7 +60,7 @@ Ad-hoc queries can be executed in either **Batch** or **Interactive** mode.
 
 8. If you selected the advanced submit option, configure **Job Name**, **Arguments**, **Additional Configurations**, and **Status Directory** in the **Submit Script** dialog box. Then select **Submit**.
 
-![Submit Script dialog box, HDInsight Hadoop Hive query](./media/apache-hadoop-use-hive-visual-studio/vs-tools-submit-jobs-advanced.png "Submit queries")
+![Submit Script dialog box, HDInsight Hadoop Hive query](./media/apache-hadoop-use-hive-visual-studio/vs-tools-submit-jobs-advanced.png)
 
 ### Create a Hive application
 
@@ -89,22 +87,22 @@ To run a Hive query by creating a Hive application, follow these steps:
 
 These statements do the following actions:
 
-* `DROP TABLE`: Deletes the table if it exists.
-
-* `CREATE EXTERNAL TABLE`: Creates a new 'external' table in Hive. External tables only store the table definition in Hive. (The data is left in the original location.)
+* `DROP TABLE`: Deletes the table if it exists.
 
-> [!NOTE]
-> External tables should be used when you expect the underlying data to be updated by an external source, such as a MapReduce job or an Azure service.
->
-> Dropping an external table does **not** delete the data, only the table definition.
+* `CREATE EXTERNAL TABLE`: Creates a new 'external' table in Hive. External tables only store the table definition in Hive. (The data is left in the original location.)
 
-* `ROW FORMAT`: Tells Hive how the data is formatted. In this case, the fields in each log are separated by a space.
+> [!NOTE]
+> External tables should be used when you expect the underlying data to be updated by an external source, such as a MapReduce job or an Azure service.
+>
+> Dropping an external table does **not** delete the data, only the table definition.
+
+* `ROW FORMAT`: Tells Hive how the data is formatted. In this case, the fields in each log are separated by a space.
 
-* `STORED AS TEXTFILE LOCATION`: Tells Hive that the data is stored in the example/data directory, and that it's stored as text.
+* `STORED AS TEXTFILE LOCATION`: Tells Hive that the data is stored in the *example/data* directory, and that it's stored as text.
 
-* `SELECT`: Selects a count of all rows where column `t4` contains the value `[ERROR]`. This statement returns a value of `3`, because there are three rows that contain this value.
+* `SELECT`: Selects a count of all rows where column `t4` contains the value `[ERROR]`. This statement returns a value of `3`, because three rows contain this value.
 
-* `INPUT__FILE__NAME LIKE '%.log'`: Tells Hive to only return data from files ending in .log. This clause restricts the search to the *sample.log* file that contains the data.
+* `INPUT__FILE__NAME LIKE '%.log'`: Tells Hive to only return data from files ending in .log. This clause restricts the search to the *sample.log* file that contains the data.
 
 6. From the query file toolbar (which has a similar appearance to the ad-hoc query toolbar), select the HDInsight cluster that you want to use for this query. Then change **Interactive** to **Batch** (if necessary) and select **Submit** to run the statements as a Hive job.
 
@@ -131,17 +129,17 @@ The following example relies on the `log4jLogs` table created in the previous pr
 These statements do the following actions:
 
 * `CREATE TABLE IF NOT EXISTS`: Creates a table if it doesn't already exist. Because the `EXTERNAL` keyword isn't used, this statement creates an internal table. Internal tables are stored in the Hive data warehouse and are managed by Hive.
-
-> [!NOTE]
-> Unlike `EXTERNAL` tables, dropping an internal table also deletes the underlying data.
 
-* `STORED AS ORC`: Stores the data in optimized row columnar (ORC) format. ORC is a highly optimized and efficient format for storing Hive data.
+> [!NOTE]
+> Unlike `EXTERNAL` tables, dropping an internal table also deletes the underlying data.
+
+* `STORED AS ORC`: Stores the data in *optimized row columnar* (ORC) format. ORC is a highly optimized and efficient format for storing Hive data.
 
 * `INSERT OVERWRITE ... SELECT`: Selects rows from the `log4jLogs` table that contain `[ERROR]`, then inserts the data into the `errorLogs` table.
 
 3. Change **Interactive** to **Batch** if necessary, then select **Submit**.
 
-4. To verify that the job created the table, use **Server Explorer** and expand **Azure** > **HDInsight** > your HDInsight cluster > **Hive Databases** > **default**. The **errorLogs** table and the **log4jLogs** table are listed.
+4. To verify that the job created the table, go to **Server Explorer** and expand **Azure** > **HDInsight**. Expand your HDInsight cluster, and then expand **Hive Databases** > **default**. The **errorLogs** table and the **log4jLogs** table are listed.
 
 ## Next steps
 
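
As a reading aid for the `ROW FORMAT`, `SELECT`, and `INPUT__FILE__NAME LIKE '%.log'` bullets in the `@@ -89,22 +87,22 @@` hunk, here is a minimal Python sketch of the filtering they describe: split each line on spaces, compare the fourth field (`t4`) with `[ERROR]`, and consider only files whose names end in `.log`. This is not the HiveQL from the article, which the diff does not show. The sample rows, the `notes.txt` file name, and the `count_errors` helper are hypothetical and exist only for illustration.

```python
# Hypothetical sample rows as (source file name, raw line) pairs.
# They are NOT taken from the article's sample.log.
SAMPLE_ROWS = [
    ("sample.log", "2012-02-03 18:35:34 SampleClass6 [INFO] everything normal"),
    ("sample.log", "2012-02-03 18:35:34 SampleClass4 [ERROR] incorrect id"),
    ("sample.log", "2012-02-03 18:55:54 SampleClass1 [ERROR] incorrect id"),
    ("notes.txt", "2012-02-03 18:55:54 SampleClass1 [ERROR] not a .log file"),
]


def count_errors(rows):
    """Count rows whose fourth space-separated field is '[ERROR]',
    looking only at files whose names end in '.log'."""
    count = 0
    for file_name, line in rows:
        # Mirrors INPUT__FILE__NAME LIKE '%.log': skip other files.
        if not file_name.endswith(".log"):
            continue
        # Mirrors ROW FORMAT with space-delimited fields: t1, t2, t3, t4, ...
        fields = line.split(" ")
        # Mirrors the t4 = '[ERROR]' condition in the SELECT bullet.
        if len(fields) >= 4 and fields[3] == "[ERROR]":
            count += 1
    return count


print(count_errors(SAMPLE_ROWS))
```

With these made-up rows the function counts two matches. According to the `SELECT` bullet, the article's own sample data yields `3`.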
