Commit 9cefaf0

Merge pull request #88805 from dagiro/cats107
cats107
2 parents c541807 + e9b752e commit 9cefaf0

3 files changed, 11 additions and 12 deletions

articles/hdinsight/hadoop/apache-hadoop-use-hive-visual-studio.md

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ Ad hoc queries can be executed in either **Batch** or **Interactive** mode.
 
 6. From the toolbar, select the **HDInsight Cluster** that you want to use for this query. Select **Submit** to run the statements as a Hive job.
 
-![Submit bar](./media/apache-hadoop-use-hive-visual-studio/hdinsight-toolbar-submit.png)
+![Azure HDInsight toolbar submit](./media/apache-hadoop-use-hive-visual-studio/hdinsight-toolbar-submit.png)
 
 7. The **Hive Job Summary** appears and displays information about the running job. Use the **Refresh** link to refresh the job information, until the **Job Status** changes to **Completed**.
 

articles/hdinsight/hadoop/apache-hadoop-using-apache-hive-as-an-etl-tool.md

Lines changed: 2 additions & 2 deletions
@@ -8,8 +8,8 @@ ms.reviewer: jasonh
 ms.custom: hdinsightactive
 ms.topic: conceptual
 ms.date: 11/14/2017
-
 ---
+
 # Use Apache Hive as an Extract, Transform, and Load (ETL) tool
 
 You typically need to clean and transform incoming data before loading it into a destination suitable for analytics. Extract, Transform, and Load (ETL) operations are used to prepare data and load it into a data destination. Apache Hive on HDInsight can read in unstructured data, process the data as needed, and then load the data into a relational data warehouse for decision support systems. In this approach, data is extracted from the source and stored in scalable storage, such as Azure Storage blobs or Azure Data Lake Storage. The data is then transformed using a sequence of Hive queries and is finally staged within Hive in preparation for bulk loading into the destination data store.
@@ -18,7 +18,7 @@ You typically need to clean and transform incoming data before loading it into a
 
 The following figure shows an overview of the use case and model for ETL automation. Input data is transformed to generate the appropriate output. During that transformation, the data can change shape, data type, and even language. ETL processes can convert Imperial to metric, change time zones, and improve precision to properly align with existing data in the destination. ETL processes can also combine new data with existing data to keep reporting up-to-date, or to provide further insight into existing data. Applications such as reporting tools and services can then consume this data in the desired format.
 
-![Apache Hive as ETL](./media/apache-hadoop-using-apache-hive-as-an-etl-tool/hdinsight-etl-architecture.png)
+![Apache Hive as ETL architecture](./media/apache-hadoop-using-apache-hive-as-an-etl-tool/hdinsight-etl-architecture.png)
 
 Hadoop is typically used in ETL processes that import either a massive number of text files (like CSVs) or a smaller but frequently changing number of text files, or both. Hive is a great tool to use to prepare the data before loading it into the data destination. Hive allows you to create a schema over the CSV and use a SQL-like language to generate MapReduce programs that interact with the data.
 

articles/hdinsight/hadoop/apache-hadoop-visual-studio-tools-get-started.md

Lines changed: 8 additions & 9 deletions
@@ -78,7 +78,7 @@ To connect to your Azure subscription:
 
 4. From Server Explorer, a list of existing HDInsight clusters appears. If you don't have any clusters, you can create one by using the Azure portal, Azure PowerShell, or the HDInsight SDK. For more information, see [Create HDInsight clusters](../hdinsight-hadoop-provision-linux-clusters.md).
 
-![Screenshot of the Data Lake Tools for Visual Studio cluster list in Server Explorer](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-visual-studio-tools-server-explorer.png "Data Lake Tools for Visual Studio cluster list in Server Explorer")
+![Data Lake Tools for Visual Studio cluster list in Server Explorer](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-visual-studio-tools-server-explorer.png "Data Lake Tools for Visual Studio cluster list in Server Explorer")
 
 5. Expand an HDInsight cluster. **Hive Databases**, a default storage account, linked storage accounts, and **Hadoop Service Log** appear. You can further expand the entities.
 
@@ -108,11 +108,11 @@ Right click on the linked cluster, select **Edit**, user could update the cluste
 ## Explore linked resources
 From Server Explorer, you can see the default storage account and any linked storage accounts. If you expand the default storage account, you can see the containers on the storage account. The default storage account and the default container are marked. Right-click any of the containers to view the container contents.
 
-![Screenshot of Data Lake Tools for Visual Studio list linked resources in Server Explorer](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-visual-studio-tools-linked-resources.png "List linked resources")
+![Data Lake Tools for Visual Studio linked resources in Server Explorer](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-visual-studio-tools-linked-resources.png "List linked resources")
 
 After opening a container, you can use the following buttons to upload, delete, and download blobs:
 
-![Screenshot of Data Lake Tools for Visual Studio blob operations in Server Explorer](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-visual-studio-tools-blob-operations.png "Upload, delete, and download blobs in Server Explorer")
+![Data Lake Tools for Visual Studio blob operations in Server Explorer](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-visual-studio-tools-blob-operations.png "Upload, delete, and download blobs in Server Explorer")
 
 ## Run interactive Apache Hive queries
 [Apache Hive](https://hive.apache.org) is a data warehouse infrastructure that's built on Hadoop. Hive is used for data summarization, queries, and analysis. You can use Data Lake Tools for Visual Studio to run Hive queries from Visual Studio. For more information about Hive, see [Use Apache Hive with HDInsight](hdinsight-use-hive.md).
@@ -196,7 +196,7 @@ To create, and run ad-hoc queries:
 
 Ensure **Batch** is selected and then select **Submit**. If you select the advanced submit option, configure **Job Name**, **Arguments**, **Additional Configurations**, and **Status Directory** for the script.
 
-![Screenshot of query and batch](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-query-batch.png)
+![Visual Studio query and batch options](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-query-batch.png)
 
 ![Screenshot of an HDInsight Hadoop Hive query](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-visual-studio-tools-submit-jobs-advanced.png "Submit queries")
 
@@ -219,15 +219,15 @@ To create and run a Hive solution:
 
 The job summary varies slightly between **Batch** and **Interactive** mode.
 
-![Job summary](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-job-summary.png "Hive job summary")
+![Apache Hive job summary tab display](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-job-summary.png "Hive job summary")
 
 Use the **Refresh** button to update the status until the job status changes to **Finished**.
 
 * For the job details from **Batch** mode, select the links at the bottom to see **Job Query**, **Job Output**, **Job log**, or **Yarn log**.
 
 * For the job details from **Interactive** mode, see tabs **Output** and **HiveServer2 Output**.
 
-![job details](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-job-details.png "Hive job details")
+![Visual Studio Apache Hive job details](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-job-details.png "Hive job details")
 
 ### View job graph
 
@@ -237,15 +237,14 @@ To view all the operators inside the vertex, double-click on the vertices of the
 
 The job graph may not appear even if Tez is specified as the execution engine if no Tez application is launched. This might happen because the job does not contain DML statements, or the DML statements can return without launching a Tez application. For example, `SELECT * FROM table1` will not launch the Tez application.
 
-![Job graph](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-fast-path-hive-execution.png "Hive job summary")
-
+![Visual Studio Apache Hive job graph](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-fast-path-hive-execution.png "Hive job summary")
 
 ### Task Execution Detail
 
 From the job graph, you can select **Task Execution Detail** to get structured and visualized information for Hive jobs. You can also get more job details. If performance issues occur, you can use the view to get more details about the issue. For example, you can get information about how each task operates, and detailed information about each task
 (data read/write, schedule/start/end time, and so on). Use the information to tune job configurations or system architecture based on the visualized information.
 
-![Screenshot of the Data Lake Visual Studio Tools Task Execution View window](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-visual-studio-tools-task-execution-view.png "Task Execution View")
+![Data Lake Visual Studio Tools Task Execution View window](./media/apache-hadoop-visual-studio-tools-get-started/hdinsight-visual-studio-tools-task-execution-view.png "Task Execution View")
 
 
 ### View Hive jobs
