
Commit aa6a93b: Merge pull request #97417 from dagiro/freshness84 ("freshness84")

2 parents: 970abe2 + 18a59c9

File tree: 1 file changed (+17, -16 lines)

articles/hdinsight/spark/apache-spark-job-debugging.md

Lines changed: 17 additions & 16 deletions
@@ -5,37 +5,38 @@ author: hrasheed-msft
 ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
-ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 12/05/2018
+ms.custom: hdinsightactive
+ms.date: 11/29/2019
 ---
 
 # Debug Apache Spark jobs running on Azure HDInsight
 
 In this article, you learn how to track and debug [Apache Spark](https://spark.apache.org/) jobs running on HDInsight clusters using the [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) UI, Spark UI, and the Spark History Server. You start a Spark job using a notebook available with the Spark cluster, **Machine learning: Predictive analysis on food inspection data using MLLib**. You can use the following steps to track an application that you submitted using any other approach as well, for example, **spark-submit**.
 
-## Prerequisites
+If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.
 
-You must have the following:
+## Prerequisites
 
-* An Azure subscription. See [Get Azure free trial](https://azure.microsoft.com/documentation/videos/get-azure-free-trial-for-testing-hadoop-in-hdinsight/).
 * An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md).
+
 * You should have started running the notebook, **[Machine learning: Predictive analysis on food inspection data using MLLib](apache-spark-machine-learning-mllib-ipython.md)**. For instructions on how to run this notebook, follow the link.
 
 ## Track an application in the YARN UI
 
-1. Launch the YARN UI. Click **Yarn** under **Cluster dashboards**.
+1. Launch the YARN UI. Select **Yarn** under **Cluster dashboards**.
 
     ![Azure portal launch YARN UI](./media/apache-spark-job-debugging/launch-apache-yarn-ui.png)
 
     > [!TIP]
-    > Alternatively, you can also launch the YARN UI from the Ambari UI. To launch the Ambari UI, click **Ambari home** under **Cluster dashboards**. From the Ambari UI, click **YARN**, click **Quick Links**, click the active Resource Manager, and then click **Resource Manager UI**.
+    > Alternatively, you can also launch the YARN UI from the Ambari UI. To launch the Ambari UI, select **Ambari home** under **Cluster dashboards**. From the Ambari UI, navigate to **YARN** > **Quick Links** > the active Resource Manager > **Resource Manager UI**.
 
-2. Because you started the Spark job using Jupyter notebooks, the application has the name **remotesparkmagics** (this is the name for all applications that are started from the notebooks). Click the application ID against the application name to get more information about the job. This launches the application view.
+2. Because you started the Spark job using Jupyter notebooks, the application has the name **remotesparkmagics** (this is the name for all applications that are started from the notebooks). Select the application ID against the application name to get more information about the job. This launches the application view.
 
     ![Spark history server Find Spark application ID](./media/apache-spark-job-debugging/find-application-id1.png)
 
     For such applications that are launched from the Jupyter notebooks, the status is always **RUNNING** until you exit the notebook.
+
 3. From the application view, you can drill down further to find out the containers associated with the application and the logs (stdout/stderr). You can also launch the Spark UI by clicking the link corresponding to the **Tracking URL**, as shown below.
 
     ![Spark history server download container logs](./media/apache-spark-job-debugging/download-container-logs.png)
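The application list that the YARN UI shows in the steps above is also available programmatically: the YARN ResourceManager serves the same data as JSON from its REST endpoint `/ws/v1/cluster/apps`. The sketch below filters such a payload for running **remotesparkmagics** applications. Note the payload is a fabricated sample (a real call would be an authenticated HTTP GET against your cluster's ResourceManager), and `running_notebook_apps` is a helper name invented here, not part of any API.

```python
import json

# Fabricated sample of what the YARN ResourceManager REST API returns from
#   GET .../ws/v1/cluster/apps
# The shape ("apps" -> "app" -> list of applications) follows the YARN REST API.
sample_response = json.loads("""
{
  "apps": {
    "app": [
      {"id": "application_1575000000000_0001",
       "name": "remotesparkmagics",
       "state": "RUNNING",
       "trackingUrl": "https://example.net/proxy/application_1575000000000_0001/"},
      {"id": "application_1575000000000_0002",
       "name": "other-job",
       "state": "FINISHED",
       "trackingUrl": ""}
    ]
  }
}
""")

def running_notebook_apps(payload):
    """Return (id, trackingUrl) pairs for RUNNING apps started from notebooks."""
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a["id"], a["trackingUrl"])
            for a in apps
            if a["name"] == "remotesparkmagics" and a["state"] == "RUNNING"]

print(running_notebook_apps(sample_response))
```

On a real cluster, you would replace `sample_response` with the parsed body of an HTTP GET to the endpoint above, then open each `trackingUrl` to reach the Spark UI.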
@@ -44,15 +45,15 @@ You must have the following:
 
 In the Spark UI, you can drill down into the Spark jobs that are spawned by the application you started earlier.
 
-1. To launch the Spark UI, from the application view, click the link against the **Tracking URL**, as shown in the screen capture above. You can see all the Spark jobs that are launched by the application running in the Jupyter notebook.
+1. To launch the Spark UI, from the application view, select the link against the **Tracking URL**, as shown in the screen capture above. You can see all the Spark jobs that are launched by the application running in the Jupyter notebook.
 
     ![Spark history server jobs tab](./media/apache-spark-job-debugging/view-apache-spark-jobs.png)
 
-2. Click the **Executors** tab to see processing and storage information for each executor. You can also retrieve the call stack by clicking on the **Thread Dump** link.
+2. Select the **Executors** tab to see processing and storage information for each executor. You can also retrieve the call stack by selecting the **Thread Dump** link.
 
     ![Spark history server executors tab](./media/apache-spark-job-debugging/view-spark-executors.png)
 
-3. Click the **Stages** tab to see the stages associated with the application.
+3. Select the **Stages** tab to see the stages associated with the application.
 
     ![Spark history server stages tab](./media/apache-spark-job-debugging/view-apache-spark-stages.png "View Spark stages")
 
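The per-executor figures shown on the **Executors** tab are also served by Spark's monitoring REST API: a running application exposes `/api/v1/applications/<app-id>/executors`. A minimal sketch that aggregates such a response follows; the `executors` list below is a fabricated sample (the field names follow Spark's REST API), and `executor_summary` is a hypothetical helper, not a Spark function.

```python
# Fabricated sample of what
#   GET http://<driver>:4040/api/v1/applications/<app-id>/executors
# returns; field names (memoryUsed, activeTasks, ...) follow Spark's
# monitoring REST API.
executors = [
    {"id": "driver", "memoryUsed": 50 * 1024 * 1024,
     "maxMemory": 4 * 1024 ** 3, "activeTasks": 0, "failedTasks": 0},
    {"id": "1", "memoryUsed": 200 * 1024 * 1024,
     "maxMemory": 4 * 1024 ** 3, "activeTasks": 3, "failedTasks": 1},
]

def executor_summary(execs):
    """Aggregate storage-memory use and task counts across all executors."""
    return {
        "executors": len(execs),
        "memory_used_mb": sum(e["memoryUsed"] for e in execs) // (1024 * 1024),
        "active_tasks": sum(e["activeTasks"] for e in execs),
        "failed_tasks": sum(e["failedTasks"] for e in execs),
    }

print(executor_summary(executors))
```

A rising `failed_tasks` count, or one executor holding most of `memoryUsed`, points at the same hot spots you would look for visually on the tab.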
@@ -77,8 +78,8 @@ In the Spark UI, you can drill down into the Spark jobs that are spawned by the
 
 6. Other tabs in the Spark UI provide useful information about the Spark instance as well.
 
-    * Storage tab - If your application creates an RDDs, you can find information about those in the Storage tab.
-    * Environment tab - This tab provides a lot of useful information about your Spark instance such as the:
+    * Storage tab - If your application creates an RDD, you can find information about those in the Storage tab.
+    * Environment tab - This tab provides useful information about your Spark instance such as the:
         * Scala version
         * Event log directory associated with the cluster
         * Number of executor cores for the application
@@ -88,14 +89,14 @@ In the Spark UI, you can drill down into the Spark jobs that are spawned by the
 
 Once a job is completed, the information about the job is persisted in the Spark History Server.
 
-1. To launch the Spark History Server, from the Overview blade, click **Spark history server** under **Cluster dashboards**.
+1. To launch the Spark History Server, from the **Overview** page, select **Spark history server** under **Cluster dashboards**.
 
     ![Azure portal launch Spark history server](./media/apache-spark-job-debugging/launch-spark-history-server.png "Launch Spark History Server1")
 
     > [!TIP]
-    > Alternatively, you can also launch the Spark History Server UI from the Ambari UI. To launch the Ambari UI, from the Overview blade, click **Ambari home** under **Cluster dashboards**. From the Ambari UI, click **Spark**, click **Quick Links**, and then click **Spark History Server UI**.
+    > Alternatively, you can also launch the Spark History Server UI from the Ambari UI. To launch the Ambari UI, from the Overview blade, select **Ambari home** under **Cluster dashboards**. From the Ambari UI, navigate to **Spark2** > **Quick Links** > **Spark2 History Server UI**.
 
-2. You see all the completed applications listed. Click an application ID to drill down into an application for more info.
+2. You see all the completed applications listed. Select an application ID to drill down into an application for more info.
 
     ![Spark history server completed applications](./media/apache-spark-job-debugging/view-completed-applications.png "Launch Spark History Server2")
 
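The completed-applications list in the Spark History Server can likewise be read through its REST API, `/api/v1/applications`, which returns each application with a list of attempts. A hedged sketch over a fabricated sample payload; `completed_app_ids` is an illustrative helper invented here, not part of HDInsight or Spark.

```python
# Fabricated sample of what the Spark History Server returns from
#   GET .../api/v1/applications?status=completed
# Each application carries an "attempts" list, following Spark's
# monitoring REST API.
applications = [
    {"id": "application_1575000000000_0001", "name": "remotesparkmagics",
     "attempts": [{"completed": True, "duration": 420000}]},
    {"id": "application_1575000000000_0003", "name": "remotesparkmagics",
     "attempts": [{"completed": False, "duration": 0}]},
]

def completed_app_ids(apps):
    """Return IDs of applications that have at least one completed attempt."""
    return [a["id"] for a in apps
            if any(att.get("completed") for att in a.get("attempts", []))]

print(completed_app_ids(applications))
```

Each returned ID can then be appended to the History Server URL to drill into that application's jobs, stages, and executors after the fact.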