
Commit 03d6bcc

Merge pull request #112516 from dagiro/freshness_c34

freshness_c34

2 parents: bcc53d2 + 6cde838

File tree: 1 file changed (+6 / -25 lines)

articles/hdinsight/spark/apache-spark-job-debugging.md

Lines changed: 6 additions & 25 deletions
@@ -7,12 +7,12 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive
-ms.date: 11/29/2019
+ms.date: 04/23/2020
 ---

 # Debug Apache Spark jobs running on Azure HDInsight

-In this article, you learn how to track and debug [Apache Spark](https://spark.apache.org/) jobs running on HDInsight clusters using the [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) UI, Spark UI, and the Spark History Server. You start a Spark job using a notebook available with the Spark cluster, **Machine learning: Predictive analysis on food inspection data using MLLib**. You can use the following steps to track an application that you submitted using any other approach as well, for example, **spark-submit**.
+In this article, you learn how to track and debug Apache Spark jobs running on HDInsight clusters. Debug using the Apache Hadoop YARN UI, Spark UI, and the Spark History Server. You start a Spark job using a notebook available with the Spark cluster, **Machine learning: Predictive analysis on food inspection data using MLLib**. Use the following steps to track an application that you submitted using any other approach as well, for example, **spark-submit**.

 If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.
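The revised intro keeps **spark-submit** as an alternative launch path that the same tracking steps cover. As a minimal sketch, here is a PySpark job that could be submitted that way; the file name, app name, and submit command are hypothetical, and the sample text path is a location commonly present on HDInsight Spark clusters, not something taken from this diff:

```python
# Minimal PySpark job for illustration only. The file name, app name, and
# the spark-submit invocation below are hypothetical examples.
#
# Submit from the cluster head node, e.g.:
#   spark-submit --master yarn --deploy-mode cluster wordcount.py
#
# The application then shows up in the YARN UI under the name set below
# (instead of "remotesparkmagics", which Jupyter-launched jobs use).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("wordcount-example")   # name shown in the YARN UI
         .getOrCreate())

# Count words in a sample file that HDInsight Spark clusters typically ship.
lines = spark.sparkContext.textFile("/example/data/gutenberg/davinci.txt")
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.take(10))

spark.stop()
```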

@@ -31,7 +31,7 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
3131
> [!TIP]
3232
> Alternatively, you can also launch the YARN UI from the Ambari UI. To launch the Ambari UI, select **Ambari home** under **Cluster dashboards**. From the Ambari UI, navigate to **YARN** > **Quick Links** > the active Resource Manager > **Resource Manager UI**.
3333
34-
2. Because you started the Spark job using Jupyter notebooks, the application has the name **remotesparkmagics** (this is the name for all applications that are started from the notebooks). Select the application ID against the application name to get more information about the job. This launches the application view.
34+
2. Because you started the Spark job using Jupyter notebooks, the application has the name **remotesparkmagics** (the name for all applications started from the notebooks). Select the application ID against the application name to get more information about the job. This action launches the application view.
3535

3636
![Spark history server Find Spark application ID](./media/apache-spark-job-debugging/find-application-id1.png)
3737
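The application-ID lookup in this hunk can also be scripted rather than clicked through. A hedged sketch against the standard Hadoop YARN ResourceManager REST API (`/ws/v1/cluster/apps`); the `/yarnui` gateway path, cluster name, and credentials are assumptions about the HDInsight endpoint, not details from the article:

```python
# Hedged sketch: list YARN applications by name via the ResourceManager
# REST API instead of the YARN UI. /ws/v1/cluster/apps is a standard
# Hadoop YARN endpoint; the HDInsight gateway URL below is an assumption,
# and CLUSTERNAME/PASSWORD are placeholders.
import requests

cluster = "CLUSTERNAME"          # placeholder HDInsight cluster name
auth = ("admin", "PASSWORD")     # placeholder cluster login credentials

url = f"https://{cluster}.azurehdinsight.net/yarnui/ws/v1/cluster/apps"
resp = requests.get(url, auth=auth)
resp.raise_for_status()

data = resp.json()
for app in (data.get("apps") or {}).get("app") or []:
    # Jupyter-launched Spark jobs all report the name "remotesparkmagics".
    if app["name"] == "remotesparkmagics":
        print(app["id"], app["state"], app["trackingUrl"])
```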

@@ -71,19 +71,18 @@ In the Spark UI, you can drill down into the Spark jobs that are spawned by the

    ![View Spark stages event timeline](./media/apache-spark-job-debugging/view-spark-stages-event-timeline.png)

-   This displays the Spark events in the form of a timeline. The timeline view is available at three levels, across jobs, within a job, and within a stage. The image above captures the timeline view for a given stage.
+   This image displays the Spark events in the form of a timeline. The timeline view is available at three levels, across jobs, within a job, and within a stage. The image above captures the timeline view for a given stage.

    > [!TIP]
    > If you select the **Enable zooming** check box, you can scroll left and right across the timeline view.

 6. Other tabs in the Spark UI provide useful information about the Spark instance as well.

-   * Storage tab - If your application creates an RDD, you can find information about those in the Storage tab.
+   * Storage tab - If your application creates an RDD, you can find information in the Storage tab.
    * Environment tab - This tab provides useful information about your Spark instance such as the:
      * Scala version
      * Event log directory associated with the cluster
      * Number of executor cores for the application
-     * Etc.

 ## Find information about completed jobs using the Spark History Server
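Since the Storage tab discussed in this hunk only lists RDDs that an application actually persists, a short hypothetical PySpark fragment showing how a named, cached RDD would surface there; the app name, RDD name, and data are made up for the example:

```python
# Hedged illustration: an RDD appears on the Spark UI's Storage tab only
# after it is persisted and materialized. All names below are invented.
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-tab-demo").getOrCreate()

rdd = spark.sparkContext.parallelize(range(1_000_000))
rdd.setName("inspection-scores")            # label shown in the Storage tab
rdd.persist(StorageLevel.MEMORY_AND_DISK)   # mark for caching
rdd.count()                                 # materialize so the UI can show it

# The Environment tab, by contrast, needs no action from the job: it simply
# reflects settings (Scala version, event log directory, executor cores, and
# so on) that were fixed when the session started.
```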

@@ -104,22 +103,4 @@ Once a job is completed, the information about the job is persisted in the Spark

 * [Manage resources for the Apache Spark cluster in Azure HDInsight](apache-spark-resource-manager.md)
 * [Debug Apache Spark Jobs using extended Spark History Server](apache-azure-spark-history-server.md)
-
-### For data analysts
-
-* [Apache Spark with Machine Learning: Use Spark in HDInsight for analyzing building temperature using HVAC data](apache-spark-ipython-notebook-machine-learning.md)
-* [Apache Spark with Machine Learning: Use Spark in HDInsight to predict food inspection results](apache-spark-machine-learning-mllib-ipython.md)
-* [Website log analysis using Apache Spark in HDInsight](apache-spark-custom-library-website-log-analysis.md)
-* [Application Insight telemetry data analysis using Apache Spark in HDInsight](apache-spark-analyze-application-insight-logs.md)
-
-
-### For Spark developers
-
-* [Create a standalone application using Scala](apache-spark-create-standalone-application.md)
-* [Run jobs remotely on an Apache Spark cluster using Apache Livy](apache-spark-livy-rest-interface.md)
-* [Use HDInsight Tools Plugin for IntelliJ IDEA to create and submit Spark Scala applications](apache-spark-intellij-tool-plugin.md)
-* [Use HDInsight Tools Plugin for IntelliJ IDEA to debug Apache Spark applications remotely](apache-spark-intellij-tool-plugin-debug-jobs-remotely.md)
-* [Use Apache Zeppelin notebooks with an Apache Spark cluster on HDInsight](apache-spark-zeppelin-notebook.md)
-* [Kernels available for Jupyter notebook in Apache Spark cluster for HDInsight](apache-spark-jupyter-notebook-kernels.md)
-* [Use external packages with Jupyter notebooks](apache-spark-jupyter-notebook-use-external-packages.md)
-* [Install Jupyter on your computer and connect to an HDInsight Spark cluster](apache-spark-jupyter-notebook-install-locally.md)
+* [Debug Apache Spark applications with Azure Toolkit for IntelliJ through SSH](apache-spark-intellij-tool-debug-remotely-through-ssh.md)
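This hunk's context line notes that completed jobs persist in the Spark History Server, which also exposes Apache Spark's monitoring REST API (`/api/v1/applications`). A hedged sketch of querying it; the `/sparkhistory` gateway path and the credential placeholders are assumptions about the HDInsight endpoint, not details from the article:

```python
# Hedged sketch: list completed jobs through the Spark History Server's
# REST API. /api/v1/applications is Apache Spark's documented monitoring
# endpoint; the HDInsight proxy path and placeholders are assumptions.
import requests

cluster = "CLUSTERNAME"          # placeholder HDInsight cluster name
auth = ("admin", "PASSWORD")     # placeholder cluster login credentials

base = f"https://{cluster}.azurehdinsight.net/sparkhistory/api/v1"
apps = requests.get(f"{base}/applications", auth=auth).json()

for app in apps:
    for attempt in app.get("attempts", []):
        if attempt.get("completed"):
            # duration is reported in milliseconds
            print(app["id"], app["name"], attempt.get("duration"), "ms")
```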
