# Failure Spark job debugging with Azure Toolkit for IntelliJ (preview)

This article provides step-by-step guidance on how to use HDInsight Tools in [Azure Toolkit for IntelliJ](https://docs.microsoft.com/java/azure/intellij/azure-toolkit-for-intellij?view=azure-java-stable) to run **Spark Failure Debug** applications.

## Prerequisites

* [Oracle Java Development Kit](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). This tutorial uses Java version 8.0.202.
* IntelliJ IDEA. This article uses [IntelliJ IDEA Community ver. 2019.1.3](https://www.jetbrains.com/idea/download/#section=windows).
* Microsoft Azure Storage Explorer. See [Download Microsoft Azure Storage Explorer](https://azure.microsoft.com/features/storage-explorer/).

## Create a project with debugging template

Create a Spark 2.3.2 project to continue with failure debugging, and use the failure task debugging sample file provided in this document.

1. Open IntelliJ IDEA. Open the **New Project** window.
    a. Select **Azure Spark/HDInsight** from the left pane.

    b. Select **Spark Project with Failure Task Debugging Sample(Preview)(Scala)** from the main window.

    ![Intellij Create a debug project(Preview)](./media/apache-spark-intellij-tool-failure-debug/intellij-create-failure-debug-project.png)

    c. Select **Next**.
2. In the **New Project** window, do the following steps:
    ![Select Spark SDK](./media/apache-spark-intellij-tool-failure-debug/intellij-new-project.png)

    a. Enter a project name and project location.

## Create a Spark Scala/Java application, then run the application on a Spark cluster
1. Click **Add Configuration** to open the **Run/Debug Configurations** window.
2. In the **Run/Debug Configurations** dialog box, select the plus sign (**+**). Then select the **Apache Spark on HDInsight** option.

    ![Add new config](./media/apache-spark-intellij-tool-failure-debug/hdinsight-add-new-configuration.png)

3. Switch to the **Remotely Run in Cluster** tab. Enter information for **Name**, **Spark cluster**, and **Main class name**. Our tools support debugging with **Executors**. The default value of **numExecutors** is 5, and it's best not to set it higher than 3. To reduce the run time, you can add **spark.yarn.maxAppAttempts** to **Job Configurations** and set its value to 1. Click the **OK** button to save the configuration.

    ![Run Debug Configurations](./media/apache-spark-intellij-tool-failure-debug/hdinsight-run-debug-configurations.png)
4. The configuration is now saved with the name you provided. To view the configuration details, select the configuration name. To make changes, select **Edit Configurations**.
5. After you complete the configuration settings, you can run the project against the remote cluster.

    ![Remotely run button](./media/apache-spark-intellij-tool-failure-debug/local-run-configuration.png)
6. You can check the application ID from the output window.

    ![Remotely run result](./media/apache-spark-intellij-tool-failure-debug/debug-failure-application-id.png)
## Download failed job profile

If the job submission fails, you can download the failed job profile to your local machine for further debugging.
1. Open **Microsoft Azure Storage Explorer**, locate the HDInsight storage account of the cluster for the failed job, and then download the failed job resources from the corresponding location, **\hdp\spark2-events\\.spark-failures\\\<application ID>**, to a local folder. The **Activities** window shows the download progress.

## Configure local debugging environment and debug on failure
1. Open the original project or create a new project and associate it with the original source code. Only Spark version 2.3.2 is currently supported for failure debugging.
2. In IntelliJ IDEA, create a **Spark Failure Debug** config file, and select the FTD file from the previously downloaded failed job resources for the **Spark Job Failure Context location** field.
* Remote debug (video): [Use Azure Toolkit for IntelliJ to debug Apache Spark applications remotely on an HDInsight cluster](https://channel9.msdn.com/Series/AzureDataLake/Debug-HDInsight-Spark-Applications-with-Azure-Toolkit-for-IntelliJ)

### Scenarios

* [Apache Spark with BI: Do interactive data analysis by using Spark in HDInsight with BI tools](apache-spark-use-bi-tools.md)
* [Apache Spark with Machine Learning: Use Spark in HDInsight to analyze building temperature using HVAC data](apache-spark-ipython-notebook-machine-learning.md)
* [Apache Spark with Machine Learning: Use Spark in HDInsight to predict food inspection results](apache-spark-machine-learning-mllib-ipython.md)
* [Website log analysis using Apache Spark in HDInsight](../hdinsight-apache-spark-custom-library-website-log-analysis.md)
### Create and run applications

* [Create a standalone application using Scala](../hdinsight-apache-spark-create-standalone-application.md)
* [Run jobs remotely on an Apache Spark cluster using Apache Livy](apache-spark-livy-rest-interface.md)
### Tools and extensions

* [Use Azure Toolkit for IntelliJ to create Apache Spark applications for an HDInsight cluster](apache-spark-intellij-tool-plugin.md)
* [Use Azure Toolkit for IntelliJ to debug Apache Spark applications remotely through VPN](apache-spark-intellij-tool-plugin-debug-jobs-remotely.md)
* [Use HDInsight Tools for IntelliJ with Hortonworks Sandbox](../hadoop/hdinsight-tools-for-intellij-with-hortonworks-sandbox.md)
* [Install Jupyter on your computer and connect to an HDInsight Spark cluster](apache-spark-jupyter-notebook-install-locally.md)
### Manage resources

* [Manage resources for the Apache Spark cluster in Azure HDInsight](apache-spark-resource-manager.md)
* [Track and debug jobs running on an Apache Spark cluster in HDInsight](apache-spark-job-debugging.md)