Commit 3421b3f

Merge pull request #97122 from dagiro/freshness71 (freshness71)

2 parents 262e6bf + 4828670

File tree

3 files changed: +58 −64 lines

articles/hdinsight/spark/apache-azure-spark-history-server.md

Lines changed: 58 additions & 64 deletions
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.custom: hdinsightactive,hdiseo17may2017
ms.date: 11/25/2019
---

# Use extended Apache Spark History Server to debug and diagnose Apache Spark applications

This article provides guidance on how to use the extended Apache Spark History Server to debug and diagnose completed and running Spark applications. The extension includes a **Data** tab, a **Graph** tab, and a **Diagnosis** tab. On the **Data** tab, you can check the input and output data of the Spark job. On the **Graph** tab, you can check the data flow and replay the job graph. On the **Diagnosis** tab, you can refer to the **Data Skew**, **Time Skew**, and **Executor Usage Analysis** views.

## Get access to Apache Spark History Server

Apache Spark History Server is the web UI for completed and running Spark applications.
### Open the Apache Spark History Server Web UI from Azure portal

1. From the [Azure portal](https://portal.azure.com/), open the Spark cluster. For more information, see [List and show clusters](../hdinsight-administer-use-portal-linux.md#showClusters).
2. From **Cluster dashboards**, select **Spark history server**. When prompted, enter the admin credentials for the Spark cluster.

    ![portal launch Spark History Server](./media/apache-azure-spark-history-server/azure-portal-dashboard-spark-history.png "Spark History Server")

### Open the Spark History Server Web UI by URL

Open the Spark History Server by browsing to `https://CLUSTERNAME.azurehdinsight.net/sparkhistory`, where CLUSTERNAME is the name of your Spark cluster.

The Spark History Server web UI looks similar to:

![HDInsight Spark History Server](./media/apache-azure-spark-history-server/hdinsight-spark-history-server.png)
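As a quick sketch, the URL follows a fixed pattern per cluster; the cluster name `contoso-spark` below is a hypothetical placeholder:

```python
# Build the Spark History Server URL for an HDInsight Spark cluster.
# "contoso-spark" is a hypothetical cluster name; substitute your own.
cluster_name = "contoso-spark"
history_url = f"https://{cluster_name}.azurehdinsight.net/sparkhistory"
print(history_url)  # https://contoso-spark.azurehdinsight.net/sparkhistory
```

The endpoint sits behind the cluster gateway, so opening it in a browser (or any HTTP client) prompts for the cluster admin credentials.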

## Data tab in Spark History Server

Select the job ID, then select **Data** on the tool menu to get the data view.

+ Review the **Inputs**, **Outputs**, and **Table Operations** by selecting the individual tabs.

![Data for Spark application tabs](./media/apache-azure-spark-history-server/apache-spark-data-tabs.png)

+ Copy all rows by selecting the **Copy** button.

![Data for Spark application copy](./media/apache-azure-spark-history-server/apache-spark-data-copy.png)

+ Save all the data as a CSV file by selecting the **csv** button.

![Data for Spark application save](./media/apache-azure-spark-history-server/apache-spark-data-save.png)

+ Search by entering keywords in the **Search** field. The search results display immediately.

![Data for Spark application search](./media/apache-azure-spark-history-server/apache-spark-data-search.png)

+ Select the column header to sort the table, select the plus sign to expand a row to show more details, or select the minus sign to collapse a row.

![Data for Spark application table](./media/apache-azure-spark-history-server/apache-spark-data-table.png)

+ Download a single file by selecting the **Partial Download** button on the right. The selected file downloads locally. If the file doesn't exist anymore, a new tab opens to show the error messages.

![Data for Spark application download row](./media/apache-azure-spark-history-server/sparkui-data-download-row.png)

+ Copy the full path or a relative path by selecting the **Copy Full Path** or **Copy Relative Path** option that expands from the download menu. For Azure Data Lake Storage files, **Open in Azure Storage Explorer** launches Azure Storage Explorer and locates the folder after sign-in.

![Data for Spark application copy path](./media/apache-azure-spark-history-server/sparkui-data-copy-path.png)

+ Select the page number below the table to navigate pages when the table has too many rows to display on one page.

![Data for Spark application page](./media/apache-azure-spark-history-server/apache-spark-data-page.png)

+ Hover over the question mark beside **Data** to show the tooltip, or select the question mark to get more information.

![Data for Spark application more info](./media/apache-azure-spark-history-server/sparkui-data-more-info.png)

## Graph tab in Apache Spark History Server

Select the job ID, then select **Graph** on the tool menu to get the job graph view.

+ Review an overview of your job in the generated job graph.

+ By default, the graph shows all jobs. Filter the view by **Job ID**.

![Spark application and job graph heatmap](./media/apache-azure-spark-history-server/sparkui-graph-heatmap.png)

+ Play back the job by selecting the **Playback** button, and stop anytime by selecting the stop button. Tasks display in color to show their status during playback:

  |Color |Description |
  |---|---|
  |Green|The job has completed successfully.|
  |Orange|Instances of tasks that failed but don't affect the final result of the job. These tasks had duplicate or retry instances that may succeed later.|
  |Blue|The task is running.|
  |White|The task is waiting to run, or the stage has skipped.|
  |Red|The task has failed.|

![Spark application and job graph color sample, running](./media/apache-azure-spark-history-server/sparkui-graph-color-running.png)

> [!NOTE]
> For the data size of reads and writes, we use 1 MB = 1000 KB = 1,000,000 bytes.
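Since many tools assume binary units instead, the decimal convention in this note is worth making explicit; a small illustration:

```python
# The history server uses decimal units for read/write sizes:
# 1 MB = 1000 KB = 1,000,000 bytes (not the binary 1 MiB = 1,048,576 bytes).
def bytes_to_mb(n_bytes: int) -> float:
    return n_bytes / 1_000_000

print(bytes_to_mb(25_000_000))  # 25.0
```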

+ Send feedback with issues by selecting **Provide us feedback**.

![Spark application and job graph feedback](./media/apache-azure-spark-history-server/sparkui-graph-feedback.png)

## Diagnosis tab in Apache Spark History Server

Select the job ID, then select **Diagnosis** on the tool menu to get the job diagnosis view. The **Diagnosis** tab includes **Data Skew**, **Time Skew**, and **Executor Usage Analysis**.

+ Review the **Data Skew**, **Time Skew**, and **Executor Usage Analysis** views by selecting the respective tabs.

![SparkUI diagnosis data skew tab again](./media/apache-azure-spark-history-server/sparkui-diagnosis-tabs.png)

### Data Skew

Select the **Data Skew** tab. The corresponding skewed tasks are displayed based on the specified parameters.

+ **Specify Parameters** - The first section displays the parameters that are used to detect data skew. The built-in rule is: the task data read is greater than three times the average task data read, and the task data read is more than 10 MB. If you want to define your own rule for skewed tasks, you can choose your parameters. The **Skewed Stage** and **Skew Chart** sections refresh accordingly.

+ **Skewed Stage** - The second section displays stages that have skewed tasks meeting the criteria specified above. If there's more than one skewed task in a stage, the skewed stage table displays only the most skewed task (for example, the largest data for data skew).

![sparkui diagnosis data skew tab](./media/apache-azure-spark-history-server/sparkui-diagnosis-dataskew-section2.png)
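As an illustration only (not the extension's actual implementation), the built-in rule can be sketched as a filter over per-task read sizes; the sample values below are hypothetical:

```python
# Sketch of the built-in data skew rule: flag a task when its data read is
# more than 3x the average task data read AND more than 10 MB (decimal).
def find_skewed_tasks(reads_bytes, ratio=3, min_bytes=10 * 1000 * 1000):
    avg = sum(reads_bytes) / len(reads_bytes)
    return [i for i, r in enumerate(reads_bytes)
            if r > ratio * avg and r > min_bytes]

# Hypothetical per-task read sizes in bytes: task 3 reads far more than the rest.
reads = [2_000_000, 3_000_000, 2_500_000, 60_000_000]
print(find_skewed_tasks(reads))  # [3]
```

The time skew rule described later has the same shape, with execution times and a 30-second floor in place of read sizes and the 10-MB floor.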

### Time Skew

The **Time Skew** tab displays skewed tasks based on task execution time.

+ **Specify Parameters** - The first section displays the parameters that are used to detect time skew. The default criteria are: the task execution time is greater than three times the average execution time, and the task execution time is greater than 30 seconds. You can change the parameters based on your needs. The **Skewed Stage** and **Skew Chart** sections display the corresponding stage and task information, just like the **Data Skew** tab above.

+ Select **Time Skew**, and the filtered result is displayed in the **Skewed Stage** section according to the parameters set in the **Specify Parameters** section. Select one item in the **Skewed Stage** section, and the corresponding chart is displayed in the third section, with the task details shown in the bottom-right panel.

![sparkui diagnosis time skew section](./media/apache-azure-spark-history-server/sparkui-diagnosis-timeskew-section2.png)

### Executor Usage Analysis

The Executor Usage Graph visualizes the actual executor allocation and running status of the Spark job.

+ Select **Executor Usage Analysis**, and four curves about executor usage are drafted: **Allocated Executors**, **Running Executors**, **idle Executors**, and **Max Executor Instances**. For allocated executors, each "Executor added" or "Executor removed" event increases or decreases the allocated executors. You can check the "Event Timeline" in the "Jobs" tab for more comparison.

![sparkui diagnosis executors tab](./media/apache-azure-spark-history-server/sparkui-diagnosis-executors.png)

+ Select the color icon to select or unselect the corresponding content in all drafts.

![sparkui diagnosis select chart](./media/apache-azure-spark-history-server/sparkui-diagnosis-select-chart.png)

To revert to the community version, do the following steps:

1. Open the cluster in Ambari.
1. Navigate to **Spark2** > **Configs** > **Custom spark2-defaults**.
1. Select **Add Property ...**, add **spark.ui.enhancement.enabled=false**, and save. The property is now set to **false**.
1. Select **Save** to save the configuration.

    ![Apache Ambari feature turns off](./media/apache-azure-spark-history-server/apache-spark-turn-off.png)

1. Select **Spark2** in the left panel. Then, under the **Summary** tab, select **Spark2 History Server**.

    ![Apache Ambari Spark2 Summary view](./media/apache-azure-spark-history-server/apache-spark-restart1.png)

1. Restart the history server by selecting **Restart** under **Spark2 History Server**.

    ![Apache Ambari Spark2 History restart](./media/apache-azure-spark-history-server/apache-spark-restart2.png)

1. Refresh the Spark History Server web UI. It's reverted to the community version.
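The Ambari steps above boil down to a single custom `spark2-defaults` property:

```
spark.ui.enhancement.enabled=false
```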

### 2. Upload history server event

If you run into a history server error, follow these steps to provide the event:

1. Download the event by selecting **Download** in the history server web UI.

    ![Spark2 History Server download](./media/apache-azure-spark-history-server/sparkui-download-event.png)

2. Select **Provide us feedback** from the data or graph tab.

    ![Spark graph provide us feedback](./media/apache-azure-spark-history-server/sparkui-graph-feedback.png)

If you want to upgrade with a hotfix, use the script below, which upgrades the spark-enhancement jar.

**To use the bash file from Azure portal**

1. Launch the [Azure portal](https://ms.portal.azure.com), and select your cluster.
2. Complete a [script action](../hdinsight-hadoop-customize-cluster-linux.md) with the following parameters:

    |Property |Value |
    |---|---|
    |Script type|Custom|
    |Name|UpgradeJar|
    |Bash script URI|`https://hdinsighttoolingstorage.blob.core.windows.net/shsscriptactions/upgrade_spark_enhancement.sh`|
    |Node type(s)|Head, Worker|
    |Parameters|`https://${account_name}.blob.core.windows.net/packages/jars/spark-enhancement-${version}.jar`|

    ![Azure portal submit script action](./media/apache-azure-spark-history-server/apache-spark-upload1.png)
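As a sketch, the **Parameters** value is just the URL of the spark-enhancement jar in your own storage account; both placeholder values below are hypothetical:

```python
# Build the Parameters value for the script action. "contosostore" and
# "2.0.0" are hypothetical placeholders; substitute your storage account
# name and the spark-enhancement jar version you uploaded.
account_name = "contosostore"
version = "2.0.0"
jar_url = (f"https://{account_name}.blob.core.windows.net"
           f"/packages/jars/spark-enhancement-{version}.jar")
print(jar_url)
```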

## Known issues

+ Currently, it only works for Spark 2.3 and 2.4 clusters.

+ Input/output data that uses RDDs won't show in the data tab.

## Next steps

+ [Manage resources for an Apache Spark cluster on HDInsight](apache-spark-resource-manager.md)
+ [Configure Apache Spark settings](apache-spark-settings.md)

## Contact us

If you have any feedback, or if you come across any issues when using this tool, send an email to [[email protected]](mailto:[email protected]).