Commit 16922c5

Merge pull request #89099 from dagiro/cats154
cats154
2 parents a7793f4 + 3468e78 commit 16922c5

articles/hdinsight/spark/apache-azure-spark-history-server.md

Lines changed: 59 additions & 60 deletions
@@ -1,10 +1,10 @@
11
---
22
title: Extended Spark History Server to debug Spark applications - Azure HDInsight
33
description: Use extended Spark History Server to debug and diagnose Spark applications - Azure HDInsight.
4-
ms.service: hdinsight
54
author: hrasheed-msft
65
ms.author: hrasheed
76
ms.reviewer: jasonh
7+
ms.service: hdinsight
88
ms.custom: hdinsightactive,hdiseo17may2017
99
ms.topic: conceptual
1010
ms.date: 09/04/2019
@@ -16,16 +16,17 @@ This article provides guidance on how to use extended Apache Spark History Serve
1616

1717
## Get access to Apache Spark History Server
1818

19-
Apache Spark History Server is the web UI for completed and running Spark applications.
19+
Apache Spark History Server is the web UI for completed and running Spark applications.
2020

2121
### Open the Apache Spark History Server Web UI from Azure portal
2222

2323
1. From the [Azure portal](https://portal.azure.com/), open the Spark cluster. For more information, see [List and show clusters](../hdinsight-administer-use-portal-linux.md#showClusters).
24-
2. From **Quick Links**, click **Cluster Dashboard**, and then click **Spark History Server**. When prompted, enter the admin credentials for the Spark cluster.
24+
2. From **Quick Links**, click **Cluster Dashboard**, and then click **Spark History Server**. When prompted, enter the admin credentials for the Spark cluster.
2525

26-
![Spark History Server](./media/apache-azure-spark-history-server/launch-history-server.png "Spark History Server")
26+
![portal launch Spark History Server](./media/apache-azure-spark-history-server/launch-history-server.png "Spark History Server")
2727

2828
### Open the Spark History Server Web UI by URL
29+
2930
Open the Spark History Server by browsing to the following URL. Replace `<ClusterName>` with the name of your Spark cluster.
3031

3132
```
@@ -36,67 +37,67 @@ The Spark History Server web UI looks like:
3637

3738
![HDInsight Spark History Server](./media/apache-azure-spark-history-server/hdinsight-spark-history-server.png)
3839

39-
4040
## Data tab in Spark History Server
41+
4142
Select a job ID, then click **Data** on the tool menu to open the data view.
4243

4344
+ Check the **Inputs**, **Outputs**, and **Table Operations** by selecting the tabs separately.
4445

45-
![Data tabs](./media/apache-azure-spark-history-server/apache-spark-data-tabs.png)
46+
![Data for Spark application tabs](./media/apache-azure-spark-history-server/apache-spark-data-tabs.png)
4647

4748
+ Copy all rows by clicking the **Copy** button.
4849

49-
![Data copy](./media/apache-azure-spark-history-server/apache-spark-data-copy.png)
50+
![Data for Spark application copy](./media/apache-azure-spark-history-server/apache-spark-data-copy.png)
5051

5152
+ Save all data as a CSV file by clicking the **csv** button.
5253

53-
![Data save](./media/apache-azure-spark-history-server/apache-spark-data-save.png)
54+
![Data for Spark application save](./media/apache-azure-spark-history-server/apache-spark-data-save.png)
5455

5556
+ Search by entering keywords in the **Search** field; the results display immediately.
5657

57-
![Data search](./media/apache-azure-spark-history-server/apache-spark-data-search.png)
58+
![Data for Spark application search](./media/apache-azure-spark-history-server/apache-spark-data-search.png)
5859

5960
+ Click a column header to sort the table, click the plus sign to expand a row and show more details, or click the minus sign to collapse a row.
6061

61-
![Data table](./media/apache-azure-spark-history-server/apache-spark-data-table.png)
62+
![Data for Spark application table](./media/apache-azure-spark-history-server/apache-spark-data-table.png)
6263

6364
+ Download a single file by clicking the **Partial Download** button on the right; the selected file is downloaded to your local machine. If the file no longer exists, a new tab opens to show the error messages.
6465

65-
![Data download row](./media/apache-azure-spark-history-server/sparkui-data-download-row.png)
66+
![Data for Spark application download row](./media/apache-azure-spark-history-server/sparkui-data-download-row.png)
6667

6768
+ Copy the full path or relative path by selecting **Copy Full Path** or **Copy Relative Path** from the expanded download menu. For Azure Data Lake Storage files, **Open in Azure Storage Explorer** launches Azure Storage Explorer and locates the folder after you sign in.
6869

69-
![Data copy path](./media/apache-azure-spark-history-server/sparkui-data-copy-path.png)
70+
![Data for Spark application copy path](./media/apache-azure-spark-history-server/sparkui-data-copy-path.png)
7071

71-
+ Click the number below the table to navigate pages when too many rows to display in one page.
72+
+ Click the number below the table to navigate pages when too many rows to display in one page.
7273

73-
![Data page](./media/apache-azure-spark-history-server/apache-spark-data-page.png)
74+
![Data for Spark application page](./media/apache-azure-spark-history-server/apache-spark-data-page.png)
7475

7576
+ Hover over the question mark beside Data to show the tooltip, or click the question mark to get more information.
7677

77-
![Data more info](./media/apache-azure-spark-history-server/sparkui-data-more-info.png)
78+
![Data for Spark application more info](./media/apache-azure-spark-history-server/sparkui-data-more-info.png)
7879

7980
+ Send feedback with issues by clicking **Provide us feedback**.
8081

81-
![graph feedback](./media/apache-azure-spark-history-server/sparkui-graph-feedback.png)
82-
82+
![Spark graph provide us feedback again](./media/apache-azure-spark-history-server/sparkui-graph-feedback.png)
8383

8484
## Graph tab in Apache Spark History Server
85+
8586
Select a job ID, then click **Graph** on the tool menu to open the job graph view.
8687

87-
+ Check overview of your job by the generated job graph.
88+
+ Check overview of your job by the generated job graph.
8889

8990
+ By default, the graph shows all jobs; you can filter it by **Job ID**.
9091

91-
![graph job ID](./media/apache-azure-spark-history-server/apache-spark-graph-jobid.png)
92+
![Spark application and job graph job ID](./media/apache-azure-spark-history-server/apache-spark-graph-jobid.png)
9293

9394
+ By default, **Progress** is selected. To check the data flow, select **Read/Written** in the **Display** dropdown list.
9495

95-
![graph display](./media/apache-azure-spark-history-server/sparkui-graph-display.png)
96+
![Spark application and job graph display](./media/apache-azure-spark-history-server/sparkui-graph-display.png)
9697

9798
The graph nodes are displayed in color to show the heatmap.
9899

99-
![graph heatmap](./media/apache-azure-spark-history-server/sparkui-graph-heatmap.png)
100+
![Spark application and job graph heatmap](./media/apache-azure-spark-history-server/sparkui-graph-heatmap.png)
100101

101102
+ Play back the job by clicking the **Playback** button, and stop at any time by clicking the stop button. Tasks are displayed in color to show their status during playback:
102103

@@ -106,30 +107,29 @@ Select job ID then click **Graph** on the tool menu to get the job graph view.
106107
+ White for waiting or skipped: The task is waiting to run, or the stage was skipped.
107108
+ Red for failed: The task has failed.
108109

109-
![graph color sample, running](./media/apache-azure-spark-history-server/sparkui-graph-color-running.png)
110-
110+
![Spark application and job graph color sample, running](./media/apache-azure-spark-history-server/sparkui-graph-color-running.png)
111+
111112
Skipped stages are displayed in white.
112-
![graph color sample, skip](./media/apache-azure-spark-history-server/sparkui-graph-color-skip.png)
113+
![Spark application and job graph color sample, skip](./media/apache-azure-spark-history-server/sparkui-graph-color-skip.png)
114+
115+
![Spark application and job graph color sample, failed](./media/apache-azure-spark-history-server/sparkui-graph-color-failed.png)
113116

114-
![graph color sample, failed](./media/apache-azure-spark-history-server/sparkui-graph-color-failed.png)
115-
116117
> [!NOTE]
117118
> Playback is available for each completed job. Playback is not supported for incomplete jobs.
118119
119-
120120
+ Scroll the mouse wheel to zoom the job graph in or out, or click **Zoom to fit** to fit it to the screen.
121-
122-
![graph zoom to fit](./media/apache-azure-spark-history-server/sparkui-graph-zoom2fit.png)
121+
122+
![Spark application and job graph zoom to fit](./media/apache-azure-spark-history-server/sparkui-graph-zoom2fit.png)
123123

124124
+ Hover over a graph node to see the tooltip when there are failed tasks, and click a stage to open its stage page.
125125

126-
![graph tooltip](./media/apache-azure-spark-history-server/sparkui-graph-tooltip.png)
126+
![Spark application and job graph tooltip](./media/apache-azure-spark-history-server/sparkui-graph-tooltip.png)
127127

128128
+ On the job graph tab, stages display a tooltip and a small icon if they have tasks that meet the following conditions:
129129
+ Data skew: data read size > average data read size of all tasks inside this stage * 2 and data read size > 10 MB.
130130
+ Time skew: execution time > average execution time of all tasks inside this stage * 2 and execution time > 2 mins.
131131

132-
![graph skew icon](./media/apache-azure-spark-history-server/sparkui-graph-skew-icon.png)
132+
![Spark application and job graph skew icon](./media/apache-azure-spark-history-server/sparkui-graph-skew-icon.png)
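
The two icon conditions above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the History Server's implementation: the `Task` record, the `flag_stage` helper, and the choice of 1 MB = 1024 × 1024 bytes are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean

MB = 1024 * 1024  # assumption: the article does not say whether MB is binary or decimal


@dataclass
class Task:
    """Hypothetical per-task record; field names are illustrative."""
    task_id: int
    bytes_read: int
    duration_s: float


def flag_stage(tasks):
    """Return (data_skewed_ids, time_skewed_ids) for the tasks of one stage."""
    if not tasks:
        return [], []
    avg_bytes = mean(t.bytes_read for t in tasks)
    avg_time = mean(t.duration_s for t in tasks)
    # Data skew: read size > 2 x stage average AND > 10 MB.
    data_skewed = [t.task_id for t in tasks
                   if t.bytes_read > 2 * avg_bytes and t.bytes_read > 10 * MB]
    # Time skew: execution time > 2 x stage average AND > 2 minutes.
    time_skewed = [t.task_id for t in tasks
                   if t.duration_s > 2 * avg_time and t.duration_s > 120]
    return data_skewed, time_skewed


stage = [Task(0, 4 * MB, 30), Task(1, 5 * MB, 35),
         Task(2, 6 * MB, 40), Task(3, 60 * MB, 400)]
print(flag_stage(stage))  # ([3], [3])
```

Both conditions must hold at once, so a task that is large relative to its siblings but reads less than 10 MB is not flagged.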
133133

134134
+ The job graph node will display the following information of each stage:
135135
+ ID.
@@ -149,49 +149,51 @@ Select job ID then click **Graph** on the tool menu to get the job graph view.
149149
150150
+ Send feedback with issues by clicking **Provide us feedback**.
151151

152-
![graph feedback](./media/apache-azure-spark-history-server/sparkui-graph-feedback.png)
153-
152+
![Spark application and job graph feedback](./media/apache-azure-spark-history-server/sparkui-graph-feedback.png)
154153

155154
## Diagnosis tab in Apache Spark History Server
155+
156156
Select a job ID, then click **Diagnosis** on the tool menu to open the job diagnosis view. The diagnosis tab includes **Data Skew**, **Time Skew**, and **Executor Usage Analysis**.
157-
157+
158158
+ Check the **Data Skew**, **Time Skew**, and **Executor Usage Analysis** by selecting the tabs respectively.
159159

160-
![Diagnosis tabs](./media/apache-azure-spark-history-server/sparkui-diagnosis-tabs.png)
160+
![SparkUI diagnosis data skew tab again](./media/apache-azure-spark-history-server/sparkui-diagnosis-tabs.png)
161161

162162
### Data Skew
163-
Click **Data Skew** tab, the corresponding skewed tasks are displayed based on the specified parameters.
163+
164+
Click **Data Skew** tab, the corresponding skewed tasks are displayed based on the specified parameters.
164165

165166
+ **Specify Parameters** - The first section displays the parameters that are used to detect Data Skew. The built-in rule is: the task data read is greater than three times the average task data read, and the task data read is more than 10 MB. If you want to define your own rule for skewed tasks, you can choose your parameters; the **Skewed Stage** and **Skew Chart** sections are refreshed accordingly.
166167

167168
+ **Skewed Stage** - The second section displays stages that have skewed tasks meeting the criteria specified above. If there is more than one skewed task in a stage, the skewed stage table displays only the most skewed task (for example, the task with the largest data read for data skew).
168169

169-
![Data skew section2](./media/apache-azure-spark-history-server/sparkui-diagnosis-dataskew-section2.png)
170+
![sparkui diagnosis data skew tab](./media/apache-azure-spark-history-server/sparkui-diagnosis-dataskew-section2.png)
170171

171172
+ **Skew Chart** - When a row in the skewed stage table is selected, the skew chart displays more task distribution details based on data read and execution time. The skewed tasks are marked in red and the normal tasks are marked in blue. For performance reasons, the chart displays at most 100 sample tasks. The task details are displayed in the bottom-right panel.
172173

173-
![Data skew section3](./media/apache-azure-spark-history-server/sparkui-diagnosis-dataskew-section3.png)
174+
![sparkui skew chart for stage 10](./media/apache-azure-spark-history-server/sparkui-diagnosis-dataskew-section3.png)
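
As a rough sketch of how the **Specify Parameters** and **Skewed Stage** sections relate, the Python below applies a configurable data-skew rule per stage and keeps only the most skewed task of each stage. The function name `skewed_stages`, the tuple layout, and the binary definition of MB are assumptions for illustration; the defaults mirror the built-in rule described above (3 × average, 10 MB).

```python
from collections import defaultdict
from statistics import mean

MB = 1024 * 1024  # assumption: binary megabytes


def skewed_stages(tasks, ratio=3.0, min_bytes=10 * MB):
    """Map stage_id -> (task_id, bytes_read) of the most skewed task.

    `tasks` is an iterable of (stage_id, task_id, bytes_read) tuples
    (a hypothetical layout). Stages without skewed tasks are omitted.
    """
    by_stage = defaultdict(list)
    for stage_id, task_id, bytes_read in tasks:
        by_stage[stage_id].append((task_id, bytes_read))

    result = {}
    for stage_id, rows in by_stage.items():
        avg = mean(b for _, b in rows)
        skewed = [(tid, b) for tid, b in rows
                  if b > ratio * avg and b > min_bytes]
        if skewed:
            # Only the task with the largest data read is reported per stage.
            result[stage_id] = max(skewed, key=lambda row: row[1])
    return result


tasks = ([(10, i, 5 * MB) for i in range(8)]
         + [(10, 8, 200 * MB), (10, 9, 300 * MB)]
         + [(11, i, 20 * MB) for i in range(4)])
# Stage 10 has two skewed tasks (8 and 9); only task 9 is reported.
# Stage 11 reads evenly, so it does not appear.
print(skewed_stages(tasks))
```

Raising `ratio` or `min_bytes` narrows the result, which corresponds to the table refreshing when you choose your own parameters.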
174175

175176
### Time Skew
176-
The **Time Skew** tab displays skewed tasks based on task execution time.
177+
178+
The **Time Skew** tab displays skewed tasks based on task execution time.
177179

178180
+ **Specify Parameters** - The first section displays the parameters that are used to detect Time Skew. The default criterion for detecting time skew is: the task execution time is greater than three times the average execution time, and the task execution time is greater than 30 seconds. You can change the parameters based on your needs. The **Skewed Stage** and **Skew Chart** sections display the corresponding stage and task information, just like the **Data Skew** tab above.
179181

180182
+ Click **Time Skew**; the filtered result is displayed in the **Skewed Stage** section according to the parameters set in the **Specify Parameters** section. Click an item in the **Skewed Stage** section; the corresponding chart is drawn in the third section, and the task details are displayed in the bottom-right panel.
181183

182-
![Time skew section2](./media/apache-azure-spark-history-server/sparkui-diagnosis-timeskew-section2.png)
184+
![sparkui diagnosis time skew section](./media/apache-azure-spark-history-server/sparkui-diagnosis-timeskew-section2.png)
183185

184186
### Executor Usage Analysis
187+
185188
The Executor Usage Graph visualizes the Spark job's actual executor allocation and running status.
186189

187190
+ Click **Executor Usage Analysis** to draw four curves describing executor usage: **Allocated Executors**, **Running Executors**, **idle Executors**, and **Max Executor Instances**. For allocated executors, each "Executor added" or "Executor removed" event increases or decreases the count; you can check "Event Timeline" in the "Jobs" tab for comparison.
188191

189-
![Executors tab](./media/apache-azure-spark-history-server/sparkui-diagnosis-executors.png)
192+
![sparkui diagnosis executors tab](./media/apache-azure-spark-history-server/sparkui-diagnosis-executors.png)
190193

191194
+ Click the color icon to select or unselect the corresponding content in all charts.
192195

193-
![Select chart](./media/apache-azure-spark-history-server/sparkui-diagnosis-select-chart.png)
194-
196+
![sparkui diagnosis select chart](./media/apache-azure-spark-history-server/sparkui-diagnosis-select-chart.png)
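
To illustrate how the **Allocated Executors** curve follows the "Executor added" and "Executor removed" events, here is a minimal Python sketch. It assumes the events have already been reduced to `(timestamp, kind)` pairs; that shape and the `allocated_curve` name are hypothetical and are not taken from the History Server.

```python
def allocated_curve(events):
    """Turn executor events into a step curve of allocated executors.

    `events` is an iterable of (timestamp, kind) pairs, where kind is
    "added" or "removed" (a hypothetical, pre-parsed form of the events).
    Returns a list of (timestamp, allocated_count) points.
    """
    curve = []
    allocated = 0
    # Process events in time order; sorting by timestamp only keeps
    # the original order of simultaneous events.
    for timestamp, kind in sorted(events, key=lambda e: e[0]):
        if kind == "added":
            allocated += 1
        elif kind == "removed":
            allocated -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        curve.append((timestamp, allocated))
    return curve


events = [(0, "added"), (1, "added"), (5, "removed"), (7, "added")]
print(allocated_curve(events))  # [(0, 1), (1, 2), (5, 1), (7, 2)]
```

Comparing such a curve with the "Event Timeline" in the "Jobs" tab is one way to sanity-check what the chart shows.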
195197

196198
## FAQ
197199

@@ -206,33 +208,32 @@ To revert to community version, do the following steps:
206208
5. The property sets to **false** now.
207209
6. Click **Save** to save the configuration.
208210

209-
![feature turns off](./media/apache-azure-spark-history-server/apache-spark-turn-off.png)
211+
![Apache Ambari feature turns off](./media/apache-azure-spark-history-server/apache-spark-turn-off.png)
210212

211213
7. Click **Spark2** in left panel, under **Summary** tab, click **Spark2 History Server**.
212214

213-
![restart server1](./media/apache-azure-spark-history-server/apache-spark-restart1.png)
215+
![Apache Ambari Spark2 Summary view](./media/apache-azure-spark-history-server/apache-spark-restart1.png)
214216

215217
8. Restart history server by clicking **Restart** of **Spark2 History Server**.
216218

217-
![restart server2](./media/apache-azure-spark-history-server/apache-spark-restart2.png)
218-
219+
![Apache Ambari Spark2 History restart](./media/apache-azure-spark-history-server/apache-spark-restart2.png)
219220
9. Refresh the Spark History Server web UI; it reverts to the community version.
220221

221222
### 2. Upload history server event
222223

223224
If you run into a history server error, follow these steps to provide the event:
225+
224226
1. Download event by clicking **Download** in history server web UI.
225227

226-
![download event](./media/apache-azure-spark-history-server/sparkui-download-event.png)
228+
![Spark2 History Server download](./media/apache-azure-spark-history-server/sparkui-download-event.png)
227229

228230
2. Click **Provide us feedback** from data/graph tab.
229231

230-
![graph feedback](./media/apache-azure-spark-history-server/sparkui-graph-feedback.png)
232+
![Spark graph provide us feedback](./media/apache-azure-spark-history-server/sparkui-graph-feedback.png)
231233

232234
3. Provide the title and description of the error, drag the zip file to the edit field, then click **Submit new issue**.
233235

234-
![file issue](./media/apache-azure-spark-history-server/apache-spark-file-issue.png)
235-
236+
![apache spark file issue example](./media/apache-azure-spark-history-server/apache-spark-file-issue.png)
236237

237238
### 3. Upgrade jar file for hotfix scenario
238239

@@ -285,45 +286,43 @@ If you want to upgrade with hotfix, use the script below which will upgrade spar
285286
fi
286287
```
287288

288-
**Usage**:
289+
**Usage**:
289290

290291
`upgrade_spark_enhancement.sh https://${jar_path}`
291292

292293
**Example**:
293294

294-
`upgrade_spark_enhancement.sh https://${account_name}.blob.core.windows.net/packages/jars/spark-enhancement-${version}.jar`
295+
`upgrade_spark_enhancement.sh https://${account_name}.blob.core.windows.net/packages/jars/spark-enhancement-${version}.jar`
295296

296297
**To use the bash file from Azure portal**
297298

298299
1. Launch [Azure portal](https://ms.portal.azure.com), and select your cluster.
299300
2. Click **Script actions**, then **Submit new**. Complete the **Submit script action** form, then click **Create** button.
300-
301+
301302
+ **Script type**: select **Custom**.
302303
+ **Name**: specify a script name.
303304
+ **Bash script URI**: upload the bash file to your private cluster, then copy the URL here. Alternatively, use the URI provided.
304-
305+
305306
```upgrade_spark_enhancement
306307
https://hdinsighttoolingstorage.blob.core.windows.net/shsscriptactions/upgrade_spark_enhancement.sh
307308
```
308309

309310
+ Check on **Head** and **Worker**.
310311
+ **Parameters**: set the parameters according to the bash script usage.
311312

312-
![upload log or upgrade hotfix](./media/apache-azure-spark-history-server/apache-spark-upload1.png)
313-
313+
![Azure portal submit script action](./media/apache-azure-spark-history-server/apache-spark-upload1.png)
314314

315315
## Known issues
316316

317-
1. Currently, it only works for Spark 2.3 and 2.4 cluster.
317+
1. Currently, it only works for Spark 2.3 and 2.4 cluster.
318318

319-
2. Input/output data using RDD will not show in data tab.
319+
2. Input/output data using RDD will not show in data tab.
320320

321321
## Next steps
322322

323323
* [Manage resources for an Apache Spark cluster on HDInsight](apache-spark-resource-manager.md)
324324
* [Configure Apache Spark settings](apache-spark-settings.md)
325325

326-
327326
## Contact us
328327

329328
If you have any feedback, or if you encounter any other problems when using this tool, send an email to ([[email protected]](mailto:[email protected])).
