articles/synapse-analytics/monitoring/apache-spark-advisor.md
---
title: Apache Spark Advisor in Azure Synapse Analytics
description: The Apache Spark advisor automatically analyzes commands and queries, and shows the appropriate advice when you run code or a query.
services: synapse-analytics
author: jejiang
ms.subservice: spark
ms.date: 06/23/2022
---
# Apache Spark Advisor in Azure Synapse Analytics
The Apache Spark advisor analyzes commands and code run by Spark and displays real-time advice for Notebook runs. The Spark advisor has built-in patterns to help users avoid common mistakes, offer recommendations for code optimization, perform error analysis, and locate the root cause of failures.
## Built-in advice
### May return inconsistent results when using 'randomSplit'
Inconsistent or inaccurate results may be returned when working with the results of the 'randomSplit' method. Use Apache Spark (RDD) caching before using the 'randomSplit' method.
Method randomSplit() is equivalent to performing sample() on your data frame multiple times, with each sample refetching, partitioning, and sorting your data frame within partitions. The data distribution across partitions and sorting order is important for both randomSplit() and sample(). If either changes upon data refetch, there may be duplicates, or missing values across splits and the same sample using the same seed may produce different results.
These inconsistencies may not happen on every run, but to eliminate them completely, cache your data frame, repartition on one or more columns, or apply aggregate functions such as groupBy.
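To see why an unstable row ordering breaks seeded sampling, here is a plain-Python analogy (not Spark code): the same seed only reproduces a sample when the underlying ordering is unchanged, which is what caching the data frame guarantees for 'randomSplit'.

```python
import random

data = list(range(100))

# Same seed, same ordering: the sample is fully reproducible.
sample_a = random.Random(42).sample(data, 10)
sample_b = random.Random(42).sample(data, 10)
assert sample_a == sample_b

# Same seed, but the ordering changed between "refetches" (as can happen
# to an uncached data frame between the passes made by randomSplit):
sample_c = random.Random(42).sample(data[::-1], 10)
# sample_c is drawn from the same values, yet generally differs from sample_a.

# The Spark-side fix is simply to cache before splitting, e.g.:
#   df.cache()
#   train, test = df.randomSplit([0.8, 0.2], seed=42)
```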
### Table/view name is already in use
A view already exists with the same name as the created table, or a table already exists with the same name as the created view.
When this name is used in queries or applications, only the view is returned, no matter which one was created first. To avoid conflicts, rename either the table or the view.
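For example (sketched with a hypothetical object name `sales`), renaming the table resolves the ambiguity:

```sql
-- A table and a view are both named 'sales'; queries resolve to the view.
-- Renaming one of the two objects removes the conflict.
ALTER TABLE sales RENAME TO sales_raw
```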
### Unable to recognize a hint
The selected query contains a hint that isn't recognized. Verify that the hint is spelled correctly.
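As a hypothetical illustration (table names invented), a typo in the hint name is enough to trigger this advice:

```sql
-- 'BROADCST' is a misspelling of the BROADCAST hint, so it can't be recognized.
SELECT /*+ BROADCST(t1) */ * FROM t1 INNER JOIN t2 ON t1.str = t2.str
```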
The selected query contains a hint that prevents another hint from being applied:

```
spark.sql("SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.str = t2.str")
```
### Enable 'spark.advise.divisionExprConvertRule.enable' to reduce rounding error propagation
This query contains an expression with the Double type. We recommend that you enable the configuration 'spark.advise.divisionExprConvertRule.enable', which can help reduce the number of division expressions and the propagation of rounding error. For example:
```text
"t.a/t.b/t.c" convert into "t.a/(t.b * t.c)"
```
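A quick plain-Python check (not Spark code) of the rewrite above: both forms compute the same value up to floating-point tolerance, but the rewritten form performs only one division, giving rounding error fewer chances to accumulate:

```python
import math

a, b, c = 1.0, 3.0, 7.0

chained = a / b / c      # two divisions: rounds after a/b and again after /c
rewritten = a / (b * c)  # one multiplication, then a single division

# The two forms agree to within floating-point tolerance.
assert math.isclose(chained, rewritten, rel_tol=1e-12)
```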
### Enable 'spark.advise.nonEqJoinConvertRule.enable' to improve query performance
This query contains a time-consuming join caused by an "Or" condition within the query. We recommend that you enable the configuration 'spark.advise.nonEqJoinConvertRule.enable', which can help convert the join triggered by the "Or" condition to a sort-merge join (SMJ) or broadcast hash join (BHJ) to accelerate the query.
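A sketch of the kind of query this advice targets, with hypothetical table and column names; the session-level `SET` command enables the rule the advice refers to:

```sql
SET spark.advise.nonEqJoinConvertRule.enable=true;

-- The OR in the join condition would otherwise force a slow non-equi join.
SELECT * FROM t1 INNER JOIN t2
  ON t1.str = t2.str OR t1.id = t2.id
```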
### Optimize Delta table with small files compaction
This query is on a Delta table with many small files. To improve the performance of queries, run the OPTIMIZE command on the Delta table. More details can be found in this [article](https://aka.ms/small-file-advise-delta).
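A minimal sketch, assuming a Delta table named `events` (a hypothetical name):

```sql
-- Compacts many small files into fewer, larger ones.
OPTIMIZE events
```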
### Optimize Delta table with ZOrder
This query is on a Delta table and contains a highly selective filter. To improve the performance of queries, run the OPTIMIZE ZORDER BY command on the Delta table. More details can be found in this [article](https://aka.ms/small-file-advise-delta).
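A minimal sketch, assuming the highly selective filter is on a hypothetical `event_date` column of a Delta table named `events`:

```sql
-- Co-locates related data in the same files, improving data skipping
-- for queries that filter on the ZORDER column.
OPTIMIZE events ZORDER BY (event_date)
```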
## User experience
The Apache Spark advisor displays advice, including info, warnings, and errors, in the Notebook cell output in real time.
* Info

* Warning

* Errors

## Next steps
For more information on monitoring pipeline runs, see the [Monitor pipeline runs using Synapse Studio](how-to-monitor-pipeline-runs.md) article.