
Commit 6b7f7a8

Merge pull request #223868 from v-lanjunli/branchforupdateagain: title update

2 parents 5a96031 + 712dcfd

1 file changed: 14 additions, 12 deletions

articles/synapse-analytics/monitoring/apache-spark-advisor.md
ms.subservice: spark
ms.date: 06/23/2022
---

# Apache Spark Advisor in Azure Synapse Analytics (Preview)

The Apache Spark advisor analyzes commands and code run by Spark and displays real-time advice for Notebook runs. The Spark advisor has built-in patterns that help users avoid common mistakes, offer recommendations for code optimization, perform error analysis, and locate the root cause of failures.
1717

18-
## Built-in advices
18+
## Built-in advice
1919

20-
### May return inconsistent results when using 'randomSplit'
20+
#### May return inconsistent results when using 'randomSplit'
2121
Inconsistent or inaccurate results may be returned when working with the results of the 'randomSplit' method. Use Apache Spark (RDD) caching before using the 'randomSplit' method.
2222

2323
Method randomSplit() is equivalent to performing sample() on your data frame multiple times, with each sample refetching, partitioning, and sorting your data frame within partitions. The data distribution across partitions and sorting order is important for both randomSplit() and sample(). If either changes upon data refetch, there may be duplicates, or missing values across splits and the same sample using the same seed may produce different results.
2424

2525
These inconsistencies may not happen on every run, but to eliminate them completely, cache your data frame, repartition on a column(s), or apply aggregate functions such as groupBy.
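The caching advice above can be sketched as follows; the input path, `df`, and the split ratios are hypothetical placeholders:

```scala
// Hypothetical sketch of the caching advice above: cache the DataFrame
// before calling randomSplit, so every split samples the same
// materialized data instead of refetching (and possibly reordering) it.
val df = spark.read.parquet("/data/events")   // placeholder input path

// Pin the current contents and ordering before splitting.
val cached = df.cache()
cached.count()   // action that materializes the cache

// The splits now draw from the same cached data, so the same seed
// reproduces the same split on every run.
val Array(train, test) = cached.randomSplit(Array(0.8, 0.2), seed = 42)
```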
2626

27-
### Table/view name is already in use
27+
#### Table/view name is already in use
2828
A view already exists with the same name as the created table, or a table already exists with the same name as the created view.
2929
When this name is used in queries or applications, only the view will be returned no matter which one created first. To avoid conflicts, rename either the table or the view.
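A minimal way to reproduce the conflict described above; the name `sales` and the schema are illustrative, not from the source:

```scala
// Hypothetical illustration of the name clash described above.
spark.sql("CREATE TABLE sales (id INT, amount DOUBLE) USING delta")

// A temp view with the same name shadows the table: subsequent
// queries against `sales` resolve to the view, not the table.
spark.range(10).createOrReplaceTempView("sales")

// Renaming one of the two (here, the view) avoids the conflict.
spark.range(10).createOrReplaceTempView("sales_staging_view")
```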
3030

31-
### Unable to recognize a hint
31+
#### Unable to recognize a hint
3232
The selected query contains a hint that isn't recognized. Verify that the hint is spelled correctly.
3333

3434
```scala
3535
spark.sql("SELECT /*+ unknownHint */ * FROM t1")
3636
```
3737

38-
### Unable to find a specified relation name(s)
38+
#### Unable to find a specified relation name(s)
3939
Unable to find the relation(s) specified in the hint. Verify that the relation(s) are spelled correctly and accessible within the scope of the hint.
4040

4141
```scala
4242
spark.sql("SELECT /*+ BROADCAST(unknownTable) */ * FROM t1 INNER JOIN t2 ON t1.str = t2.str")
4343
```
4444

45-
### A hint in the query prevents another hint from being applied
45+
#### A hint in the query prevents another hint from being applied
4646
The selected query contains a hint that prevents another hint from being applied.
4747

4848
```scala
4949
spark.sql("SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.str = t2.str")
5050
```
5151

52-
### Enable 'spark.advise.divisionExprConvertRule.enable' to reduce rounding error propagation
52+
#### Enable 'spark.advise.divisionExprConvertRule.enable' to reduce rounding error propagation
5353
This query contains the expression with Double type. We recommend that you enable the configuration 'spark.advise.divisionExprConvertRule.enable', which can help reduce the division expressions and to reduce the rounding error propagation.
5454

5555
```text
5656
"t.a/t.b/t.c" convert into "t.a/(t.b * t.c)"
5757
```
5858

59-
### Enable 'spark.advise.nonEqJoinConvertRule.enable' to improve query performance
59+
#### Enable 'spark.advise.nonEqJoinConvertRule.enable' to improve query performance
6060
This query contains time consuming join due to "Or" condition within query. We recommend that you enable the configuration 'spark.advise.nonEqJoinConvertRule.enable', which can help to convert the join triggered by "Or" condition to SMJ or BHJ to accelerate this query.
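Both advisor rules above are toggled through Spark configuration. A sketch of enabling them for the current session, assuming the property names given in the advice text:

```scala
// Enable the advisor rewrite rules described above for this session.

// Rewrites chained Double divisions, e.g. t.a/t.b/t.c -> t.a/(t.b * t.c),
// to reduce rounding error propagation.
spark.conf.set("spark.advise.divisionExprConvertRule.enable", "true")

// Converts joins triggered by an "Or" condition into SMJ/BHJ plans.
spark.conf.set("spark.advise.nonEqJoinConvertRule.enable", "true")
```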
6161

62-
### Optimize delta table with small files compaction
62+
#### Optimize delta table with small files compaction
6363

6464
This query is on a delta table with many small files. To improve the performance of queries, run the OPTIMIZE command on the delta table. More details could be found within this [article](https://aka.ms/small-file-advise-delta).
6565

66-
### Optimize Delta table with ZOrder
66+
#### Optimize Delta table with ZOrder
6767

6868
This query is on a Delta table and contains a highly selective filter. To improve the performance of queries, run the OPTIMIZE ZORDER BY command on the delta table. More details could be found within this [article](https://aka.ms/small-file-advise-delta).
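Both Delta table optimizations above are issued as SQL commands; a sketch using a hypothetical table name `events` and filter column `eventDate`:

```scala
// Compact many small files into larger ones (small-file compaction).
spark.sql("OPTIMIZE events")

// Co-locate rows by a highly selective filter column, so queries
// filtering on it can skip more files.
spark.sql("OPTIMIZE events ZORDER BY (eventDate)")
```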
6969

The Apache Spark advisor displays the advice, including info, warning, and error levels.

## Next steps

For more information on monitoring Apache Spark applications, see the [Monitor Apache Spark applications using Synapse Studio](apache-spark-applications.md) article.

For more information on creating a notebook, see [How to use Synapse notebooks](../spark/apache-spark-development-using-notebooks.md).
