articles/synapse-analytics/monitoring/apache-spark-advisor.md
Lines changed: 7 additions & 7 deletions
@@ -1,6 +1,6 @@
 ---
 title: Spark Advisor
-description: Spark Advisor would be a system to automatically analyze commands/queries, and show the appropriate advise when customer execute code or query.
+description: Spark Advisor is a system to automatically analyze commands/queries, and show the appropriate advice when a customer executes code or query.
 services: synapse-analytics
 author: jejiang
 ms.author: jejiang
@@ -13,19 +13,19 @@ ms.date: 06/23/2022

 # Spark Advisor

-Spark Advisor would be a system to automatically analyze commands/queries, and show the appropriate advise when customer executes code or query. After applying the advise, you would have chance to improve your execution performance, decrease cost and fix the execution failures.
+Spark Advisor is a system to automatically analyze commands/queries, and show the appropriate advise when customer executes code or query. After applying the advice, you would have chance to improve your execution performance, decrease cost and fix the execution failures.


-## Advises provided
+## Advice provided

-## May return inconsistent results when using 'randomSplit'
+###May return inconsistent results when using 'randomSplit'
 Inconsistent or inaccurate results may be returned when working with the results of the 'randomSplit' method. Use Apache Spark (RDD) caching before using the 'randomSplit' method.

-Method randomSplit() is equivalent to performing sample() on your data frame multiple times, with each sample refetching, partitioning, and sorting your data frame within partitions. The data distribution across partitions and sorting order is important for both randomSplit() and sample(). If either change upon data refetch, there may be duplicates, or missing values across splits and the same sample using the same seed may produce different results.
+Method randomSplit() is equivalent to performing sample() on your data frame multiple times, with each sample refetching, partitioning, and sorting your data frame within partitions. The data distribution across partitions and sorting order is important for both randomSplit() and sample(). If either changes upon data refetch, there may be duplicates, or missing values across splits and the same sample using the same seed may produce different results.

 These inconsistencies may not happen on every run, but to eliminate them completely, cache your data frame, repartition on a column(s), or apply aggregate functions such as groupBy.

-## Table/View Name is already in use
+###Table/view name is already in use
 A view already exists with the same name as the created table, or a table already exists with the same name as the created view.
 When this name is used in queries or applications, only the view will be returned no matter, which one created first. To avoid conflicts, rename either the table or the view.

@@ -37,7 +37,7 @@ The selected query contains a hint that isn't recognized. Verify that the hint i
 spark.sql("SELECT /*+ unknownHint */ * FROM t1")
 ```

-### Unable to find a specified Relation name(s)
+### Unable to find a specified relation name(s)
 Unable to find the relation(s) specified in the hint. Verify that the relation(s) are spelled correctly and accessible within the scope of the hint.
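
The 'randomSplit' advice in the changed file says each split is produced by its own sample() pass over refetched data, so a change in row order between passes can duplicate or drop rows. The sketch below is a simplified plain-Python model of that mechanism, not Spark's implementation: `range_sample`, `random_split`, `stable_fetch`, and `UnstableFetch` are hypothetical names introduced only for illustration, and the "refetch" is modeled by calling a function once per split.

```python
import random


def range_sample(rows, lower, upper, seed):
    """Keep the rows whose per-row random draw falls in [lower, upper).

    The draw a row receives depends only on its position in `rows`, so the
    same seed reproduces the same selection only if the order is unchanged.
    """
    rng = random.Random(seed)
    return [row for row in rows if lower <= rng.random() < upper]


def random_split(fetch, weights, seed):
    """Build every split with a separate pass over freshly fetched rows."""
    total = sum(weights)
    bounds = [0.0]
    for weight in weights:
        bounds.append(bounds[-1] + weight / total)
    bounds[-1] = 1.0  # guard against floating-point drift in the last bound
    return [
        range_sample(fetch(), lower, upper, seed)
        for lower, upper in zip(bounds, bounds[1:])
    ]


rows = list(range(100))


def stable_fetch():
    # Models a cached data frame: every pass sees the rows in the same order.
    return list(rows)


class UnstableFetch:
    """Models an uncached source whose row order flips on every refetch."""

    def __init__(self, source_rows):
        self.source_rows = list(source_rows)
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls % 2:
            return list(self.source_rows)
        return self.source_rows[::-1]


# Stable order: the two splits partition the data exactly.
train, test = random_split(stable_fetch, [0.8, 0.2], seed=42)
print(len(train) + len(test), len(set(train) & set(test)))

# Unstable order: some rows can land in both splits and others in neither.
bad_train, bad_test = random_split(UnstableFetch(rows), [0.8, 0.2], seed=42)
print(len(set(bad_train) & set(bad_test)), len(set(rows) - set(bad_train) - set(bad_test)))
```

In this model, `stable_fetch` plays the role of caching the data frame before the split, which is the remedy the advice text recommends; the same seed then selects complementary rows in every pass.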