Commit 13a67b0

eatoncys authored and gatorsmile committed
[SPARK-24870][SQL] Cache can't work normally if there are case letters in SQL
## What changes were proposed in this pull request?

Modified canonicalization so that it is case-insensitive.

Before this PR, the cache did not work correctly when a SQL statement referred to the same column with different letter case, for example:

    sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
    sql("select key, sum(case when Key > 0 then 1 else 0 end) as positiveNum " +
      "from src group by key").cache().createOrReplaceTempView("src_cache")
    sql(
      s"""select a.key
         from
         (select key from src_cache where positiveNum = 1)a
         left join
         (select key from src_cache )b
         on a.key=b.key
      """).explain

The physical plan of the SQL is:

![image](https://user-images.githubusercontent.com/26834091/42979518-3decf0fa-8c05-11e8-9837-d5e4c334cb1f.png)

The subquery "select key from src_cache where positiveNum = 1" on the left side of the join can use the cached data, but the subquery "select key from src_cache" on the right side of the join cannot.

## How was this patch tested?

Added a new test.

Author: 10129659 <[email protected]>

Closes apache#21823 from eatoncys/canonicalized.
1 parent d2436a8 commit 13a67b0

2 files changed: +16 -1 lines changed


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala

Lines changed: 1 addition & 1 deletion

@@ -284,7 +284,7 @@ object QueryPlan extends PredicateHelper {
         if (ordinal == -1) {
           ar
         } else {
-          ar.withExprId(ExprId(ordinal))
+          ar.withExprId(ExprId(ordinal)).canonicalized
         }
     }.canonicalized.asInstanceOf[T]
   }
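
The effect of appending `.canonicalized` to the normalized attribute can be illustrated with a small standalone model. This is only a sketch: `Attr`, `normalizeBefore`, and `normalizeAfter` are hypothetical names, not Catalyst classes. It assumes that canonicalizing an attribute replaces its cosmetic name with a fixed placeholder, so that only the normalized id takes part in comparisons. It does not model the outer `.canonicalized` call on the whole expression.

```scala
object CanonicalizeSketch {
  // Hypothetical, simplified stand-in for an attribute reference:
  // a display name plus a unique expression id.
  final case class Attr(name: String, exprId: Long) {
    // Assumption: canonicalization drops the cosmetic name so that
    // "Key" and "key" no longer compare as different.
    def canonicalized: Attr = copy(name = "none")
  }

  // Model of the behavior before the change: the expression id is replaced
  // by the attribute's position in the input, but the name is kept.
  def normalizeBefore(attrs: Seq[Attr], input: Seq[Attr]): Seq[Attr] =
    attrs.map { a =>
      val ordinal = input.indexWhere(_.exprId == a.exprId)
      if (ordinal == -1) a else a.copy(exprId = ordinal.toLong)
    }

  // Model of the behavior after the change: same id normalization,
  // followed by canonicalization of the attribute itself.
  def normalizeAfter(attrs: Seq[Attr], input: Seq[Attr]): Seq[Attr] =
    attrs.map { a =>
      val ordinal = input.indexWhere(_.exprId == a.exprId)
      if (ordinal == -1) a else a.copy(exprId = ordinal.toLong).canonicalized
    }

  def main(args: Array[String]): Unit = {
    // Two "plans" that differ only in letter case and in raw expression ids.
    val upper = Seq(Attr("A", 10L), Attr("B", 11L))
    val lower = Seq(Attr("a", 20L), Attr("b", 21L))

    val before = normalizeBefore(upper.take(1), upper) == normalizeBefore(lower.take(1), lower)
    val after = normalizeAfter(upper.take(1), upper) == normalizeAfter(lower.take(1), lower)

    println(s"equal before the change: $before") // false: names "A" and "a" differ
    println(s"equal after the change:  $after")  // true: both names become "none"
  }
}
```

In this model the two projections compare equal only once the name is erased, which is the property the new `SameResultSuite` test checks through `sameResult`.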

sql/core/src/test/scala/org/apache/spark/sql/execution/SameResultSuite.scala

Lines changed: 15 additions & 0 deletions

@@ -18,8 +18,11 @@
 package org.apache.spark.sql.execution
 
 import org.apache.spark.sql.{DataFrame, QueryTest}
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project}
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types.IntegerType
 
 /**
  * Tests for the sameResult function for [[SparkPlan]]s.
@@ -58,4 +61,16 @@ class SameResultSuite extends QueryTest with SharedSQLContext {
     val df4 = spark.range(10).agg(sumDistinct($"id"))
     assert(df3.queryExecution.executedPlan.sameResult(df4.queryExecution.executedPlan))
   }
+
+  test("Canonicalized result is case-insensitive") {
+    val a = AttributeReference("A", IntegerType)()
+    val b = AttributeReference("B", IntegerType)()
+    val planUppercase = Project(Seq(a), LocalRelation(a, b))
+
+    val c = AttributeReference("a", IntegerType)()
+    val d = AttributeReference("b", IntegerType)()
+    val planLowercase = Project(Seq(c), LocalRelation(c, d))
+
+    assert(planUppercase.sameResult(planLowercase))
+  }
 }
