
Commit a00774a

viirya authored and HyukjinKwon committed
[SPARK-28054][SQL] Fix error when insert Hive partitioned table dynamically where partition name is upper case
## What changes were proposed in this pull request?

When we use an upper case partition name in a Hive table, like:

```
CREATE TABLE src (KEY STRING, VALUE STRING) PARTITIONED BY (DS STRING)
```

then an `INSERT INTO TABLE` query doesn't work:

```
INSERT INTO TABLE src PARTITION(ds) SELECT 'k' key, 'v' value, '1' ds
-- or
INSERT INTO TABLE src PARTITION(DS) SELECT 'k' KEY, 'v' VALUE, '1' DS
```

```
[info] org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.Table.ValidationFailureSemanticException: Partition spec {ds=, DS=1} contains non-partition columns;
```

Because the Hive metastore is not case preserving and stores partition columns with lower cased names, we lowercase the column names in the partition spec before passing it to the Hive client, but we still write upper case column names into the partition paths. When `loadDynamicPartitions` is called for a dynamic-partition `INSERT INTO TABLE`, Hive computes the full partition spec from those paths, so in the case above it derives `{ds=, DS=1}` and fails partition column validation.

This patch fixes the issue by lowercasing the column names in the written partition paths for Hive partitioned tables. The fix touches the `saveAsHiveFile` method, which is used by the `InsertIntoHiveDirCommand` and `InsertIntoHiveTable` commands. Of these, only `InsertIntoHiveTable` passes the `partitionAttributes` parameter, so this change should only affect the `InsertIntoHiveTable` command.

## How was this patch tested?

Added a test.

Closes apache#24886 from viirya/SPARK-28054.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
1 parent 1a915bf commit a00774a

File tree

2 files changed: +29 −1 lines changed

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala

Lines changed: 11 additions & 1 deletion
```diff
@@ -83,6 +83,16 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       jobId = java.util.UUID.randomUUID().toString,
       outputPath = outputLocation)
 
+    // SPARK-28054: Hive metastore is not case preserving and keeps partition columns
+    // with lower cased names, Hive will validate the column names in partition spec and
+    // the partition paths. Besides lowercasing the column names in the partition spec,
+    // we also need to lowercase the column names in written partition paths.
+    // scalastyle:off caselocale
+    val hiveCompatiblePartitionColumns = partitionAttributes.map { attr =>
+      attr.withName(attr.name.toLowerCase)
+    }
+    // scalastyle:on caselocale
+
     FileFormatWriter.write(
       sparkSession = sparkSession,
       plan = plan,
@@ -91,7 +101,7 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       outputSpec =
         FileFormatWriter.OutputSpec(outputLocation, customPartitionLocations, outputColumns),
       hadoopConf = hadoopConf,
-      partitionColumns = partitionAttributes,
+      partitionColumns = hiveCompatiblePartitionColumns,
       bucketSpec = None,
       statsTrackers = Seq(basicWriteJobStatsTracker(hadoopConf)),
       options = Map.empty)
```
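The effect of the new `hiveCompatiblePartitionColumns` mapping can be sketched standalone. This is a minimal sketch with no Spark dependency: the `Attr` case class and `partitionPath` helper below are hypothetical stand-ins for Spark's `Attribute` and the `col=value` path segments that `FileFormatWriter` produces, used only to show why lowercasing the attribute names makes the written paths agree with the lowercased partition spec.

```scala
// Hypothetical stand-in for Spark's Attribute: only the name matters here.
case class Attr(name: String) {
  def withName(newName: String): Attr = copy(name = newName)
}

object PartitionCaseDemo {
  // Analogue of the fix: lowercase every partition attribute's name
  // before the writer renders partition directories.
  def hiveCompatible(partitionAttributes: Seq[Attr]): Seq[Attr] =
    partitionAttributes.map(attr => attr.withName(attr.name.toLowerCase))

  // Simplified partition path as a writer would render it: "col=value/...".
  def partitionPath(attrs: Seq[Attr], values: Seq[String]): String =
    attrs.zip(values).map { case (a, v) => s"${a.name}=$v" }.mkString("/")

  def main(args: Array[String]): Unit = {
    val upper = Seq(Attr("DS"))
    // Pre-fix: path uses the table's upper case name, so Hive's full-path
    // spec disagrees with the lowercased metastore spec ({ds=, DS=1}).
    println(partitionPath(upper, Seq("1")))                 // prints DS=1
    // Post-fix: path matches the lowercased spec Hive expects.
    println(partitionPath(hiveCompatible(upper), Seq("1"))) // prints ds=1
  }
}
```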

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala

Lines changed: 18 additions & 0 deletions
```diff
@@ -1188,6 +1188,24 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
       }
     }
   }
+
+  test("SPARK-28054: Unable to insert partitioned table when partition name is upper case") {
+    withTable("spark_28054_test") {
+      sql("set hive.exec.dynamic.partition.mode=nonstrict")
+      sql("CREATE TABLE spark_28054_test (KEY STRING, VALUE STRING) PARTITIONED BY (DS STRING)")
+
+      sql("INSERT INTO TABLE spark_28054_test PARTITION(DS) SELECT 'k' KEY, 'v' VALUE, '1' DS")
+
+      assertResult(Array(Row("k", "v", "1"))) {
+        sql("SELECT * from spark_28054_test").collect()
+      }
+
+      sql("INSERT INTO TABLE spark_28054_test PARTITION(ds) SELECT 'k' key, 'v' value, '2' ds")
+      assertResult(Array(Row("k", "v", "1"), Row("k", "v", "2"))) {
+        sql("SELECT * from spark_28054_test").collect()
+      }
+    }
+  }
 }
 
 // for SPARK-2180 test
```
