
Commit d4a277f

sujith71955 authored and gatorsmile committed
[SPARK-24812][SQL] Last Access Time in the table description is not valid
## What changes were proposed in this pull request?

Last Access Time was always displayed as the wrong date Thu Jan 01 05:30:00 IST 1970 when a user ran the DESC FORMATTED table command. Hive displays it as "UNKNOWN", which makes more sense than displaying a wrong date. This seems to be a limitation in Hive as of now, so it is better to follow the Hive behavior until the limitation is resolved in Hive.

Spark client output
![spark_desc table](https://user-images.githubusercontent.com/12999161/42753448-ddeea66a-88a5-11e8-94aa-ef8d017f94c5.png)

Hive client output
![hive_behaviour](https://user-images.githubusercontent.com/12999161/42753489-f4fd366e-88a5-11e8-83b0-0f3a53ce83dd.png)

## How was this patch tested?

A UT has been added which makes sure that the wrong date "Thu Jan 01 05:30:00 IST 1970" is not added as the value of the Last Access property.

Author: s71955 <[email protected]>

Closes apache#21775 from sujith71955/master_hive.
1 parent 9d27541 commit d4a277f

File tree

3 files changed: +18 -1 lines changed


docs/sql-programming-guide.md

Lines changed: 1 addition & 0 deletions
@@ -1850,6 +1850,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
 
 ## Upgrading From Spark SQL 2.3 to 2.4
 
+- Since Spark 2.4, Spark will display table description column Last Access value as UNKNOWN when the value was Jan 01 1970.
 - Since Spark 2.4, Spark maximizes the usage of a vectorized ORC reader for ORC files by default. To do that, `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown` change their default values to `native` and `true` respectively.
 - In PySpark, when Arrow optimization is enabled, previously `toPandas` just failed when Arrow optimization is unable to be used whereas `createDataFrame` from Pandas DataFrame allowed the fallback to non-optimization. Now, both `toPandas` and `createDataFrame` from Pandas DataFrame allow the fallback by default, which can be switched off by `spark.sql.execution.arrow.fallback.enabled`.
 - Since Spark 2.4, writing an empty dataframe to a directory launches at least one write task, even if physically the dataframe has no partition. This introduces a small behavior change that for self-describing file formats like Parquet and Orc, Spark creates a metadata-only file in the target directory when writing a 0-partition dataframe, so that schema inference can still work if users read that directory later. The new behavior is more reasonable and more consistent regarding writing empty dataframe.

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

Lines changed: 4 additions & 1 deletion
@@ -114,7 +114,10 @@ case class CatalogTablePartition(
       map.put("Partition Parameters", s"{${parameters.map(p => p._1 + "=" + p._2).mkString(", ")}}")
     }
     map.put("Created Time", new Date(createTime).toString)
-    map.put("Last Access", new Date(lastAccessTime).toString)
+    val lastAccess = {
+      if (-1 == lastAccessTime) "UNKNOWN" else new Date(lastAccessTime).toString
+    }
+    map.put("Last Access", lastAccess)
     stats.foreach(s => map.put("Partition Statistics", s.simpleString))
     map
   }
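The change above can be sketched as a self-contained snippet without a Spark dependency. The object and method names below (`PartitionDescription`, `describe`) are illustrative, not Spark's; only the sentinel handling mirrors the patch:

```scala
import java.util.Date
import scala.collection.mutable

object PartitionDescription {
  // Simplified stand-in for building the description rows shown by
  // DESC FORMATTED: a lastAccessTime of -1 means "never recorded", and
  // rendering it as a Date would print the 1970 epoch start (e.g.
  // "Thu Jan 01 05:30:00 IST 1970"), so follow Hive and show UNKNOWN.
  def describe(createTime: Long, lastAccessTime: Long): mutable.LinkedHashMap[String, String] = {
    val map = mutable.LinkedHashMap[String, String]()
    map.put("Created Time", new Date(createTime).toString)
    val lastAccess =
      if (lastAccessTime == -1) "UNKNOWN" else new Date(lastAccessTime).toString
    map.put("Last Access", lastAccess)
    map
  }

  def main(args: Array[String]): Unit = {
    // The sentinel is rendered as UNKNOWN; a real timestamp as a Date string.
    println(PartitionDescription.describe(System.currentTimeMillis(), -1L))
  }
}
```

The `LinkedHashMap` preserves insertion order, so the properties print in the same order they would appear in the table description.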

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

Lines changed: 13 additions & 0 deletions
@@ -19,6 +19,7 @@ package org.apache.spark.sql.hive.execution
 
 import java.io.File
 import java.net.URI
+import java.util.Date
 
 import scala.language.existentials
 
@@ -2250,6 +2251,18 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-24812: desc formatted table for last access verification") {
+    withTable("t1") {
+      sql(
+        "CREATE TABLE IF NOT EXISTS t1 (c1_int INT, c2_string STRING, c3_float FLOAT)")
+      val desc = sql("DESC FORMATTED t1").filter($"col_name".startsWith("Last Access"))
+        .select("data_type")
+      // check that the last access time does not fall back to the default
+      // 1970 epoch date, which is a wrong access time
+      assert(!(desc.first.toString.contains("1970")))
+    }
+  }
+
   test("SPARK-24681 checks if nested column names do not include ',', ':', and ';'") {
     val expectedMsg = "Cannot create a table having a nested column whose name contains invalid " +
       "characters (',', ':', ';') in Hive metastore."
