You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[SPARK-26384][SQL] Propagate SQL configs for CSV schema inferring
## What changes were proposed in this pull request?
Currently, SQL configs are not propagated to executors while schema inferring in CSV datasource. For example, changing of `spark.sql.legacy.timeParser.enabled` does not impact on inferring timestamp types. In the PR, I propose to fix the issue by wrapping schema inferring action using `SQLExecution.withSQLConfPropagated`.
## How was this patch tested?
Added logging to `TimestampFormatter`:
```patch
-object TimestampFormatter {
+object TimestampFormatter extends Logging {
def apply(format: String, timeZone: TimeZone, locale: Locale): TimestampFormatter = {
if (SQLConf.get.legacyTimeParserEnabled) {
+ logError("LegacyFallbackTimestampFormatter is being used")
new LegacyFallbackTimestampFormatter(format, timeZone, locale)
} else {
+ logError("Iso8601TimestampFormatter is being used")
new Iso8601TimestampFormatter(format, timeZone, locale)
}
}
```
and run the command in `spark-shell`:
```shell
$ ./bin/spark-shell --conf spark.sql.legacy.timeParser.enabled=true
```
```scala
scala> Seq("2010|10|10").toDF.repartition(1).write.mode("overwrite").text("/tmp/foo")
scala> spark.read.option("inferSchema", "true").option("header", "false").option("timestampFormat", "yyyy|MM|dd").csv("/tmp/foo").printSchema()
18/12/18 10:47:27 ERROR TimestampFormatter: LegacyFallbackTimestampFormatter is being used
root
|-- _c0: timestamp (nullable = true)
```
Closesapache#23345 from MaxGekk/csv-schema-infer-propagate-configs.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
0 commit comments