
Commit 8fa5edb

Merge pull request #225923 from jovanpop-msft/patch-240
Explained Julian calendar issue with Spark 3.0
2 parents 7bf7d6c + 9998ea4 commit 8fa5edb

1 file changed (+3 −3 lines changed)

articles/synapse-analytics/sql/resources-self-help-sql-on-demand.md

Lines changed: 3 additions & 3 deletions
@@ -678,7 +678,7 @@ If your query returns NULL values instead of partitioning columns or can't find

The error `Inserting value to batch for column type DATETIME2 failed` indicates that the serverless pool can't read the date values from the underlying files. The datetime value stored in the Parquet or Delta Lake file can't be represented as a `DATETIME2` column.

-Inspect the minimum value in the file by using Spark, and check that some dates are less than 0001-01-03. If you stored the files by using Spark 2.4, the datetime values before are written by using the Julian calendar that isn't aligned with the proleptic Gregorian calendar used in serverless SQL pools.
+Inspect the minimum value in the file by using Spark, and check whether some dates are less than 0001-01-03. If you stored the files by using Spark 2.4, or by using a later Spark version that still uses the legacy datetime storage format, the datetime values before that date are written by using the Julian calendar, which isn't aligned with the proleptic Gregorian calendar used in serverless SQL pools.

There might be a two-day difference between the Julian calendar used to write the values in Parquet (in some Spark versions) and the proleptic Gregorian calendar used in serverless SQL pool. This difference might cause conversion to a negative date value, which is invalid.

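As an illustration of the inspection step in the hunk above, here's a minimal PySpark sketch; the storage path is hypothetical, and the column name is borrowed from the article's later example:

```spark
from pyspark.sql import functions as F

# Hypothetical location; substitute your own storage path.
df = spark.read.parquet("abfss://container@account.dfs.core.windows.net/my-table/")

# Check the minimum datetime value for dates earlier than 0001-01-03,
# which indicate legacy Julian-calendar values written by Spark 2.4.
df.select(F.min("MyDateTimeColumn")).show()
```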
@@ -695,7 +695,7 @@ deltaTable.update(col("MyDateTimeColumn") < '0001-02-02', { "MyDateTimeColumn":

This change removes the values that can't be represented. The other date values might be properly loaded but incorrectly represented because there's still a difference between Julian and proleptic Gregorian calendars. You might see unexpected date shifts even for the dates before `1900-01-01` if you use Spark 3.0 or older versions.

-Consider [migrating to Spark 3.1 or higher](https://spark.apache.org/docs/latest/sql-migration-guide.html). It uses a proleptic Gregorian calendar that's aligned with the calendar in serverless SQL pool. Reload your legacy data with the higher version of Spark, and use the following setting to correct the dates:
+Consider [migrating to Spark 3.1 or higher](https://spark.apache.org/docs/latest/sql-migration-guide.html) and switching to the proleptic Gregorian calendar. The latest Spark versions use a proleptic Gregorian calendar by default, which is aligned with the calendar in serverless SQL pool. Reload your legacy data with the higher version of Spark, and use the following setting to correct the dates:

```spark
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")
```
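
A minimal sketch of that reload flow, assuming the input and output paths below are placeholders rather than values from the article:

```spark
# Set the rebase mode before rewriting, so legacy INT96 timestamps
# are corrected when the new files are written.
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")

# Hypothetical paths; substitute your own storage locations.
legacy_df = spark.read.parquet("abfss://container@account.dfs.core.windows.net/legacy-table/")

# Rewrite with Spark 3.1 or higher, which uses the proleptic
# Gregorian calendar by default.
legacy_df.write.mode("overwrite").parquet("abfss://container@account.dfs.core.windows.net/corrected-table/")
```

Depending on how the legacy files were written, Spark might also ask for a corresponding read-side rebase setting when loading them; the migration guide linked above covers those options.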
@@ -1114,4 +1114,4 @@ You don't need to use separate databases to isolate data for different tenants.

- [Azure Synapse Analytics frequently asked questions](../overview-faq.yml)
- [Store query results to storage using serverless SQL pool in Azure Synapse Analytics](create-external-table-as-select.md)
- [Synapse Studio troubleshooting](../troubleshoot/troubleshoot-synapse-studio.md)
-- [Troubleshoot a slow query on a dedicated SQL Pool](/troubleshoot/azure/synapse-analytics/dedicated-sql/troubleshoot-dsql-perf-slow-query)
+- [Troubleshoot a slow query on a dedicated SQL Pool](/troubleshoot/azure/synapse-analytics/dedicated-sql/troubleshoot-dsql-perf-slow-query)
