
Commit 4ace820

Merge pull request #180509 from jovanpop-msft/patch-221
Workaround for moving Julian dates
2 parents 2c687f3 + 2e9cfac commit 4ace820

File tree

1 file changed: +10 -2 lines changed

articles/synapse-analytics/sql/resources-self-help-sql-on-demand.md

Lines changed: 10 additions & 2 deletions
### Inserting value to batch for column type DATETIME2 failed

The datetime value stored in a Parquet or Delta Lake file can't be represented as a `DATETIME2` column. Inspect the minimum value in the file by using Spark, and check whether there are dates earlier than 0001-01-03. If you stored the files by using Spark 2.4, the datetime values before that date are written by using the Julian calendar, which isn't aligned with the proleptic Gregorian calendar used in serverless SQL pools. There might be a two-day difference between the Julian calendar used to write the values in Parquet (in some Spark versions) and the proleptic Gregorian calendar used in serverless SQL pool. This difference might cause conversion to an invalid (negative) date value.

Try to use Spark to update these values, because they're treated as invalid date values in SQL. The following sample shows how to update the values that are out of SQL date ranges to `NULL` in Delta Lake:

```spark
from delta.tables import *
from pyspark.sql.functions import col

# Load the Delta Lake table. The path below is a placeholder; use the
# location of your own table.
deltaTable = DeltaTable.forPath(spark, "<path-to-delta-table>")

# Set the datetime values that fall outside the SQL date range to NULL.
# The replacement value is the SQL expression string "null".
deltaTable.update(col("MyDateTimeColumn") < '0001-02-02', { "MyDateTimeColumn": "null" })
```

This change removes the values that can't be represented. The other date values might be loaded properly but represented incorrectly, because there's still a difference between the Julian and proleptic Gregorian calendars. You might see unexpected date shifts even for dates before `1900-01-01` if you use Spark 3.0 or earlier versions.

Consider [migrating to Spark 3.1 or higher](https://spark.apache.org/docs/latest/sql-migration-guide.html), which uses a proleptic Gregorian calendar that's aligned with the calendar in serverless SQL pool.

Reload your legacy data with the higher version of Spark, and use the following setting to correct the dates:

```spark
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")
```
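As an illustration of how that setting fits into a reload (a sketch with placeholder paths, not from the original article; assumes a Spark 3.1+ session):

```spark
# Sketch only: paths are placeholders; assumes a Spark 3.1+ session.
# Write the reloaded data without rebasing to the legacy Julian calendar,
# so the stored values match the proleptic Gregorian calendar that
# serverless SQL pool expects.
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")

df = spark.read.parquet("<path-to-legacy-parquet-data>")
df.write.mode("overwrite").parquet("<path-to-corrected-parquet-data>")
```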
## Configuration

### Query fails with: Please create a master key in the database or open the master key in the session before performing this operation.
