Commit e24b865

Merge pull request #178288 from jovanpop-msft/patch-221

Delta Lake GA fixes

2 parents: bd4ba6f + 05b6c8b

articles/synapse-analytics/sql/resources-self-help-sql-on-demand.md

Lines changed: 6 additions & 16 deletions
@@ -441,7 +441,7 @@ from pyspark.sql.functions import *
 
 deltaTable = DeltaTable.forPath(spark,
              "abfss://[email protected]/delta-lake-data-set")
-deltaTable.update(col("MyDateTimeColumn") < '0001-02-02', { "MyDateTimeColumn": "0001-01-03" } )
+deltaTable.update(col("MyDateTimeColumn") < '0001-02-02', { "MyDateTimeColumn": null } )
 ```
 
 ## Configuration
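
For reference, a minimal runnable sketch of the pattern this hunk edits, assuming a Synapse notebook where `spark` is an existing SparkSession; the path and column name are the article's placeholders. Note that the Delta Lake Python API expects the `update` values as SQL expression strings, so a directly runnable form of the added line would pass the string `"null"` rather than a bare `null`:

```python
# Minimal sketch: set out-of-range datetime values to NULL in a Delta table.
from delta.tables import DeltaTable
from pyspark.sql.functions import col

deltaTable = DeltaTable.forPath(
    spark,  # assumes an existing SparkSession, as in Synapse notebooks
    "abfss://container@account.dfs.core.windows.net/delta-lake-data-set")  # placeholder path

# `update` takes a condition and a dict mapping column names to SQL
# expression strings; the expression "null" clears matching values.
deltaTable.update(
    col("MyDateTimeColumn") < "0001-02-02",
    {"MyDateTimeColumn": "null"})
```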
@@ -613,15 +613,9 @@ The easiest way is to grant yourself `Storage Blob Data Contributor` role on the
 
 ### Cannot find value of partitioning column in file
 
-Delta Lake data sets may have `NULL` values in the partitioning columns. These partitions are stored in `HIVE_DEFAULT_PARTITION` folder. This is currently not supported in serverless SQL pool. In this case you will get the error that looks like:
-
-```
-Resolving Delta logs on path 'https://....core.windows.net/.../' failed with error:
-Cannot find value of partitioning column '<column name>' in file
-'https://......core.windows.net/...../<column name>=__HIVE_DEFAULT_PARTITION__/part-00042-2c0d5c0e-8e89-4ab8-b514-207dcfd6fe13.c000.snappy.parquet'.
-```
+**Status**: Resolved
 
-**Workaround:** Try to update your Delta Lake data set using Apache Spark pools and use some value (empty string or `"null"`) instead of `null` in the partitioning column.
+**Release**: November 2021
 
 ### JSON text is not properly formatted

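The workaround removed in this hunk is obsolete now that the issue is resolved, but for readers on older tooling, a hypothetical sketch of what it described (paths and the partition column name are placeholders, and `spark` is assumed to be an existing SparkSession):

```python
# Hypothetical sketch: replace NULLs in a partitioning column with a sentinel
# value before writing, so Spark does not emit a __HIVE_DEFAULT_PARTITION__ folder.
from pyspark.sql.functions import coalesce, lit

source_path = "abfss://container@account.dfs.core.windows.net/source"  # placeholder
target_path = "abfss://container@account.dfs.core.windows.net/target"  # placeholder

df = spark.read.format("delta").load(source_path)
(df.withColumn("MyPartitionColumn",
               coalesce(df["MyPartitionColumn"], lit("null")))
   .write.format("delta").mode("overwrite")
   .partitionBy("MyPartitionColumn")
   .save(target_path))
```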
@@ -634,7 +628,7 @@ Msg 16513, Level 16, State 0, Line 1
 Error reading external metadata.
 ```
 First, make sure that your Delta Lake data set is not corrupted.
-- Verify that you can read the content of the Delta Lake folder using Apache Spark pool in Azure Synapse or Databricks cluster. This way you will ensure that the `_delta_log` file is not corrupted.
+- Verify that you can read the content of the Delta Lake folder using Apache Spark pool in Azure Synapse. This way you will ensure that the `_delta_log` file is not corrupted.
 - Verify that you can read the content of data files by specifying `FORMAT='PARQUET'` and using recursive wildcard `/**` at the end of the URI path. If you can read all Parquet files, the issue is in `_delta_log` transaction log folder.
 
 **Workaround** - try to create a checkpoint on Delta Lake data set using Apache Spark pool and re-run the query. The checkpoint will aggregate transactional json log files and might solve the issue.
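
One way to trigger such a checkpoint from a Spark pool is sketched below, relying on the Delta table property `delta.checkpointInterval`; the table path is a placeholder and `spark` is assumed to be an existing SparkSession. This is a sketch of one approach, not the article's prescribed method:

```python
# Sketch: Delta writes a checkpoint every `delta.checkpointInterval` commits
# (default 10); lowering the interval makes an upcoming commit produce one,
# aggregating the JSON log entries into a Parquet checkpoint file.
spark.sql("""
    ALTER TABLE delta.`abfss://container@account.dfs.core.windows.net/delta-lake-data-set`
    SET TBLPROPERTIES ('delta.checkpointInterval' = '1')
""")
```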
@@ -650,13 +644,9 @@ Azure team will investigate the content of the `delta_log` file and provide more
 
 ### Resolving delta log on path ... failed with error: Cannot parse JSON object from log file
 
-This error might happen due to the following reasons/unsupported features:
-- [BLOOM filter](/azure/databricks/delta/optimizations/bloom-filters) on Delta Lake dataset. Serverless SQL pools in Azure Synapse Analytics do not support datasets with the [BLOOM filter](/azure/databricks/delta/optimizations/bloom-filters).
-- Float column in Delta Lake data set with statistics.
-- Data set partitioned on a float column.
+**Status**: Resolved
 
-**Workaround**: [Remove BLOOM filter](/azure/databricks/delta/optimizations/bloom-filters#drop-a-bloom-filter-index) if you want to read Delta Lake folder using the serverless SQL pool.
-If you have `float` columns that are causing the issue, you would need to re-partition the data set or remove the statistics.
+**Release**: November 2021
 
 ## Performance

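For completeness, a hypothetical sketch of the float-partitioning workaround removed in this hunk: it rewrites the data set so the partition key is a string rather than a float. Paths and the column name are placeholders, and `spark` is assumed to be an existing SparkSession:

```python
# Hypothetical sketch: cast a float partition key to string and rewrite the
# data set so it is no longer partitioned on a float column.
from pyspark.sql.functions import col

source_path = "abfss://container@account.dfs.core.windows.net/source"  # placeholder
target_path = "abfss://container@account.dfs.core.windows.net/target"  # placeholder

df = spark.read.format("delta").load(source_path)
(df.withColumn("MyFloatColumn", col("MyFloatColumn").cast("string"))
   .write.format("delta").mode("overwrite")
   .partitionBy("MyFloatColumn")
   .save(target_path))
```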