Commit 72e2cf9

Merge pull request #230208 from ilijazagorac/patch-3
Adding explanation on missing columns
2 parents 58246aa + 2fe7963 commit 72e2cf9

1 file changed: +5 −1 lines changed

articles/synapse-analytics/sql/resources-self-help-sql-on-demand.md

Lines changed: 5 additions & 1 deletion
@@ -722,6 +722,10 @@ There are several mitigation steps that you can do to avoid this:
- If you are using delta file format, use the optimize write feature in Spark. This can improve the performance of queries by reducing the amount of data that needs to be read and processed. How to use optimize write is described in [Using optimize write on Apache Spark](../spark/optimize-write-for-apache-spark.md).
- To avoid some of the top-level wildcards, effectively hardcode the implicit filters over the partitioning columns by using [dynamic SQL](../sql/develop-dynamic-sql.md), as in the sketch after this list.
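A minimal sketch of that approach, assuming Parquet files laid out under an illustrative `year=` partition folder (the storage URL, container, and folder layout are assumptions, not part of the original article):

```sql
-- Minimal sketch: build the query string so the partition filter becomes a fixed
-- path segment instead of a top-level wildcard. The storage URL, container name,
-- and year=... folder layout are illustrative assumptions.
DECLARE @year nvarchar(4) = N'2023';
DECLARE @sql nvarchar(max) =
    N'SELECT TOP 10 *
      FROM OPENROWSET(
          BULK ''https://contosostorage.dfs.core.windows.net/mycontainer/sales/year=' + @year + N'/*.parquet'',
          FORMAT = ''PARQUET''
      ) AS [rows];';

EXEC sp_executesql @sql;
```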

+### Missing column when using automatic schema inference
+You can easily query files without knowing or specifying the schema by omitting the WITH clause. In that case, column names and data types are inferred from the files. Keep in mind that if you are reading a number of files at once, the schema is inferred from the first file that the service gets from the storage. This can mean that some of the expected columns are omitted, because the file used by the service to define the schema did not contain those columns. To explicitly specify the schema, use the OPENROWSET WITH clause. If you specify the schema (by using an external table or the OPENROWSET WITH clause), the default lax path mode is used. That means that columns that don't exist in some files are returned as NULLs (for rows from those files). To understand how path mode is used, see the [documentation](../sql/develop-openrowset.md) and the [sample](../sql/develop-openrowset.md#specify-columns-using-json-paths).
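A minimal sketch of an explicit schema with the OPENROWSET WITH clause (the storage path, file format, and column names are illustrative assumptions, not taken from the original article):

```sql
-- Minimal sketch: explicitly specify the schema so columns missing from some
-- files are returned as NULL instead of being dropped by schema inference.
-- The storage URL and column list are illustrative assumptions.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://contosostorage.dfs.core.windows.net/mycontainer/events/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    event_id   bigint,
    event_type varchar(50),
    -- Column present only in newer files; returned as NULL for rows from older files.
    event_tags varchar(8000)
) AS [events];
```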
## Configuration

Serverless SQL pools enable you to use T-SQL to configure database objects. There are some constraints:
@@ -909,7 +913,7 @@ Our engineering team is currently working on a full support for Spark 3.3.
If you created a Delta table in Spark, and it is not shown in the serverless SQL pool, check the following:
- Wait some time (usually 30 seconds) because the Spark tables are synchronized with delay.
- If the table didn't appear in the serverless SQL pool after some time, check the schema of the Spark Delta table. Spark tables with complex types or types that are not supported in serverless are not available. Try to create a Spark Parquet table with the same schema in a lake database and check whether that table appears in the serverless SQL pool.
-- Check could workspace Managed Identity access Delta Lake folder that is referenced by the table. Serverless SQL pool uses workspace Managed Identity to get the table column information from the storage to create the table.
+- Check whether the workspace Managed Identity can access the Delta Lake folder that is referenced by the table. Serverless SQL pool uses the workspace Managed Identity to get the table column information from the storage to create the table.

## Lake database
