articles/synapse-analytics/sql/query-delta-lake-format.md
services: synapse analytics
ms.service: azure-synapse-analytics
ms.topic: how-to
ms.subservice: sql
ms.date: 12/17/2024
author: jovanpop-msft
ms.author: jovanpop
ms.reviewer: whhender, wiassaf
You can learn more from the how to query delta lake tables video.

The serverless SQL pool in Synapse workspace enables you to read the data stored in Delta Lake format, and serve it to reporting tools.
A serverless SQL pool can read Delta Lake files that are created using Apache Spark, Azure Databricks, or any other producer of the Delta Lake format.

Apache Spark pools in Azure Synapse enable data engineers to modify Delta Lake files using Scala, PySpark, and .NET. Serverless SQL pools help data analysts to create reports on Delta Lake files created by data engineers.

> [!IMPORTANT]
> Querying Delta Lake format using the serverless SQL pool is **Generally available** functionality. However, querying Spark Delta tables is still in public preview and not production ready. There are known issues that might happen if you query Delta tables created using the Spark pools. See the known issues in [Serverless SQL pool self-help](resources-self-help-sql-on-demand.md#delta-lake).
The URI in the `OPENROWSET` function must reference the root Delta Lake folder that contains a subfolder called `_delta_log`.

> [!div class="mx-imgBorder"]
>

If you don't have this subfolder, you aren't using Delta Lake format. You can convert your plain Parquet files in the folder to Delta Lake format using the following Apache Spark Python script:

```python
%%pyspark
from delta.tables import *

# Convert the plain Parquet folder in place to Delta Lake format.
# The storage path below is a placeholder; replace it with your own location.
DeltaTable.convertToDelta(spark, "parquet.`abfss://<container>@<account>.dfs.core.windows.net/<folder>`")
```
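The `_delta_log` check described above can also be sketched in plain Python. This is an illustrative sketch only: `is_delta_lake_folder` is a hypothetical helper using a local path, and on ADLS Gen2 storage you would make the equivalent check with your storage SDK of choice.

```python
import os

def is_delta_lake_folder(root: str) -> bool:
    """Return True if the folder contains the _delta_log subfolder
    that marks a folder as a Delta Lake table."""
    return os.path.isdir(os.path.join(root, "_delta_log"))
```

The point is only that the transaction log subfolder must exist at the table root; without it, the folder is plain Parquet.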
To improve the performance of your queries, consider specifying explicit types in the `WITH` clause.

> The serverless Synapse SQL pool uses schema inference to automatically determine columns and their types. The rules for schema inference are the same as the rules used for Parquet files.
> For Delta Lake type mapping to SQL native types, check [type mapping for Parquet](develop-openrowset.md#type-mapping-for-parquet).

Make sure you can access your file. If your file is protected with a SAS key or a custom Azure identity, you'll need to set up a [server level credential for sql login](develop-storage-files-storage-access-control.md?tabs=shared-access-signature#server-level-credential).

> [!IMPORTANT]
> Ensure you are using a UTF-8 database collation (for example `Latin1_General_100_BIN2_UTF8`) because string values in Delta Lake files are encoded using UTF-8 encoding.
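As a quick illustration of why the collation must match the file encoding, the snippet below (plain Python, not part of the article's samples) shows that a non-ASCII string occupies more bytes than characters once encoded as UTF-8, and that a UTF-8 decode recovers it exactly:

```python
text = "Zürich"                          # contains one non-ASCII character
encoded = text.encode("utf-8")           # the bytes as stored in a Delta Lake file
assert len(encoded) == 7                 # 'ü' takes two bytes; the other five take one each
assert encoded.decode("utf-8") == text   # UTF-8 decoding recovers the original string
```

A non-UTF-8 collation would interpret those same bytes differently, which is why a `_UTF8` collation such as `Latin1_General_100_BIN2_UTF8` is required.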
The previous examples used the full path to the file. As an alternative, you can create an external data source with the location that points to the root folder of the storage.

> [!IMPORTANT]
> Data sources can be created only in custom databases (not in the master database or the databases replicated from Apache Spark pools).

To use the samples below, you'll need to complete the following steps:

1. **Create a database** with a data source that references the [NYC Yellow Taxi](https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/) storage account.
1. Initialize the objects by executing the [setup script](https://github.com/Azure-Samples/Synapse/blob/master/SQL/Samples/LdwSample/SampleDB.sql) on the database you created in step 1. This setup script creates the data sources, database scoped credentials, and external file formats that are used in these samples.
The folder name in the `OPENROWSET` function (`yellow` in this example) is concatenated with the `LOCATION` defined in the data source.

> [!div class="mx-imgBorder"]
>

If you don't have this subfolder, you aren't using Delta Lake format. You can convert your plain Parquet files in the folder to Delta Lake format using the following Apache Spark Python script:

```python
%%pyspark
from delta.tables import *

# Convert the partitioned Parquet folder in place to Delta Lake format.
# The storage path and partition schema below are placeholders; replace
# them with your own location and partitioning columns.
DeltaTable.convertToDelta(spark, "parquet.`abfss://<container>@<account>.dfs.core.windows.net/yellow`", "year INT, month INT")
```
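The concatenation of the data source `LOCATION` and the folder name passed to `OPENROWSET` can be sketched in plain Python. Both `effective_uri` and the storage URI below are hypothetical, for illustration only:

```python
def effective_uri(location: str, folder: str) -> str:
    """Mimic how the folder name in OPENROWSET is appended to the
    LOCATION of the external data source to form the full Delta Lake URI."""
    return location.rstrip("/") + "/" + folder.strip("/")

# Hypothetical data source location; replace with your own account.
print(effective_uri("abfss://delta-lake@<account>.dfs.core.windows.net", "yellow"))
# → abfss://delta-lake@<account>.dfs.core.windows.net/yellow
```

This is why the data source only needs to point at the storage root: each query supplies the table folder name relative to that root.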
The second argument of the `DeltaTable.convertToDelta` function represents the partitioning columns that are part of the folder pattern.

- Review the limitations and the known issues on the [Synapse serverless SQL pool self-help page](resources-self-help-sql-on-demand.md#delta-lake).

## Related content

Advance to the next article to learn how to [Query Parquet nested types](query-parquet-nested-types.md).
If you want to continue building a Delta Lake solution, learn how to create [views](create-use-views.md#delta-lake-views) or [external tables](create-use-external-tables.md#delta-lake-external-table) on the Delta Lake folder.

- [What is Delta Lake](../spark/apache-spark-what-is-delta-lake.md)
- [Learn how to use Delta Lake in Apache Spark pools for Azure Synapse Analytics](../spark/apache-spark-delta-lake-overview.md)
- [Azure Databricks Delta Lake best practices](/azure/databricks/delta/best-practices)