docs/tools/sqlpackage/sqlpackage-with-data-in-parquet-files.md
Lines changed: 71 additions & 27 deletions
@@ -3,8 +3,8 @@ title: "SqlPackage with Data in Parquet Files (Preview)"
 description: Tips for using SqlPackage with data stored in Azure Blob Storage
 author: dzsquared
 ms.author: drskwier
-ms.reviewer: llali
-ms.date: 10/19/2023
+ms.reviewer: llali, randolphwest
+ms.date: 04/17/2025
 ms.service: sql
 ms.topic: conceptual
 ms.collection:
@@ -17,24 +17,27 @@ ms.custom:

 This article covers SqlPackage support for interacting with data stored in Azure Blob Storage that is in Parquet format. For SQL Server 2022 and Azure SQL Managed Instance, preview support for [extract](#extract-export-data) and [publish](#publish-import-data) with data in Parquet files in Azure Blob Storage is available in SqlPackage 162.1.176 and higher. Azure SQL Database and SQL Server 2019 and earlier aren't supported. The [import](sqlpackage-import.md) and [export](sqlpackage-export.md) actions continue to be available for SQL Server, Azure SQL Managed Instance, and Azure SQL Database. Support for Parquet files in Azure Blob Storage continues to be generally available for [Azure Synapse Analytics](sqlpackage-for-azure-synapse-analytics.md).

-With [extract](#extract-export-data), the database schema (`.dacpac` file) is written to the local client running SqlPackage and the data is written to Azure Blob Storage in Parquet format. The data is stored in individual folders named with two-part table names. [CETAS](../../t-sql/statements/create-external-table-as-select-transact-sql.md) is used to write the files in Azure Blob Storage.
+With [extract](#extract-export-data), the database schema (`.dacpac` file) is written to the local client running SqlPackage and the data is written to Azure Blob Storage in Parquet format. The data is stored in individual folders named with two-part table names. [CREATE EXTERNAL TABLE AS SELECT (CETAS)](../../t-sql/statements/create-external-table-as-select-transact-sql.md) is used to write the files in Azure Blob Storage.

 With [publish](#publish-import-data), the database schema (`.dacpac` file) is read from the local client running SqlPackage and the data is read from or written to Azure Blob Storage in Parquet format.

 In SQL databases hosted in Azure, the extract/publish operations with Parquet files offer improved performance over import/export operations with `.bacpac` files in many scenarios.

-
 ## Extract (export data)
+
 To export data from a database to Azure Blob Storage, the SqlPackage [extract](sqlpackage-extract.md) action is used with following properties:
-- /p:AzureStorageBlobEndpoint
-- /p:AzureStorageContainer
-- /p:AzureStorageKey or /p:AzureSharedAccessSignatureToken
+
+- `/p:AzureStorageBlobEndpoint`
+- `/p:AzureStorageContainer`
+- `/p:AzureSharedAccessSignatureToken` or `/p:AzureStorageKey` (not supported for use with SQL Server)
+
+:::image type="content" source="media/sqlpackage-with-data-in-parquet-files/data-extract.png" alt-text="Screenshot of Summary of data extract from a database with a .dacpac file written to a SqlPackage environment and the table data written to Azure Blob Storage in parquet files." lightbox="media/sqlpackage-with-data-in-parquet-files/data-extract.png":::

 Access for the database to access the blob storage container is authorized via a storage account key. The database schema (.dacpac file) is written to the local client running SqlPackage and the data is written to Azure Blob Storage in Parquet format.

 The parameter `/p:AzureStorageRootPath` is optional, which sets the storage root path within the container. Without this property, the path defaults to `servername/databasename/timestamp/`. Data is stored in individual folders named with two-part table names. The number of files created per table depends upon the MAXDOP and available SQL cores at the time of the export.

-Finally, the property `/p:TableData` specifies which tables have their data exported. Specify the table name with or without the brackets surrounding the name parts in the format schema_name.table_identifier. This property may be specified multiple times to indicate multiple tables.
+Finally, the property `/p:TableData` specifies which tables have their data exported. Specify the table name with or without the brackets surrounding the name parts in the format schema_name.table_identifier. This property can be specified multiple times to indicate multiple tables.

 ### Example

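As an illustration of how the extract properties described in this hunk fit together, the following is a minimal Python sketch that assembles a SqlPackage argument list. It is not taken from the article. The helper name `build_extract_args` and all sample values are hypothetical, SQL authentication parameters are left out, and the sketch assumes a SAS token is used for storage access.

```python
from typing import List, Optional


def build_extract_args(
    server: str,
    database: str,
    dacpac_path: str,
    blob_endpoint: str,
    container: str,
    sas_token: str,
    root_path: Optional[str] = None,
    tables: Optional[List[str]] = None,
) -> List[str]:
    """Assemble SqlPackage extract arguments (hypothetical helper, not part of SqlPackage)."""
    args = [
        "/Action:Extract",
        f"/SourceServerName:{server}",
        f"/SourceDatabaseName:{database}",
        f"/TargetFile:{dacpac_path}",
        f"/p:AzureStorageBlobEndpoint={blob_endpoint}",
        f"/p:AzureStorageContainer={container}",
        f"/p:AzureSharedAccessSignatureToken={sas_token}",
    ]
    # The root path is optional; when omitted, the article states that the
    # path defaults to servername/databasename/timestamp/.
    if root_path is not None:
        args.append(f"/p:AzureStorageRootPath={root_path}")
    # /p:TableData can be repeated, once per table whose data is exported.
    for table in tables or []:
        args.append(f"/p:TableData={table}")
    return args
```

The resulting list could be passed to `subprocess.run(["sqlpackage", *args], check=True)`, assuming the SqlPackage executable is available on the `PATH` under that name.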
@@ -49,10 +52,11 @@ See [SqlPackage extract](sqlpackage-extract.md#examples) for more examples of au
 ## Publish (import data)

 To import data from Parquet files in Azure Blob Storage to a database, the SqlPackage [publish](sqlpackage-publish.md) action is used with the following properties:
-- /p:AzureStorageBlobEndpoint
-- /p:AzureStorageContainer
-- /p:AzureStorageRootPath
-- /p:AzureStorageKey or /p:AzureSharedAccessSignatureToken
+
+- `/p:AzureStorageBlobEndpoint`
+- `/p:AzureStorageContainer`
+- `/p:AzureStorageRootPath`
+- `/p:AzureSharedAccessSignatureToken` or `/p:AzureStorageKey` (not supported for use with SQL Server)

 Access for publish can be authorized via a storage account key or a shared access signature (SAS) token. The database schema (.dacpac file) is read from the local client running SqlPackage and the data is read from Azure Blob Storage in Parquet format.
See [SqlPackage publish](sqlpackage-publish.md#examples) for more examples of authentication types available.

-
 ## Limitations

-### Polybase
+### PolyBase

-[Polybase](../../relational-databases/polybase/polybase-guide.md) is required for SqlPackage operations with Parquet files. The following query can be used to check if Polybase is enabled:
+[PolyBase](../../relational-databases/polybase/polybase-guide.md) is required for SqlPackage operations with Parquet files. The following query can be used to check if PolyBase is enabled:

 ```sql
 -- configuration_id = 16397 is 'allow polybase export'
@@ -80,17 +83,51 @@ SELECT configuration_id, value_in_use FROM sys.configurations
 WHERE configuration_id IN (16397, 16399)
 ```

-You may need to enable [Polybase](../../relational-databases/polybase/polybase-installation.md) or [Polybase export](../../database-engine/configure-windows/allow-polybase-export.md). Enabling Polybase on Azure SQL Managed Instance requires [PowerShell or Azure CLI](/sql/t-sql/statements/create-external-table-as-select-transact-sql?view=azuresqldb-mi-current&preserve-view=true#methods-to-enable-cetas). It's recommended that you evaluate whether enabling Polybase is right for your environment before making configuration changes.
+You might need to enable [PolyBase](../../relational-databases/polybase/polybase-installation.md) or [PolyBase export](../../database-engine/configure-windows/allow-polybase-export.md). Enabling PolyBase on Azure SQL Managed Instance requires [PowerShell or Azure CLI](/sql/t-sql/statements/create-external-table-as-select-transact-sql?view=azuresqldb-mi-current&preserve-view=true#methods-to-enable-cetas). You should evaluate whether enabling PolyBase is right for your environment before making configuration changes.

 ### Table and data types

-Data types supported by [CETAS](../../t-sql/statements/create-external-table-as-select-transact-sql.md#supported-data-types) are supported for extract and publish operations with Parquet files.
+Most data types are supported for extract and publish operations with Parquet files. Tables with unsupported data types result in the table data for that table being exported to the `.dacpac` file instead of in Parquet format. The following data types are supported and are written to Parquet files in Azure Blob Storage:
+
+- **char**
+- **varchar**
+- **nchar**
+- **nvarchar**
+- **text**
+- **ntext**
+- **decimal**
+- **numeric**
+- **float**
+- **real**
+- **bit**
+- **tinyint**
+- **smallint**
+- **int**
+- **bigint**
+- **smallmoney**
+- **money**
+- **smalldate**
+- **smalldatetime**
+- **date**
+- **datetime**
+- **datetime2**
+- **datetimeoffset**
+- **time**
+- **uniqueidentifier**
+- **timestamp**
+- **rowversion**
+- **binary**
+- **varbinary**
+- **image**
+- **xml**
+- **json**
+- **vector**

 Ledger tables are enabled for extract and publish operations with Parquet files.

 Data stored with Always Encrypted isn't supported for extract and publish operations with Parquet files.

-Checking the database for unsupported types is done prior to extract to Parquet by SqlPackage, but you can examine your database quickly with T-SQL. The following sample query returns a result set of types and tables with types not supported for writing to Parquet files.
+You can examine your database with T-SQL to detect data types that would be written into the `.dacpac` file instead of in Parquet files written directly to Azure Blob Storage. The following sample query returns a result set of types and tables with types not supported for writing to Parquet files.
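The same kind of check can also be sketched on the client side. The following Python example is an illustration only and is not the article's T-SQL query. It assumes column metadata has already been fetched as `(table, column, type)` tuples, for example from `INFORMATION_SCHEMA.COLUMNS`. The function name `tables_with_unsupported_types` and the sample rows are hypothetical, and the type list is copied as-is from the list added in this change.

```python
from typing import Dict, Iterable, List, Tuple

# Supported types, copied verbatim from the list added in this change.
PARQUET_SUPPORTED_TYPES = frozenset({
    "char", "varchar", "nchar", "nvarchar", "text", "ntext",
    "decimal", "numeric", "float", "real", "bit",
    "tinyint", "smallint", "int", "bigint", "smallmoney", "money",
    "smalldate", "smalldatetime", "date", "datetime", "datetime2",
    "datetimeoffset", "time", "uniqueidentifier", "timestamp",
    "rowversion", "binary", "varbinary", "image", "xml", "json", "vector",
})


def tables_with_unsupported_types(
    columns: Iterable[Tuple[str, str, str]],
) -> Dict[str, List[str]]:
    """Map each table to the columns whose type is not in the supported list.

    Per the article, data for such tables would be written to the .dacpac
    file instead of to Parquet files.
    """
    flagged: Dict[str, List[str]] = {}
    for table, column, type_name in columns:
        # Compare case-insensitively, since metadata casing can vary.
        if type_name.strip().lower() not in PARQUET_SUPPORTED_TYPES:
            flagged.setdefault(table, []).append(f"{column} ({type_name})")
    return flagged


# Hypothetical sample rows for illustration.
sample = [
    ("dbo.Orders", "Id", "int"),
    ("dbo.Orders", "Region", "geography"),
    ("dbo.Customers", "Name", "NVARCHAR"),
]
print(tables_with_unsupported_types(sample))
```

Keeping the supported list in one set makes it straightforward to update if the preview's supported types change.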