
Commit aebeda4

dzsquared and rwestMSFT authored
SqlPackage with Parquet files (#33819)
Co-authored-by: Randolph West MSFT <[email protected]>
1 parent c85ae30 commit aebeda4

File tree

2 files changed (+71, -27 lines)


docs/tools/sqlpackage/sqlpackage-with-data-in-parquet-files.md

Lines changed: 71 additions & 27 deletions
@@ -3,8 +3,8 @@ title: "SqlPackage with Data in Parquet Files (Preview)"
 description: Tips for using SqlPackage with data stored in Azure Blob Storage
 author: dzsquared
 ms.author: drskwier
-ms.reviewer: llali
-ms.date: 10/19/2023
+ms.reviewer: llali, randolphwest
+ms.date: 04/17/2025
 ms.service: sql
 ms.topic: conceptual
 ms.collection:
@@ -17,24 +17,27 @@ ms.custom:
 
 This article covers SqlPackage support for interacting with data stored in Azure Blob Storage that is in Parquet format. For SQL Server 2022 and Azure SQL Managed Instance, preview support for [extract](#extract-export-data) and [publish](#publish-import-data) with data in Parquet files in Azure Blob Storage is available in SqlPackage 162.1.176 and higher. Azure SQL Database and SQL Server 2019 and earlier aren't supported. The [import](sqlpackage-import.md) and [export](sqlpackage-export.md) actions continue to be available for SQL Server, Azure SQL Managed Instance, and Azure SQL Database. Support for Parquet files in Azure Blob Storage continues to be generally available for [Azure Synapse Analytics](sqlpackage-for-azure-synapse-analytics.md).
 
-With [extract](#extract-export-data), the database schema (`.dacpac` file) is written to the local client running SqlPackage and the data is written to Azure Blob Storage in Parquet format. The data is stored in individual folders named with two-part table names. [CETAS](../../t-sql/statements/create-external-table-as-select-transact-sql.md) is used to write the files in Azure Blob Storage.
+With [extract](#extract-export-data), the database schema (`.dacpac` file) is written to the local client running SqlPackage and the data is written to Azure Blob Storage in Parquet format. The data is stored in individual folders named with two-part table names. [CREATE EXTERNAL TABLE AS SELECT (CETAS)](../../t-sql/statements/create-external-table-as-select-transact-sql.md) is used to write the files in Azure Blob Storage.
 
 With [publish](#publish-import-data), the database schema (`.dacpac` file) is read from the local client running SqlPackage and the data is read from or written to Azure Blob Storage in Parquet format.
 
 In SQL databases hosted in Azure, the extract/publish operations with Parquet files offer improved performance over import/export operations with `.bacpac` files in many scenarios.
 
-
 ## Extract (export data)
+
 To export data from a database to Azure Blob Storage, the SqlPackage [extract](sqlpackage-extract.md) action is used with following properties:
-- /p:AzureStorageBlobEndpoint
-- /p:AzureStorageContainer
-- /p:AzureStorageKey or /p:AzureSharedAccessSignatureToken
+
+- `/p:AzureStorageBlobEndpoint`
+- `/p:AzureStorageContainer`
+- `/p:AzureSharedAccessSignatureToken` or `/p:AzureStorageKey` (not supported for use with SQL Server)
+
+:::image type="content" source="media/sqlpackage-with-data-in-parquet-files/data-extract.png" alt-text="Screenshot of Summary of data extract from a database with a .dacpac file written to a SqlPackage environment and the table data written to Azure Blob Storage in parquet files." lightbox="media/sqlpackage-with-data-in-parquet-files/data-extract.png":::
 
 Access for the database to access the blob storage container is authorized via a storage account key. The database schema (.dacpac file) is written to the local client running SqlPackage and the data is written to Azure Blob Storage in Parquet format.
 
 The parameter `/p:AzureStorageRootPath` is optional, which sets the storage root path within the container. Without this property, the path defaults to `servername/databasename/timestamp/`. Data is stored in individual folders named with two-part table names. The number of files created per table depends upon the MAXDOP and available SQL cores at the time of the export.
 
-Finally, the property `/p:TableData` specifies which tables have their data exported. Specify the table name with or without the brackets surrounding the name parts in the format schema_name.table_identifier. This property may be specified multiple times to indicate multiple tables.
+Finally, the property `/p:TableData` specifies which tables have their data exported. Specify the table name with or without the brackets surrounding the name parts in the format schema_name.table_identifier. This property can be specified multiple times to indicate multiple tables.
 
 ### Example
 
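For illustration, a minimal sketch of how the extract properties listed in this hunk might be combined into one invocation; the server, database, storage account, container, SAS token, credentials, and table names are placeholders rather than values from the commit, and the linked extract examples cover the available authentication options.

```bash
# Sketch only: extract the schema to a local .dacpac and write table data to
# Parquet files in Azure Blob Storage. All names and the SAS token are placeholders.
SqlPackage /Action:Extract \
  /SourceServerName:"yourserver" \
  /SourceDatabaseName:"yourdatabase" \
  /SourceUser:"sqladmin" /SourcePassword:"<password>" \
  /TargetFile:"yourdatabase.dacpac" \
  /p:AzureStorageBlobEndpoint="https://yourstorageaccount.blob.core.windows.net" \
  /p:AzureStorageContainer="parquetfiles" \
  /p:AzureSharedAccessSignatureToken="<sas-token>" \
  /p:TableData="dbo.Orders" /p:TableData="dbo.Customers"
```
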
@@ -49,10 +52,11 @@ See [SqlPackage extract](sqlpackage-extract.md#examples) for more examples of au
 ## Publish (import data)
 
 To import data from Parquet files in Azure Blob Storage to a database, the SqlPackage [publish](sqlpackage-publish.md) action is used with the following properties:
-- /p:AzureStorageBlobEndpoint
-- /p:AzureStorageContainer
-- /p:AzureStorageRootPath
-- /p:AzureStorageKey or /p:AzureSharedAccessSignatureToken
+
+- `/p:AzureStorageBlobEndpoint`
+- `/p:AzureStorageContainer`
+- `/p:AzureStorageRootPath`
+- `/p:AzureSharedAccessSignatureToken` or `/p:AzureStorageKey` (not supported for use with SQL Server)
 
 Access for publish can be authorized via a storage account key or a shared access signature (SAS) token. The database schema (.dacpac file) is read from the local client running SqlPackage and the data is read from Azure Blob Storage in Parquet format.
 
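Similarly, a hedged sketch of a publish invocation that combines the properties above; every name, the root path, and the SAS token below are placeholders, not values from the commit.

```bash
# Sketch only: publish the schema from a local .dacpac and load table data from
# Parquet files already staged in Azure Blob Storage. All values are placeholders.
SqlPackage /Action:Publish \
  /SourceFile:"yourdatabase.dacpac" \
  /TargetServerName:"yourserver" \
  /TargetDatabaseName:"yourdatabase" \
  /TargetUser:"sqladmin" /TargetPassword:"<password>" \
  /p:AzureStorageBlobEndpoint="https://yourstorageaccount.blob.core.windows.net" \
  /p:AzureStorageContainer="parquetfiles" \
  /p:AzureStorageRootPath="yourserver/yourdatabase/20250417120000/" \
  /p:AzureSharedAccessSignatureToken="<sas-token>"
```
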
@@ -66,12 +70,11 @@ SqlPackage /Action:Publish /SourceFile:databaseschema.dacpac /TargetServerName:y
 
 See [SqlPackage publish](sqlpackage-publish.md#examples) for more examples of authentication types available.
 
-
 ## Limitations
 
-### Polybase
+### PolyBase
 
-[Polybase](../../relational-databases/polybase/polybase-guide.md) is required for SqlPackage operations with Parquet files. The following query can be used to check if Polybase is enabled:
+[PolyBase](../../relational-databases/polybase/polybase-guide.md) is required for SqlPackage operations with Parquet files. The following query can be used to check if PolyBase is enabled:
 
 ```sql
 // configuration_id = 16397 is 'allow polybase export'
@@ -80,17 +83,51 @@ SELECT configuration_id, value_in_use FROM sys.configurations
 WHERE configuration_id IN (16397, 16399)
 ```
 
-You may need to enable [Polybase](../../relational-databases/polybase/polybase-installation.md) or [Polybase export](../../database-engine/configure-windows/allow-polybase-export.md). Enabling Polybase on Azure SQL Managed Instance requires [PowerShell or Azure CLI](/sql/t-sql/statements/create-external-table-as-select-transact-sql?view=azuresqldb-mi-current&preserve-view=true#methods-to-enable-cetas). It's recommended that you evaluate whether enabling Polybase is right for your environment before making configuration changes.
+You might need to enable [PolyBase](../../relational-databases/polybase/polybase-installation.md) or [PolyBase export](../../database-engine/configure-windows/allow-polybase-export.md). Enabling PolyBase on Azure SQL Managed Instance requires [PowerShell or Azure CLI](/sql/t-sql/statements/create-external-table-as-select-transact-sql?view=azuresqldb-mi-current&preserve-view=true#methods-to-enable-cetas). You should evaluate whether enabling PolyBase is right for your environment before making configuration changes.
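
If that check shows these options disabled on SQL Server, one possible way to turn them on is `sp_configure`, sketched below on the assumption that the PolyBase feature is already installed; Azure SQL Managed Instance instead requires the PowerShell or Azure CLI route linked above, and the change is worth evaluating before applying it.

```sql
-- Sketch: enable PolyBase and PolyBase export on SQL Server with sp_configure.
-- Assumes the PolyBase feature is installed; a service restart can be required
-- for 'polybase enabled' to take effect.
EXEC sp_configure 'polybase enabled', 1;
RECONFIGURE;

EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

-- Re-check the two settings inspected by the query above.
SELECT configuration_id, name, value_in_use
FROM sys.configurations
WHERE configuration_id IN (16397, 16399);
```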

 ### Table and data types
 
-Data types supported by [CETAS](../../t-sql/statements/create-external-table-as-select-transact-sql.md#supported-data-types) are supported for extract and publish operations with Parquet files.
+Most data types are supported for extract and publish operations with Parquet files. Tables with unsupported data types result in the table data for that table being exported to the `.dacpac` file instead of in Parquet format. The following data types are supported and are written to Parquet files in Azure Blob Storage:
+
+- **char**
+- **varchar**
+- **nchar**
+- **nvarchar**
+- **text**
+- **ntext**
+- **decimal**
+- **numeric**
+- **float**
+- **real**
+- **bit**
+- **tinyint**
+- **smallint**
+- **int**
+- **bigint**
+- **smallmoney**
+- **money**
+- **smalldate**
+- **smalldatetime**
+- **date**
+- **datetime**
+- **datetime2**
+- **datetimeoffset**
+- **time**
+- **uniqueidentifier**
+- **timestamp**
+- **rowversion**
+- **binary**
+- **varbinary**
+- **image**
+- **xml**
+- **json**
+- **vector**
 
 Ledger tables are enabled for extract and publish operations with Parquet files.
 
 Data stored with Always Encrypted isn't supported for extract and publish operations with Parquet files.
 
-Checking the database for unsupported types is done prior to extract to Parquet by SqlPackage, but you can examine your database quickly with T-SQL. The following sample query returns a result set of types and tables with types not supported for writing to Parquet files.
+You can examine your database with T-SQL to detect data types that would be written into the `.dacpac` file instead of in Parquet files written directly to Azure Blob Storage. The following sample query returns a result set of types and tables with types not supported for writing to Parquet files.
 
 ```sql
 SELECT DISTINCT C.DATA_TYPE, C.TABLE_SCHEMA, C.TABLE_NAME
@@ -117,22 +154,29 @@ WHERE C.DATA_TYPE NOT IN (
 'numeric',
 'float',
 'real',
-'bigint',
 'tinyint',
 'smallint',
 'int',
 'bigint',
 'bit',
 'money',
 'smallmoney',
-'uniqueidentifier'
-)
+'uniqueidentifier',
+'timestamp',
+'rowversion',
+'text',
+'ntext',
+'image',
+'xml',
+'json',
+'vector'
+);
 ```
 
-## Next Steps
+## Related content
 
-- Learn more about [Extract](sqlpackage-extract.md)
-- Learn more about [Publish](sqlpackage-publish.md)
-- Learn more about [Azure Blob Storage](/azure/storage/blobs/storage-blobs-introduction)
-- Learn more about [Azure Storage shared access signature (SAS)](/azure/storage/common/storage-sas-overview)
-- Learn more about [Azure Storage Account Keys](/azure/storage/common/storage-account-keys-manage)
+- [SqlPackage Extract parameters and properties](sqlpackage-extract.md)
+- [SqlPackage Publish parameters, properties, and SQLCMD variables](sqlpackage-publish.md)
+- [Azure Blob Storage](/azure/storage/blobs/storage-blobs-introduction)
+- [Azure Storage shared access signature (SAS)](/azure/storage/common/storage-sas-overview)
+- [Azure Storage Account Keys](/azure/storage/common/storage-account-keys-manage)
