articles/synapse-analytics/sql/best-practices-serverless-sql-pool.md (20 additions & 4 deletions)
For optimal performance, if you access other storage accounts with serverless SQL pool, make sure they're in the same region. If they aren't in the same region, there will be increased latency for the data's network transfer between the remote region and the endpoint's region.
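For example, a typical OPENROWSET query reads directly from the storage endpoint, so the endpoint's region determines the transfer latency. A minimal sketch (the storage account, container, and path names are placeholders, not from this article):

```sql
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://contosostorage.dfs.core.windows.net/mycontainer/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;
```

If `contosostorage` were in a different region than the serverless SQL pool endpoint, every byte scanned by this query would cross regions.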
### Colocate your Azure Cosmos DB analytical storage and serverless SQL pool
Make sure your Azure Cosmos DB analytical storage is placed in the same region as your Azure Synapse workspace. Cross-region queries can cause significant latency. Use the region property in the connection string to explicitly specify the region where the analytical store is placed (see [Query Azure Cosmos DB by using serverless SQL pool](query-cosmos-db-analytical-store.md#overview)): `account=<database account name>;database=<database name>;region=<region name>`
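As a sketch, the connection string with the region property plugs into an OPENROWSET query like this (the account, database, container, and credential names are hypothetical):

```sql
SELECT TOP 10 *
FROM OPENROWSET(
    PROVIDER = 'CosmosDB',
    CONNECTION = 'Account=MyCosmosDbAccount;Database=MyDatabase;Region=westus2',
    OBJECT = 'MyContainer',
    SERVER_CREDENTIAL = 'MyCosmosDbCredential'
) AS documents;
```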
### Azure Storage throttling
Multiple applications and services might access your storage account. Storage throttling occurs when the combined IOPS or throughput generated by applications, services, and serverless SQL pool workloads exceeds the limits of the storage account. As a result, you'll experience a significant negative effect on query performance.
If possible, you can prepare files for better performance:
- It's better to have equally sized files for a single OPENROWSET path or an external table LOCATION.
- Partition your data by storing partitions to different folders or file names. See [Use filename and filepath functions to target specific partitions](#use-filename-and-filepath-functions-to-target-specific-partitions).
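For example, partition pruning with the filepath function might look like this (a sketch; the storage path and partition layout are hypothetical). Each `filepath(N)` call returns the value matched by the Nth wildcard in the BULK path, so the WHERE clause lets serverless SQL pool skip folders entirely:

```sql
SELECT payment_type, SUM(fare_amount) AS total_fares
FROM OPENROWSET(
    BULK 'https://contosostorage.dfs.core.windows.net/taxi/year=*/month=*/*.parquet',
    FORMAT = 'PARQUET'
) AS rows
WHERE rows.filepath(1) = '2019'      -- first wildcard: year folder
  AND rows.filepath(2) IN ('1', '2') -- second wildcard: month folder
GROUP BY payment_type;
```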
## CSV optimizations
Here are best practices for using CSV files in serverless SQL pool.
You can use a performance-optimized parser when you query CSV files.
Serverless SQL pool relies on statistics to generate optimal query execution plans. Statistics are automatically created for columns by using sampling, and in most cases the sampling percentage is less than 100%. This flow is the same for every file format. Keep in mind that sampling isn't supported when reading CSV files with parser version 1.0, so automatic statistics creation doesn't happen at a sampling percentage of less than 100%. For small tables with estimated low cardinality (number of rows), automatic statistics creation is triggered with a sampling percentage of 100%. In that case a full scan is triggered, and automatic statistics are created even for CSV files with parser version 1.0. If statistics aren't created automatically, create statistics manually for the columns that you use in queries, particularly those used in DISTINCT, JOIN, WHERE, ORDER BY, and GROUP BY. Check [statistics in serverless SQL pool](develop-tables-statistics.md#statistics-in-serverless-sql-pool) for details.
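When automatic creation doesn't happen (for example, CSV files read with parser version 1.0), statistics can be created manually. A minimal sketch for an external table column, assuming a hypothetical `dbo.population` table with a `state_name` column used in GROUP BY (serverless SQL pool expects the NORECOMPUTE option):

```sql
CREATE STATISTICS state_name_stats
ON dbo.population (state_name)
WITH FULLSCAN, NORECOMPUTE;
```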
## Delta Lake optimizations
Here are best practices for using Delta Lake files in serverless SQL pool.
### Optimize checkpoints
Query performance of the Delta Lake format is influenced by the number of JSON files in the `_delta_log` directory. To ensure optimal performance, avoid accumulating too many JSON files. Ideally, the log should contain only the latest Parquet checkpoint file with no additional JSON files. However, this setup might not be optimal for write-heavy workloads.
A balanced approach is to maintain around 10 JSON files between checkpoints, which typically offers good performance for both readers and writers. Be cautious of configurations that delay checkpoint creation, as they can lead to excessive JSON file accumulation and degrade query performance.
Set the following table property to ensure a checkpoint is created after every 10 JSON log files:
```sql
ALTER TABLE tableName SET TBLPROPERTIES ('delta.checkpointInterval' = '10')
```
## Data types
Here are best practices for using data types in serverless SQL pool.