articles/synapse-analytics/sql-data-warehouse/analyze-your-workload.md
4 additions & 4 deletions
@@ -15,11 +15,11 @@ ms.custom: azure-synapse
15
15
16
16
# Analyze your workload in Azure Synapse Analytics
17
17
18
-
Techniques for analyzing your SQL Analytics workload in Azure Synapse Analytics.
18
+
Techniques for analyzing your Synapse SQL workload in Azure Synapse Analytics.
19
19
20
20
## Resource Classes
21
21
22
-
SQL Analytics provides resource classes to assign system resources to queries. For more information on resource classes, see [Resource classes & workload management](resource-classes-for-workload-management.md). Queries will wait if the resource class assigned to a query needs more resources than are currently available.
22
+
Synapse SQL provides resource classes to assign system resources to queries. For more information on resource classes, see [Resource classes & workload management](resource-classes-for-workload-management.md). Queries will wait if the resource class assigned to a query needs more resources than are currently available.
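As a minimal sketch of how a query ends up running under a larger resource class, the following statements add a user to the `largerc` database role and later remove it. The user name `LoadUser` is a hypothetical placeholder; `largerc` is one of the resource class names that this article queries for.

```sql
-- Assumption: LoadUser is a hypothetical database user used only for illustration.
-- Queries run by members of largerc request more resources, so they may wait
-- longer when those resources are not currently available.
EXEC sp_addrolemember 'largerc', 'LoadUser';

-- Remove the membership when the larger allocation is no longer needed.
EXEC sp_droprolemember 'largerc', 'LoadUser';
```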
23
23
24
24
## Queued query detection and other DMVs
25
25
@@ -58,7 +58,7 @@ WHERE r.name IN ('mediumrc','largerc','xlargerc')
58
58
;
59
59
```
60
60
61
-
SQL Analytics has the following wait types:
61
+
Synapse SQL has the following wait types:
62
62
63
63
* **LocalQueriesConcurrencyResourceType**: Queries that sit outside of the concurrency slot framework. DMV queries and system functions such as `SELECT @@VERSION` are examples of local queries.
64
64
* **UserConcurrencyResourceType**: Queries that sit inside the concurrency slot framework. Queries against end-user tables represent examples that would use this resource type.
@@ -148,4 +148,4 @@ FROM sys.dm_pdw_wait_stats w
148
148
149
149
## Next steps
150
150
151
-
For more information about managing database users and security, see [Secure a database in SQL Analytics](sql-data-warehouse-overview-manage-security.md). For more information about how larger resource classes can improve clustered columnstore index quality, see [Rebuilding indexes to improve segment quality](sql-data-warehouse-tables-index.md#rebuilding-indexes-to-improve-segment-quality).
151
+
For more information about managing database users and security, see [Secure a database in Synapse SQL](sql-data-warehouse-overview-manage-security.md). For more information about how larger resource classes can improve clustered columnstore index quality, see [Rebuilding indexes to improve segment quality](sql-data-warehouse-tables-index.md#rebuilding-indexes-to-improve-segment-quality).
articles/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables.md
29 additions & 22 deletions
@@ -1,6 +1,6 @@
1
1
---
2
2
title: Design guidance for replicated tables
3
-
description: Recommendations for designing replicated tables in SQL Analytics
3
+
description: Recommendations for designing replicated tables in Synapse SQL
4
4
services: synapse-analytics
5
5
author: XiaoyuMSFT
6
6
manager: craigg
@@ -13,41 +13,45 @@ ms.reviewer: igorstan
13
13
ms.custom: seo-lt-2019, azure-synapse
14
14
---
15
15
16
-
# Design guidance for using replicated tables in SQL Analytics
17
-
This article gives recommendations for designing replicated tables in your SQL Analytics schema. Use these recommendations to improve query performance by reducing data movement and query complexity.
16
+
# Design guidance for using replicated tables in Synapse SQL
17
+
18
+
This article gives recommendations for designing replicated tables in your Synapse SQL schema. Use these recommendations to improve query performance by reducing data movement and query complexity.
This article assumes you are familiar with data distribution and data movement concepts in SQL Analytics. For more information, see the [architecture](massively-parallel-processing-mpp-architecture.md) article.
23
+
24
+
This article assumes you are familiar with data distribution and data movement concepts in Synapse SQL. For more information, see the [architecture](massively-parallel-processing-mpp-architecture.md) article.
23
25
24
26
As part of table design, understand as much as possible about your data and how the data is queried. For example, consider these questions:
25
27
26
28
- How large is the table?
27
29
- How often is the table refreshed?
28
-
- Do I have fact and dimension tables in a SQL Analytics database?
30
+
- Do I have fact and dimension tables in a Synapse SQL database?
29
31
30
32
## What is a replicated table?
33
+
31
34
A replicated table has a full copy of the table accessible on each Compute node. Replicating a table removes the need to transfer data among Compute nodes before a join or aggregation. Since the table has multiple copies, replicated tables work best when the table size is less than 2 GB compressed. 2 GB is not a hard limit. If the data is static and does not change, you can replicate larger tables.
32
35
33
-
The following diagram shows a replicated table that is accessible on each Compute node. In SQL Analytics, the replicated table is fully copied to a distribution database on each Compute node.
36
+
The following diagram shows a replicated table that is accessible on each Compute node. In Synapse SQL, the replicated table is fully copied to a distribution database on each compute node.
Replicated tables work well for dimension tables in a star schema. Dimension tables are typically joined to fact tables which are distributed differently than the dimension table. Dimensions are usually of a size that makes it feasible to store and maintain multiple copies. Dimensions store descriptive data that changes slowly, such as customer name and address, and product details. The slowly changing nature of the data leads to less maintenance of the replicated table.
38
41
39
42
Consider using a replicated table when:
40
43
41
-
- The table size on disk is less than 2 GB, regardless of the number of rows. To find the size of a table, you can use the [DBCC PDW_SHOWSPACEUSED](https://docs.microsoft.com/sql/t-sql/database-console-commands/dbcc-pdw-showspaceused-transact-sql) command: `DBCC PDW_SHOWSPACEUSED('ReplTableCandidate')`.
42
-
- The table is used in joins that would otherwise require data movement. When joining tables that are not distributed on the same column, such as a hash-distributed table to a round-robin table, data movement is required to complete the query. If one of the tables is small, consider a replicated table. We recommend using replicated tables instead of round-robin tables in most cases. To view data movement operations in query plans, use [sys.dm_pdw_request_steps](https://docs.microsoft.com/sql/relational-databases/system-dynamic-management-views/sys-dm-pdw-request-steps-transact-sql). The BroadcastMoveOperation is the typical data movement operation that can be eliminated by using a replicated table.
44
+
- The table size on disk is less than 2 GB, regardless of the number of rows. To find the size of a table, you can use the [DBCC PDW_SHOWSPACEUSED](/sql/t-sql/database-console-commands/dbcc-pdw-showspaceused-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest) command: `DBCC PDW_SHOWSPACEUSED('ReplTableCandidate')`.
45
+
- The table is used in joins that would otherwise require data movement. When joining tables that are not distributed on the same column, such as a hash-distributed table to a round-robin table, data movement is required to complete the query. If one of the tables is small, consider a replicated table. We recommend using replicated tables instead of round-robin tables in most cases. To view data movement operations in query plans, use [sys.dm_pdw_request_steps](/sql/relational-databases/system-dynamic-management-views/sys-dm-pdw-request-steps-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest). The BroadcastMoveOperation is the typical data movement operation that can be eliminated by using a replicated table.
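The two checks described in the list above can be sketched as follows. This is an illustrative starting point rather than a complete diagnostic; the table name is the same placeholder used in the list, and the column selection from `sys.dm_pdw_request_steps` is one reasonable choice among several.

```sql
-- Check the on-disk size of a candidate table.
DBCC PDW_SHOWSPACEUSED('ReplTableCandidate');

-- List broadcast data movement steps recorded for recent requests,
-- slowest first. Frequent broadcasts of the same small table suggest
-- that the table is a candidate for replication.
SELECT request_id,
       step_index,
       operation_type,
       row_count,
       total_elapsed_time
FROM sys.dm_pdw_request_steps
WHERE operation_type = 'BroadcastMoveOperation'
ORDER BY total_elapsed_time DESC;
```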
43
46
44
47
Replicated tables may not yield the best query performance when:
45
48
46
49
- The table has frequent insert, update, and delete operations. The data manipulation language (DML) operations require a rebuild of the replicated table. Rebuilding frequently can cause slower performance.
47
-
- The SQL Analytics database is scaled frequently. Scaling a SQL Analytics database changes the number of Compute nodes, which incurs rebuilding the replicated table.
48
-
- The table has a large number of columns, but data operations typically access only a small number of columns. In this scenario, instead of replicating the entire table, it might be more effective to distribute the table, and then create an index on the frequently accessed columns. When a query requires data movement, SQL Analytics only moves data for the requested columns.
50
+
- The Synapse SQL database is scaled frequently. Scaling a database changes the number of compute nodes, which incurs rebuilding the replicated table.
51
+
- The table has a large number of columns, but data operations typically access only a small number of columns. In this scenario, instead of replicating the entire table, it might be more effective to distribute the table, and then create an index on the frequently accessed columns. When a query requires data movement, only the data for the requested columns is moved.
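For the wide-table case in the last item above, a minimal sketch of the alternative might look like the following. All object and column names (`dbo.DimProductWide`, `ProductKey`, `Color`, `ListPrice`) are hypothetical, and the choice of distribution column depends on your own join patterns.

```sql
-- Assumption: dbo.DimProductWide and its columns are hypothetical names.
-- Distribute the wide table instead of replicating it.
CREATE TABLE dbo.DimProductWide_Hash
WITH (DISTRIBUTION = HASH(ProductKey))
AS
SELECT *
FROM dbo.DimProductWide;

-- Index only the columns that queries access most often.
CREATE INDEX ix_DimProductWide_Hash_Color_ListPrice
ON dbo.DimProductWide_Hash (Color, ListPrice);
```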
49
52
50
53
## Use replicated tables with simple query predicates
54
+
51
55
Before you choose to distribute or replicate a table, think about the types of queries you plan to run against the table. Whenever possible,
52
56
53
57
- Use replicated tables for queries with simple query predicates, such as equality or inequality.
@@ -68,7 +72,7 @@ WHERE EnglishDescription LIKE '%frame%comfortable%'
68
72
## Convert existing round-robin tables to replicated tables
69
73
If you already have round-robin tables, we recommend converting them to replicated tables if they meet the criteria outlined in this article. Replicated tables improve performance over round-robin tables because they eliminate the need for data movement. A round-robin table always requires data movement for joins.
70
74
71
-
This example uses [CTAS](/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse) to change the DimSalesTerritory table to a replicated table. This example works regardless of whether DimSalesTerritory is hash-distributed or round-robin.
75
+
This example uses [CTAS](/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest) to change the DimSalesTerritory table to a replicated table. This example works regardless of whether DimSalesTerritory is hash-distributed or round-robin.
72
76
73
77
```sql
74
78
CREATE TABLE [dbo].[DimSalesTerritory_REPLICATE]
@@ -89,7 +93,7 @@ DROP TABLE [dbo].[DimSalesTerritory_old];
89
93
90
94
### Query performance example for round-robin versus replicated
91
95
92
-
A replicated table does not require any data movement for joins because the entire table is already present on each Compute node. If the dimension tables are round-robin distributed, a join copies the dimension table in full to each Compute node. To move the data, the query plan contains an operation called BroadcastMoveOperation. This type of data movement operation slows query performance and is eliminated by using replicated tables. To view query plan steps, use the [sys.dm_pdw_request_steps](/sql/relational-databases/system-dynamic-management-views/sys-dm-pdw-request-steps-transact-sql) system catalog view.
96
+
A replicated table does not require any data movement for joins because the entire table is already present on each Compute node. If the dimension tables are round-robin distributed, a join copies the dimension table in full to each Compute node. To move the data, the query plan contains an operation called BroadcastMoveOperation. This type of data movement operation slows query performance and is eliminated by using replicated tables. To view query plan steps, use the [sys.dm_pdw_request_steps](/sql/relational-databases/system-dynamic-management-views/sys-dm-pdw-request-steps-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest) system catalog view.
93
97
94
98
For example, in the following query against the AdventureWorks schema, the `FactInternetSales` table is hash-distributed. The `DimDate` and `DimSalesTerritory` tables are smaller dimension tables. This query returns the total sales in North America for fiscal year 2004:
95
99
@@ -113,11 +117,12 @@ We re-created `DimDate` and `DimSalesTerritory` as replicated tables, and ran th
113
117
114
118
115
119
## Performance considerations for modifying replicated tables
116
-
SQL Analytics implements a replicated table by maintaining a master version of the table. It copies the master version to one distribution database on each Compute node. When there is a change, SQL Analytics first updates the master table. Then it rebuilds the tables on each Compute node. A rebuild of a replicated table includes copying the table to each Compute node and then building the indexes. For example, a replicated table on a DW400 has 5 copies of the data. A master copy and a full copy on each Compute node. All data is stored in distribution databases. SQL Analytics uses this model to support faster data modification statements and flexible scaling operations.
120
+
121
+
A replicated table is implemented by maintaining a master version of the table. The master version is copied to one distribution database on each Compute node. When there is a change, the master table is updated first. Then the table on each Compute node is rebuilt. A rebuild of a replicated table includes copying the table to each Compute node and then building the indexes. For example, a replicated table on a DW400 has 5 copies of the data: a master copy and a full copy on each Compute node. All data is stored in distribution databases to support faster data modification statements and flexible scaling operations.
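To estimate how many copies of a replicated table exist at the current service level, one approach is to count the Compute nodes and add one for the master version. This is a sketch that assumes the copy model described above (one full copy per Compute node plus the master copy).

```sql
-- Count Compute nodes, then add 1 for the master version of the table.
SELECT COUNT(*)     AS compute_node_count,
       COUNT(*) + 1 AS expected_replicated_copies
FROM sys.dm_pdw_nodes
WHERE [type] = 'COMPUTE';
```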
117
122
118
123
Rebuilds are required after:
119
124
- Data is loaded or modified
120
-
- The SQL Analytics instance is scaled to a different level
125
+
- The Synapse SQL instance is scaled to a different level
121
126
- Table definition is updated
122
127
123
128
Rebuilds are not required after:
@@ -127,7 +132,8 @@ Rebuilds are not required after:
127
132
The rebuild does not happen immediately after data is modified. Instead, the rebuild is triggered the first time a query selects from the table. The query that triggered the rebuild reads immediately from the master version of the table while the data is asynchronously copied to each Compute node. Until the data copy is complete, subsequent queries will continue to use the master version of the table. If any activity happens against the replicated table that forces another rebuild, the data copy is invalidated and the next select statement will trigger data to be copied again.
128
133
129
134
### Use indexes conservatively
130
-
Standard indexing practices apply to replicated tables. SQL Analytics rebuilds each replicated table index as part of the rebuild. Only use indexes when the performance gain outweighs the cost of rebuilding the indexes.
135
+
136
+
Standard indexing practices apply to replicated tables. Each replicated table index is rebuilt as part of the table rebuild. Only use indexes when the performance gain outweighs the cost of rebuilding the indexes.
131
137
132
138
### Batch data loads
133
139
When loading data into replicated tables, try to minimize rebuilds by batching loads together. Perform all the batched loads before running select statements.
@@ -151,11 +157,11 @@ For example, this load pattern loads data from four sources, but only invokes on
151
157
- Load from source 4.
152
158
- Select statement triggers rebuild.
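The load pattern above can be sketched in T-SQL as follows. The replicated table `dbo.DimStore_Repl` and the four staging tables are hypothetical names, and the sketch assumes the staging tables have the same column layout as the target.

```sql
-- Assumption: dbo.DimStore_Repl is a hypothetical replicated table, and
-- dbo.Stage_Source1 through dbo.Stage_Source4 are hypothetical staging
-- tables with matching columns.
INSERT INTO dbo.DimStore_Repl SELECT * FROM dbo.Stage_Source1;
INSERT INTO dbo.DimStore_Repl SELECT * FROM dbo.Stage_Source2;
INSERT INTO dbo.DimStore_Repl SELECT * FROM dbo.Stage_Source3;
INSERT INTO dbo.DimStore_Repl SELECT * FROM dbo.Stage_Source4;

-- The first read after all four loads triggers the single rebuild.
SELECT TOP 1 * FROM dbo.DimStore_Repl;
```

Running a select statement between the individual loads would instead invalidate and restart the copy after each subsequent load.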
153
159
154
-
155
160
### Rebuild a replicated table after a batch load
161
+
156
162
To ensure consistent query execution times, consider forcing the build of the replicated tables after a batch load. Otherwise, the first query will still use data movement to complete the query.
157
163
158
-
This query uses the [sys.pdw_replicated_table_cache_state](/sql/relational-databases/system-catalog-views/sys-pdw-replicated-table-cache-state-transact-sql) DMV to list the replicated tables that have been modified, but not rebuilt.
164
+
This query uses the [sys.pdw_replicated_table_cache_state](/sql/relational-databases/system-catalog-views/sys-pdw-replicated-table-cache-state-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest) DMV to list the replicated tables that have been modified, but not rebuilt.
159
165
160
166
```sql
161
167
SELECT [ReplicatedTable] = t.[name]
@@ -172,12 +178,13 @@ To trigger a rebuild, run the following statement on each table in the preceding
172
178
173
179
```sql
174
180
SELECT TOP 1 * FROM [ReplicatedTable]
175
-
```
176
-
177
-
## Next steps
181
+
```
182
+
183
+
## Next steps
184
+
178
185
To create a replicated table, use one of these statements:
- [CREATE TABLE AS SELECT](/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest)
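As a minimal illustration of creating a new replicated table directly, the following statement uses `DISTRIBUTION = REPLICATE`. The table and column names are hypothetical, and the clustered index is only one possible storage choice.

```sql
-- Assumption: dbo.DimRegion_Example and its columns are hypothetical.
CREATE TABLE dbo.DimRegion_Example
(
    RegionKey  int          NOT NULL,
    RegionName nvarchar(50) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED INDEX (RegionKey)
);
```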
182
189
183
190
For an overview of distributed tables, see [distributed tables](sql-data-warehouse-tables-distribute.md).