**articles/sql-data-warehouse/performance-tuning-ordered-cci.md** (+7, -6 lines)
ms.date: 09/05/2019
ms.author: xiaoyul
ms.reviewer: nibruno; jrasnick
ms.custom: seo-lt-2019
ms.custom: azure-synapse
---
# Performance tuning with ordered clustered columnstore index
When users query a columnstore table in SQL Analytics, the optimizer checks the minimum and maximum values stored in each segment. Segments that are outside the bounds of the query predicate aren't read from disk to memory. A query can get faster performance if the number of segments to read and their total size are small.
## Ordered vs. non-ordered clustered columnstore index
By default, for each SQL Analytics table created without an index option, an internal component (index builder) creates a non-ordered clustered columnstore index (CCI) on it. Data in each column is compressed into a separate CCI rowgroup segment. There's metadata on each segment’s value range, so segments that are outside the bounds of the query predicate aren't read from disk during query execution. CCI offers the highest level of data compression and reduces the size of segments to read so queries can run faster. However, because the index builder doesn't sort data before compressing them into segments, segments with overlapping value ranges could occur, causing queries to read more segments from disk and take longer to finish.
When creating an ordered CCI, the SQL Analytics engine sorts the existing data in memory by the order key(s) before the index builder compresses them into index segments. With sorted data, segment overlapping is reduced allowing queries to have a more efficient segment elimination and thus faster performance because the number of segments to read from disk is smaller. If all data can be sorted in memory at once, then segment overlapping can be avoided. Given the large size of data in SQL Analytics tables, this scenario doesn't happen often.
To check the segment ranges for a column, run this command with your table name and column name:
```sql
...
ORDER BY o.name, pnp.distribution_id, cls.min_data_id
```
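The diff view elides everything but the tail of that query. The sketch below is a plausible reconstruction, not the article's exact statement: the catalog views (`sys.pdw_nodes_column_store_segments`, `sys.pdw_nodes_partitions`, `sys.pdw_nodes_tables`, `sys.pdw_table_mappings`) are documented DMVs, but the join list is an assumption; only the final `ORDER BY` survives in the diff.

```sql
-- Reconstruction sketch: join list assumed; ORDER BY matches the surviving
-- fragment. Replace <Table Name> and <Column Name> with your own identifiers.
SELECT o.name, pnp.distribution_id, cls.min_data_id, cls.max_data_id
FROM sys.pdw_nodes_column_store_segments AS cls
JOIN sys.pdw_nodes_partitions AS pnp
    ON cls.partition_id = pnp.partition_id AND cls.pdw_node_id = pnp.pdw_node_id
JOIN sys.pdw_nodes_tables AS nt
    ON pnp.object_id = nt.object_id AND pnp.pdw_node_id = nt.pdw_node_id
JOIN sys.pdw_table_mappings AS tm
    ON nt.name = tm.physical_name
JOIN sys.objects AS o
    ON tm.object_id = o.object_id
JOIN sys.columns AS c
    ON o.object_id = c.object_id AND cls.column_id = c.column_id
WHERE o.name = '<Table Name>' AND c.name = '<Column Name>'
ORDER BY o.name, pnp.distribution_id, cls.min_data_id;
```

A small gap between `min_data_id` and `max_data_id` across segments indicates good segment elimination potential for predicates on that column.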
> [!NOTE]
> In an ordered CCI table, the new data resulting from the same batch of DML or data loading operations are sorted within that batch, there is no global sorting across all data in the table. Users can REBUILD the ordered CCI to sort all data in the table. In SQL Analytics, the columnstore index REBUILD is an offline operation. For a partitioned table, the REBUILD is done one partition at a time. Data in the partition that is being rebuilt is "offline" and unavailable until the REBUILD is complete for that partition.
## Query performance
```sql
CREATE TABLE Table1 WITH (DISTRIBUTION = HASH(c1), CLUSTERED COLUMNSTORE INDEX ORDER (c1))
AS SELECT * FROM ExampleTable
OPTION (MAXDOP 1);
```
- Pre-sort the data by the sort key(s) before loading them into SQL Analytics tables.
Here is an example of an ordered CCI table distribution that has zero segment overlapping, following the above recommendations. The ordered CCI table is created in a DWU1000c database via CTAS from a 20-GB heap table using MAXDOP 1 and xlargerc. The CCI is ordered on a BIGINT column with no duplicates.
```sql
...
WITH (DROP_EXISTING = ON)
```
## Next steps
For more development tips, see [development overview](sql-data-warehouse-overview-develop.md).
**articles/sql-data-warehouse/sql-data-warehouse-table-constraints.md** (+11, -10 lines)
---
title: Primary, foreign, and unique keys
description: Table constraints support in SQL Analytics in Azure Synapse Analytics
services: sql-data-warehouse
author: XiaoyuMSFT
manager: craigg
ms.date: 09/05/2019
ms.author: xiaoyul
ms.reviewer: nibruno; jrasnick
ms.custom: seo-lt-2019
ms.custom: azure-synapse
---
# Primary key, foreign key, and unique key in SQL Analytics
Learn about table constraints in SQL Analytics, including primary key, foreign key, and unique key.
## Table constraints
SQL Analytics supports these table constraints:
- PRIMARY KEY is only supported when NONCLUSTERED and NOT ENFORCED are both used.
- UNIQUE constraint is only supported when NOT ENFORCED is used.

FOREIGN KEY constraint is not supported in SQL Analytics.
## Remarks
Having a primary key and/or a unique key allows the SQL Analytics engine to generate an optimal execution plan for a query. All values in a primary key column or a unique constraint column should be unique.
After creating a table with a primary key or unique constraint in SQL Analytics, users need to make sure all values in those columns are unique. A violation may cause the query to return inaccurate results. This example shows how a query may return inaccurate results if the primary key or unique constraint column includes duplicate values.
```sql
-- Create table t1
...
a1 total
...
```
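The diff elides most of the example above. The following is a hypothetical sketch of that kind of repro, not the article's exact code; the table name `t1` and the columns `a1`/`total` come from the surviving fragments, everything else is assumed.

```sql
-- Sketch: a NOT ENFORCED primary key does not block duplicates, so the
-- optimizer may assume uniqueness that does not hold.
CREATE TABLE t1 (a1 INT NOT NULL, b1 INT)
WITH (DISTRIBUTION = ROUND_ROBIN);

INSERT INTO t1 VALUES (1, 100);
INSERT INTO t1 VALUES (1, 100);   -- duplicate a1 value, not rejected
INSERT INTO t1 VALUES (2, 200);

-- Declare the unenforced primary key over a column that already has duplicates.
ALTER TABLE t1 ADD CONSTRAINT PK_t1 PRIMARY KEY NONCLUSTERED (a1) NOT ENFORCED;

-- Because the plan may treat a1 as unique and skip aggregation work,
-- this query can return a row per duplicate instead of one row per group.
SELECT a1, COUNT(*) AS total FROM t1 GROUP BY a1;
```

The fix is to deduplicate the data before (or instead of) declaring the constraint; the constraint itself never validates the rows.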
## Examples
Create a SQL Analytics table with a primary key:
```sql
CREATE TABLE mytable (c1 INT PRIMARY KEY NONCLUSTERED NOT ENFORCED, c2 INT);
```
Create a SQL Analytics table with a unique constraint:
```sql
CREATE TABLE t6 (c1 INT UNIQUE NOT ENFORCED, c2 INT);
```
## Next steps
After creating the tables for your SQL Analytics database, the next step is to load data into the table. For a loading tutorial, see [Loading data to SQL Analytics databases](load-data-wideworldimportersdw.md).
**articles/sql-data-warehouse/sql-data-warehouse-tables-distribute.md** (+11, -10 lines)
---
title: Distributed tables design guidance
description: Recommendations for designing hash-distributed and round-robin distributed tables in SQL Analytics.
services: sql-data-warehouse
author: XiaoyuMSFT
manager: craigg
ms.date: 04/17/2018
ms.author: xiaoyul
ms.reviewer: igorstan
ms.custom: seo-lt-2019
ms.custom: azure-synapse
---
# Guidance for designing distributed tables in SQL Analytics

Recommendations for designing hash-distributed and round-robin distributed tables in SQL Analytics.
This article assumes you are familiar with data distribution and data movement concepts in SQL Analytics. For more information, see [SQL Analytics massively parallel processing (MPP) architecture](massively-parallel-processing-mpp-architecture.md).
## What is a distributed table?
A distributed table appears as a single table, but the rows are actually stored across 60 distributions. The rows are distributed with a hash or round-robin algorithm.
As part of table design, understand as much as possible about your data and how the data is queried.
- How large is the table?
- How often is the table refreshed?
- Do I have fact and dimension tables in a SQL Analytics database?
### Hash distributed
A hash-distributed table distributes table rows across the Compute nodes by using a deterministic hash function to assign each row to one [distribution](massively-parallel-processing-mpp-architecture.md#distributions).
Since identical values always hash to the same distribution, SQL Analytics has built-in knowledge of the row locations and uses this knowledge to minimize data movement during queries, which improves query performance.
Hash-distributed tables work well for large fact tables in a star schema. They can have very large numbers of rows and still achieve high performance. There are, of course, some design considerations that help you to get the performance the distributed system is designed to provide. Choosing a good distribution column is one such consideration that is described in this article.
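As a concrete sketch, a large fact table can be hash-distributed on a frequently joined, high-cardinality column. The table and column names below are illustrative, not from the article:

```sql
-- Sketch: hash-distribute a fact table on the join key so rows with the
-- same key land in the same distribution, avoiding data movement on joins.
CREATE TABLE FactSales
(
    ProductKey  INT   NOT NULL,
    CustomerKey INT   NOT NULL,
    Amount      MONEY NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);
```

A join or GROUP BY on `CustomerKey` can then run distribution-local, without shuffling rows between Compute nodes.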
Consider using the round-robin distribution for your table in the following scenarios:
- If the join is less significant than other joins in the query
- When the table is a temporary staging table
The tutorial [Load New York taxicab data](load-data-from-azure-blob-storage-using-polybase.md#load-the-data-into-your-data-warehouse) gives an example of loading data into a round-robin staging table in SQL Analytics.
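A minimal sketch of such a staging table follows; the source name `ext.Trip` is a hypothetical external (PolyBase) table, not one from the tutorial:

```sql
-- Sketch: a round-robin heap stages data fastest because rows are assigned
-- to distributions without hashing and without columnstore compression.
CREATE TABLE Staging_Trip
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
)
AS SELECT * FROM ext.Trip;   -- ext.Trip: assumed external table
```

After staging, the data is typically reloaded via CTAS into a hash-distributed table for query performance.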
## Choosing a distribution column
To balance the parallel processing, select a distribution column that:
### Choose a distribution column that minimizes data movement
To get the correct query result, queries might move data from one Compute node to another. Data movement commonly happens when queries have joins and aggregations on distributed tables. Choosing a distribution column that helps minimize data movement is one of the most important strategies for optimizing the performance of your SQL Analytics database.
To minimize data movement, select a distribution column that:
```sql
RENAME OBJECT [dbo].[FactInternetSales_CustomerKey] TO [FactInternetSales];
```
To create a distributed table, use one of these statements:
- [CREATE TABLE (Azure SQL Data Warehouse)](https://docs.microsoft.com/sql/t-sql/statements/create-table-azure-sql-data-warehouse)
- [CREATE TABLE AS SELECT (Azure SQL Data Warehouse)](https://docs.microsoft.com/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse)
**articles/sql-data-warehouse/sql-data-warehouse-tables-identity.md** (+10, -9 lines)
---
title: Using IDENTITY to create surrogate keys
description: Recommendations and examples for using the IDENTITY property to create surrogate keys on tables in SQL Analytics.
services: sql-data-warehouse
author: XiaoyuMSFT
manager: craigg
ms.date: 04/30/2019
ms.author: xiaoyul
ms.reviewer: igorstan
ms.custom: seo-lt-2019
ms.custom: azure-synapse
---
# Using IDENTITY to create surrogate keys in SQL Analytics

Recommendations and examples for using the IDENTITY property to create surrogate keys on tables in SQL Analytics.
## What is a surrogate key
A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design SQL Analytics models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.
## Creating a table with an IDENTITY column
The IDENTITY property is designed to scale out across all the distributions in the SQL Analytics database without affecting load performance. Therefore, the implementation of IDENTITY is oriented toward achieving these goals.
You can define a table as having the IDENTITY property when you first create the table by using syntax that is similar to the following statement:
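The statement itself is elided from the diff. The sketch below shows the usual shape of such a definition; the table and column names are assumptions, not the article's exact example:

```sql
-- Sketch: C1 gets machine-generated surrogate values; note that an IDENTITY
-- column cannot itself be the distribution key.
CREATE TABLE dbo.T1
(
    C1 INT IDENTITY(1, 1) NOT NULL,
    C2 INT NULL
)
WITH
(
    DISTRIBUTION = HASH(C2),
    CLUSTERED COLUMNSTORE INDEX
);
```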
The remainder of this section highlights the nuances of the implementation to help you understand them.
### Allocation of values
The IDENTITY property doesn't guarantee the order in which the surrogate values are allocated, which reflects the behavior of SQL Server and Azure SQL Database. However, in SQL Analytics, the absence of a guarantee is more pronounced.
The following example is an illustration:
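The article's own illustration is elided from the diff; the following is a hedged sketch of the same idea with assumed names, not the original code:

```sql
-- Sketch (assumed names): IDENTITY values are unique but neither dense
-- nor guaranteed to reflect insert order.
CREATE TABLE dbo.T2
(
    C1 INT IDENTITY(1, 1) NOT NULL,
    C2 VARCHAR(30) NULL
)
WITH (DISTRIBUTION = HASH(C2));

INSERT INTO dbo.T2 (C2) VALUES ('a');
INSERT INTO dbo.T2 (C2) VALUES ('b');
INSERT INTO dbo.T2 (C2) VALUES ('c');

-- Each of the 60 distributions allocates from its own range, so the values
-- returned here need not be 1, 2, 3 and may have gaps between them.
SELECT C1, C2 FROM dbo.T2 ORDER BY C1;
```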
CREATE TABLE AS SELECT (CTAS) follows the same SQL Server behavior that's documented.
## Explicitly inserting values into an IDENTITY column
SQL Analytics supports `SET IDENTITY_INSERT <your table> ON|OFF` syntax. You can use this syntax to explicitly insert values into the IDENTITY column.
Many data modelers like to use predefined negative values for certain rows in their dimensions. An example is the -1 or "unknown member" row.
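A short sketch of that pattern, using a hypothetical dimension table (the name and columns are assumptions, not from the article):

```sql
-- Sketch: explicitly seed the "unknown member" row at -1, then return
-- the column to normal IDENTITY allocation.
SET IDENTITY_INSERT dbo.DimCustomer ON;

INSERT INTO dbo.DimCustomer (CustomerKey, CustomerName)
VALUES (-1, 'Unknown');

SET IDENTITY_INSERT dbo.DimCustomer OFF;
```

While IDENTITY_INSERT is ON, every insert must supply an explicit value for the IDENTITY column.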
> It's not possible to use `CREATE TABLE AS SELECT` currently when loading data into a table with an IDENTITY column.
For more information on loading data, see [Designing Extract, Load, and Transform (ELT) for SQL Analytics](design-elt-data-loading.md) and [Loading best practices](guidance-for-loading-data.md).
## System views
The IDENTITY property can't be used:
- When the column is also the distribution key
- When the table is an external table
The following related functions are not supported in SQL Analytics: