
Commit cce4ad6

update 7 articles
1 parent 25294fb commit cce4ad6

7 files changed, 77 additions and 70 deletions

articles/sql-data-warehouse/performance-tuning-ordered-cci.md

Lines changed: 7 additions & 6 deletions
@@ -11,16 +11,17 @@ ms.date: 09/05/2019
 ms.author: xiaoyul
 ms.reviewer: nibruno; jrasnick
 ms.custom: seo-lt-2019
+ms.custom: azure-synapse
 ---
 
 # Performance tuning with ordered clustered columnstore index
 
-When users query a columnstore table in Azure SQL Data Warehouse, the optimizer checks the minimum and maximum values stored in each segment. Segments that are outside the bounds of the query predicate aren't read from disk to memory. A query can get faster performance if the number of segments to read and their total size are small.
+When users query a columnstore table in SQL Analytics, the optimizer checks the minimum and maximum values stored in each segment. Segments that are outside the bounds of the query predicate aren't read from disk to memory. A query can get faster performance if the number of segments to read and their total size are small.
 
 ## Ordered vs. non-ordered clustered columnstore index
-By default, for each Azure Data Warehouse table created without an index option, an internal component (index builder) creates a non-ordered clustered columnstore index (CCI) on it. Data in each column is compressed into a separate CCI rowgroup segment. There's metadata on each segment’s value range, so segments that are outside the bounds of the query predicate aren't read from disk during query execution. CCI offers the highest level of data compression and reduces the size of segments to read so queries can run faster. However, because the index builder doesn't sort data before compressing them into segments, segments with overlapping value ranges could occur, causing queries to read more segments from disk and take longer to finish.
+By default, for each SQL Analytics table created without an index option, an internal component (index builder) creates a non-ordered clustered columnstore index (CCI) on it. Data in each column is compressed into a separate CCI rowgroup segment. There's metadata on each segment’s value range, so segments that are outside the bounds of the query predicate aren't read from disk during query execution. CCI offers the highest level of data compression and reduces the size of segments to read so queries can run faster. However, because the index builder doesn't sort data before compressing them into segments, segments with overlapping value ranges could occur, causing queries to read more segments from disk and take longer to finish.
 
-When creating an ordered CCI, the Azure SQL Data Warehouse engine sorts the existing data in memory by the order key(s) before the index builder compresses them into index segments. With sorted data, segment overlapping is reduced allowing queries to have a more efficient segment elimination and thus faster performance because the number of segments to read from disk is smaller. If all data can be sorted in memory at once, then segment overlapping can be avoided. Given the large size of data in data warehouse tables, this scenario doesn't happen often.
+When creating an ordered CCI, the SQL Analytics engine sorts the existing data in memory by the order key(s) before the index builder compresses them into index segments. With sorted data, segment overlapping is reduced, allowing queries to have more efficient segment elimination and thus faster performance, because fewer segments need to be read from disk. If all data can be sorted in memory at once, then segment overlapping can be avoided. Given the large size of data in SQL Analytics tables, this scenario doesn't happen often.
 
 To check the segment ranges for a column, run this command with your table name and column name:
 
@@ -39,7 +40,7 @@ ORDER BY o.name, pnp.distribution_id, cls.min_data_id
 ```
 
 > [!NOTE]
-> In an ordered CCI table, the new data resulting from the same batch of DML or data loading operations are sorted within that batch, there is no global sorting across all data in the table. Users can REBUILD the ordered CCI to sort all data in the table. In Azure SQL Data Warehouse, the columnstore index REBUILD is an offline operation. For a partitioned table, the REBUILD is done one partition at a time. Data in the partition that is being rebuilt is "offline" and unavailable until the REBUILD is complete for that partition.
+> In an ordered CCI table, the new data resulting from the same batch of DML or data loading operations is sorted within that batch; there is no global sorting across all data in the table. Users can REBUILD the ordered CCI to sort all data in the table. In SQL Analytics, the columnstore index REBUILD is an offline operation. For a partitioned table, the REBUILD is done one partition at a time. Data in the partition that is being rebuilt is "offline" and unavailable until the REBUILD is complete for that partition.
 
 ## Query performance
 
@@ -105,7 +106,7 @@ CREATE TABLE Table1 WITH (DISTRIBUTION = HASH(c1), CLUSTERED COLUMNSTORE INDEX O
 AS SELECT * FROM ExampleTable
 OPTION (MAXDOP 1);
 ```
-- Pre-sort the data by the sort key(s) before loading them into Azure SQL Data Warehouse tables.
+- Pre-sort the data by the sort key(s) before loading them into SQL Analytics tables.
 
 
 Here is an example of an ordered CCI table distribution that has zero segment overlapping following the above recommendations. The ordered CCI table is created in a DWU1000c database via CTAS from a 20-GB heap table using MAXDOP 1 and xlargerc. The CCI is ordered on a BIGINT column with no duplicates.
@@ -140,4 +141,4 @@ WITH (DROP_EXISTING = ON)
 ```
 
 ## Next steps
-For more development tips, see [SQL Data Warehouse development overview](sql-data-warehouse-overview-develop.md).
+For more development tips, see [development overview](sql-data-warehouse-overview-develop.md).
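The ordered-CCI article rests on one mechanism: the optimizer compares the query predicate with each segment's stored minimum and maximum and skips segments that can't match. The sketch below models that idea in Python under simplifying assumptions: a single integer column, fixed-size segments, and a fully sorted "ordered" layout (which, as the article notes, rarely happens with real data volumes). `build_segments`, `segments_to_read`, and `compare_layouts` are hypothetical helpers, not engine APIs.

```python
import random
from typing import List, Tuple

# (min_value, max_value) metadata recorded for one column segment
Segment = Tuple[int, int]


def build_segments(values: List[int], rows_per_segment: int) -> List[Segment]:
    """Compress values into fixed-size segments, keeping only each segment's min/max."""
    segments: List[Segment] = []
    for start in range(0, len(values), rows_per_segment):
        chunk = values[start:start + rows_per_segment]
        segments.append((min(chunk), max(chunk)))
    return segments


def segments_to_read(segments: List[Segment], low: int, high: int) -> int:
    """Count segments that overlap the predicate low <= value <= high.

    A segment whose [min, max] range lies entirely outside the predicate is
    eliminated, i.e. it would not be read from disk.
    """
    return sum(1 for seg_min, seg_max in segments if seg_max >= low and seg_min <= high)


def compare_layouts(num_rows: int = 10_000, rows_per_segment: int = 1_000,
                    low: int = 2_500, high: int = 2_600) -> Tuple[int, int]:
    """Return (segments read without ordering, segments read with full ordering)."""
    values = list(range(1, num_rows + 1))
    random.Random(0).shuffle(values)  # arbitrary load order, as in a non-ordered CCI
    unordered = build_segments(values, rows_per_segment)
    ordered = build_segments(sorted(values), rows_per_segment)  # idealized ordered CCI
    return segments_to_read(unordered, low, high), segments_to_read(ordered, low, high)


if __name__ == "__main__":
    unordered_reads, ordered_reads = compare_layouts()
    print(f"non-ordered: {unordered_reads} segments read, ordered: {ordered_reads}")
```

In the sorted layout only the segment covering 2001 to 3000 overlaps the predicate, while shuffled segments almost all span the whole value range. Real ordered CCIs sort in memory per load batch, so some overlap usually remains.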

articles/sql-data-warehouse/sql-data-warehouse-table-constraints.md

Lines changed: 11 additions & 10 deletions
@@ -1,6 +1,6 @@
 ---
 title: Primary, foreign, and unique keys
-description: Table constraints support in Azure SQL Data Warehouse
+description: Table constraints support in SQL Analytics in Azure Synapse Analytics
 services: sql-data-warehouse
 author: XiaoyuMSFT
 manager: craigg
@@ -11,23 +11,24 @@ ms.date: 09/05/2019
 ms.author: xiaoyul
 ms.reviewer: nibruno; jrasnick
 ms.custom: seo-lt-2019
+ms.custom: azure-synapse
 ---
 
-# Primary key, foreign key, and unique key in Azure SQL Data Warehouse
+# Primary key, foreign key, and unique key in SQL Analytics
 
-Learn about table constraints in Azure SQL Data Warehouse, including primary key, foreign key, and unique key.
+Learn about table constraints in SQL Analytics, including primary key, foreign key, and unique key.
 
 ## Table constraints
-Azure SQL Data Warehouse supports these table constraints:
+SQL Analytics supports these table constraints:
 - PRIMARY KEY is only supported when NONCLUSTERED and NOT ENFORCED are both used.
 - UNIQUE constraint is only supported when NOT ENFORCED is used.
 
-FOREIGN KEY constraint is not supported in Azure SQL Data Warehouse.
+FOREIGN KEY constraint is not supported in SQL Analytics.
 
 ## Remarks
-Having primary key and/or unique key allows data warehouse engine to generate an optimal execution plan for a query. All values in a primary key column or a unique constraint column should be unique.
+Having a primary key and/or unique key allows the SQL Analytics engine to generate an optimal execution plan for a query. All values in a primary key column or a unique constraint column should be unique.
 
-After creating a table with primary key or unique constraint in Azure data warehouse, users need to make sure all values in those columns are unique. A violation of that may cause the query to return inaccurate result. This example shows how a query may return inaccurate result if the primary key or unique constraint column includes duplicate values.
+After creating a table with a primary key or unique constraint in SQL Analytics, users need to make sure all values in those columns are unique. A violation of that may cause the query to return inaccurate results. This example shows how a query may return inaccurate results if the primary key or unique constraint column includes duplicate values.
 
 ```sql
 -- Create table t1
@@ -153,17 +154,17 @@ a1 total
 ```
 
 ## Examples
-Create a data warehouse table with a primary key:
+Create a SQL Analytics table with a primary key:
 
 ```sql
 CREATE TABLE mytable (c1 INT PRIMARY KEY NONCLUSTERED NOT ENFORCED, c2 INT);
 ```
-Create a data warehouse table with a unique constraint:
+Create a SQL Analytics table with a unique constraint:
 
 ```sql
 CREATE TABLE t6 (c1 INT UNIQUE NOT ENFORCED, c2 INT);
 ```
 
 ## Next steps
 
-After creating the tables for your data warehouse, the next step is to load data into the table. For a loading tutorial, see [Loading data to SQL Data Warehouse](load-data-wideworldimportersdw.md).
+After creating the tables for your SQL Analytics database, the next step is to load data into the table. For a loading tutorial, see [Loading data to SQL Analytics databases](load-data-wideworldimportersdw.md).
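Because PRIMARY KEY and UNIQUE constraints are declared NOT ENFORCED, nothing in the engine rejects duplicate values; the uniqueness check the constraints article asks for has to happen in the load process. A minimal Python sketch of such a pre-load check, assuming the key values of a batch are available to the loading code; `duplicate_keys` and `validate_batch` are hypothetical names. Inside the database, the equivalent check is a `GROUP BY` on the key column with `HAVING COUNT(*) > 1`.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def duplicate_keys(keys: Iterable[Hashable]) -> List[Hashable]:
    """Return every key value that occurs more than once, in first-seen order."""
    counts = Counter(keys)
    return [key for key, count in counts.items() if count > 1]


def validate_batch(keys: Iterable[Hashable]) -> None:
    """Raise before loading if a NOT ENFORCED primary key/unique column would get duplicates.

    This only checks the batch itself; duplicates against rows already in the
    target table would need a query against that table as well.
    """
    dupes = duplicate_keys(keys)
    if dupes:
        raise ValueError(f"duplicate key values in batch: {dupes}")
```

Failing the load early is a design choice: once duplicates are in the table, queries that rely on the declared uniqueness can silently return inaccurate results, which is harder to detect than a rejected batch.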

articles/sql-data-warehouse/sql-data-warehouse-tables-distribute.md

Lines changed: 11 additions & 10 deletions
@@ -1,6 +1,6 @@
 ---
 title: Distributed tables design guidance
-description: Recommendations for designing hash-distributed and round-robin distributed tables in Azure SQL Data Warehouse.
+description: Recommendations for designing hash-distributed and round-robin distributed tables in SQL Analytics.
 services: sql-data-warehouse
 author: XiaoyuMSFT
 manager: craigg
@@ -11,12 +11,13 @@ ms.date: 04/17/2018
 ms.author: xiaoyul
 ms.reviewer: igorstan
 ms.custom: seo-lt-2019
+ms.custom: azure-synapse
 ---
 
-# Guidance for designing distributed tables in Azure SQL Data Warehouse
-Recommendations for designing hash-distributed and round-robin distributed tables in Azure SQL Data Warehouse.
+# Guidance for designing distributed tables in SQL Analytics
+Recommendations for designing hash-distributed and round-robin distributed tables in SQL Analytics.
 
-This article assumes you are familiar with data distribution and data movement concepts in SQL Data Warehouse.  For more information, see [Azure SQL Data Warehouse - Massively Parallel Processing (MPP) architecture](massively-parallel-processing-mpp-architecture.md).
+This article assumes you are familiar with data distribution and data movement concepts in SQL Analytics.  For more information, see [SQL Analytics massively parallel processing (MPP) architecture](massively-parallel-processing-mpp-architecture.md).
 
 ## What is a distributed table?
 A distributed table appears as a single table, but the rows are actually stored across 60 distributions. The rows are distributed with a hash or round-robin algorithm.
@@ -29,15 +30,15 @@ As part of table design, understand as much as possible about your data and how
 
 - How large is the table?  
 - How often is the table refreshed?  
-- Do I have fact and dimension tables in a data warehouse?  
+- Do I have fact and dimension tables in a SQL Analytics database?  
 
 
 ### Hash distributed
 A hash-distributed table distributes table rows across the Compute nodes by using a deterministic hash function to assign each row to one [distribution](massively-parallel-processing-mpp-architecture.md#distributions).
 
 ![Distributed table](media/sql-data-warehouse-distributed-data/hash-distributed-table.png "Distributed table")
 
-Since identical values always hash to the same distribution, the data warehouse has built-in knowledge of the row locations. SQL Data Warehouse uses this knowledge to minimize data movement during queries, which improves query performance.
+Since identical values always hash to the same distribution, SQL Analytics has built-in knowledge of the row locations. SQL Analytics uses this knowledge to minimize data movement during queries, which improves query performance.
 
 Hash-distributed tables work well for large fact tables in a star schema. They can have very large numbers of rows and still achieve high performance. There are, of course, some design considerations that help you to get the performance the distributed system is designed to provide. Choosing a good distribution column is one such consideration that is described in this article.
 
@@ -60,7 +61,7 @@ Consider using the round-robin distribution for your table in the following scen
 - If the join is less significant than other joins in the query
 - When the table is a temporary staging table
 
-The tutorial [Load New York taxicab data to Azure SQL Data Warehouse](load-data-from-azure-blob-storage-using-polybase.md#load-the-data-into-your-data-warehouse) gives an example of loading data into a round-robin staging table.
+The tutorial [Load New York taxicab data](load-data-from-azure-blob-storage-using-polybase.md#load-the-data-into-your-data-warehouse) gives an example of loading data into a round-robin staging table in SQL Analytics.
 
 
 ## Choosing a distribution column
@@ -104,7 +105,7 @@ To balance the parallel processing, select a distribution column that:
 
 ### Choose a distribution column that minimizes data movement
 
-To get the correct query result queries might move data from one Compute node to another. Data movement commonly happens when queries have joins and aggregations on distributed tables. Choosing a distribution column that helps minimize data movement is one of the most important strategies for optimizing performance of your SQL Data Warehouse.
+To get the correct query result, queries might move data from one Compute node to another. Data movement commonly happens when queries have joins and aggregations on distributed tables. Choosing a distribution column that helps minimize data movement is one of the most important strategies for optimizing performance of your SQL Analytics database.
 
 To minimize data movement, select a distribution column that:
 
@@ -212,7 +213,7 @@ RENAME OBJECT [dbo].[FactInternetSales_CustomerKey] TO [FactInternetSales];
 
 To create a distributed table, use one of these statements:
 
-- [CREATE TABLE (Azure SQL Data Warehouse)](https://docs.microsoft.com/sql/t-sql/statements/create-table-azure-sql-data-warehouse)
-- [CREATE TABLE AS SELECT (Azure SQL Data Warehouse](https://docs.microsoft.com/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse)
+- [CREATE TABLE (SQL Analytics)](https://docs.microsoft.com/sql/t-sql/statements/create-table-azure-sql-data-warehouse)
+- [CREATE TABLE AS SELECT (SQL Analytics)](https://docs.microsoft.com/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse)
 
 
articles/sql-data-warehouse/sql-data-warehouse-tables-identity.md

Lines changed: 10 additions & 9 deletions
@@ -1,6 +1,6 @@
 ---
 title: Using IDENTITY to create surrogate keys
-description: Recommendations and examples for using the IDENTITY property to create surrogate keys on tables in Azure SQL Data Warehouse.
+description: Recommendations and examples for using the IDENTITY property to create surrogate keys on tables in SQL Analytics.
 services: sql-data-warehouse
 author: XiaoyuMSFT
 manager: craigg
@@ -11,19 +11,20 @@ ms.date: 04/30/2019
 ms.author: xiaoyul
 ms.reviewer: igorstan
 ms.custom: seo-lt-2019
+ms.custom: azure-synapse
 ---
 
-# Using IDENTITY to create surrogate keys in Azure SQL Data Warehouse
+# Using IDENTITY to create surrogate keys in SQL Analytics
 
-Recommendations and examples for using the IDENTITY property to create surrogate keys on tables in Azure SQL Data Warehouse.
+Recommendations and examples for using the IDENTITY property to create surrogate keys on tables in SQL Analytics.
 
 ## What is a surrogate key
 
-A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design data warehouse models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.
+A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design SQL Analytics models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.
 
 ## Creating a table with an IDENTITY column
 
-The IDENTITY property is designed to scale out across all the distributions in the data warehouse without affecting load performance. Therefore, the implementation of IDENTITY is oriented toward achieving these goals.
+The IDENTITY property is designed to scale out across all the distributions in the SQL Analytics database without affecting load performance. Therefore, the implementation of IDENTITY is oriented toward achieving these goals.
 
 You can define a table as having the IDENTITY property when you first create the table by using syntax that is similar to the following statement:
 
@@ -45,7 +46,7 @@ This remainder of this section highlights the nuances of the implementation to h
 
 ### Allocation of values
 
-The IDENTITY property doesn't guarantee the order in which the surrogate values are allocated, which reflects the behavior of SQL Server and Azure SQL Database. However, in Azure SQL Data Warehouse, the absence of a guarantee is more pronounced.
+The IDENTITY property doesn't guarantee the order in which the surrogate values are allocated, which reflects the behavior of SQL Server and Azure SQL Database. However, in SQL Analytics, the absence of a guarantee is more pronounced.
 
 The following example is an illustration:
 
@@ -95,7 +96,7 @@ CREATE TABLE AS SELECT (CTAS) follows the same SQL Server behavior that's docume
 
 ## Explicitly inserting values into an IDENTITY column
 
-SQL Data Warehouse supports `SET IDENTITY_INSERT <your table> ON|OFF` syntax. You can use this syntax to explicitly insert values into the IDENTITY column.
+SQL Analytics supports `SET IDENTITY_INSERT <your table> ON|OFF` syntax. You can use this syntax to explicitly insert values into the IDENTITY column.
 
 Many data modelers like to use predefined negative values for certain rows in their dimensions. An example is the -1 or "unknown member" row.
 
@@ -156,7 +157,7 @@ DBCC PDW_SHOWSPACEUSED('dbo.T1');
 > It's not possible to use `CREATE TABLE AS SELECT` currently when loading data into a table with an IDENTITY column.
 >
 
-For more information on loading data, see [Designing Extract, Load, and Transform (ELT) for Azure SQL Data Warehouse](design-elt-data-loading.md) and [Loading best practices](guidance-for-loading-data.md).
+For more information on loading data, see [Designing Extract, Load, and Transform (ELT) for SQL Analytics](design-elt-data-loading.md) and [Loading best practices](guidance-for-loading-data.md).
 
 ## System views
 
@@ -190,7 +191,7 @@ The IDENTITY property can't be used:
 - When the column is also the distribution key
 - When the table is an external table
 
-The following related functions are not supported in SQL Data Warehouse:
+The following related functions are not supported in SQL Analytics:
 
 - [IDENTITY()](/sql/t-sql/functions/identity-function-transact-sql)
 - [@@IDENTITY](/sql/t-sql/functions/identity-transact-sql)
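The IDENTITY article notes that values are unique but their allocation order is not guaranteed. One way to see how that can happen is a toy model in which each distribution issues values independently. The stride-of-60 scheme below is an assumption chosen for illustration, not documented engine behavior, and `IdentityModel` is a hypothetical class.

```python
from typing import Dict

NUM_DISTRIBUTIONS = 60


class IdentityModel:
    """Toy IDENTITY(seed, 1) allocator where each distribution issues its own values.

    Assumed scheme (illustrative only): distribution d hands out
    seed + d, seed + d + 60, seed + d + 120, ...
    Values are unique across distributions, but they are not contiguous and
    do not follow the order in which rows were inserted.
    """

    def __init__(self, seed: int = 1) -> None:
        self.seed = seed
        self.issued: Dict[int, int] = {}  # values issued so far, per distribution

    def next_value(self, distribution: int) -> int:
        if not 0 <= distribution < NUM_DISTRIBUTIONS:
            raise ValueError("distribution must be in [0, 60)")
        slot = self.issued.get(distribution, 0)
        self.issued[distribution] = slot + 1
        return self.seed + distribution + slot * NUM_DISTRIBUTIONS
```

Under this model, inserting three rows that land in distributions 0, 0, and 5 produces 1, 61, and 6: the third row inserted gets a smaller value than the second, and 2 through 5 are never used by that load. Independent per-distribution counters avoid cross-node coordination, which fits the article's point that IDENTITY is designed not to slow down loads.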
