Commit ffd0c68

Merge pull request #197340 from lucaferrari77/patch-1
Update sql-data-warehouse-develop-ctas.md
2 parents 48f62c0 + b0c0d1a commit ffd0c68

1 file changed: +11 -116 lines changed

articles/synapse-analytics/sql-data-warehouse/sql-data-warehouse-develop-ctas.md

Lines changed: 11 additions & 116 deletions
@@ -1,20 +1,20 @@
 ---
 title: CREATE TABLE AS SELECT (CTAS)
-description: Explanation and examples of the CREATE TABLE AS SELECT (CTAS) statement in Synapse SQL for developing solutions.
+description: Explanation and examples of the CREATE TABLE AS SELECT (CTAS) statement in dedicated SQL pool (formerly SQL DW) for developing solutions.
 author: joannapea
 manager: craigg
 ms.service: synapse-analytics
 ms.topic: conceptual
 ms.subservice: sql-dw
-ms.date: 03/26/2019
+ms.date: 06/09/2022
 ms.author: joanpo
 ms.reviewer: wiassaf
 ms.custom: seoapril2019, azure-synapse
 ---

 # CREATE TABLE AS SELECT (CTAS)

-This article explains the CREATE TABLE AS SELECT (CTAS) T-SQL statement in Synapse SQL for developing solutions. The article also provides code examples.
+This article explains the CREATE TABLE AS SELECT (CTAS) T-SQL statement in dedicated SQL pool (formerly SQL DW) for developing solutions. The article also provides code examples.

 ## CREATE TABLE AS SELECT

@@ -55,7 +55,7 @@ FROM [dbo].[FactInternetSales];

 Perhaps one of the most common uses of CTAS is creating a copy of a table in order to change the DDL. Let's say you originally created your table as `ROUND_ROBIN`, and now want to change it to a table distributed on a column. CTAS is how you would change the distribution column. You can also use CTAS to change partitioning, indexing, or column types.

-Let's say you created this table by using the default distribution type of `ROUND_ROBIN`, not specifying a distribution column in the `CREATE TABLE`.
+Let's say you created this table by specifying HEAP and using the default distribution type of `ROUND_ROBIN`.

 ```sql
 CREATE TABLE FactInternetSales
@@ -82,7 +82,12 @@ CREATE TABLE FactInternetSales
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25),
-CustomerPONumber nvarchar(25));
+CustomerPONumber nvarchar(25)
+)
+WITH(
+HEAP,
+DISTRIBUTION = ROUND_ROBIN
+);
 ```

 Now you want to create a new copy of this table, with a `Clustered Columnstore Index`, so you can take advantage of the performance of Clustered Columnstore tables. You also want to distribute this table on `ProductKey`, because you're anticipating joins on this column and want to avoid data movement during joins on `ProductKey`. Lastly, you also want to add partitioning on `OrderDateKey`, so you can quickly delete old data by dropping old partitions. Here is the CTAS statement, which copies your old table into a new table.
@@ -115,116 +120,6 @@ RENAME OBJECT FactInternetSales_new TO FactInternetSales;
 DROP TABLE FactInternetSales_old;
 ```

-## Use CTAS to work around unsupported features
-
-You can also use CTAS to work around a number of the unsupported features listed below. This method can often prove helpful, because not only will your code be compliant, but it will often run faster on Synapse SQL. This performance is a result of its fully parallelized design. Scenarios include:
-
-* ANSI JOINS on UPDATEs
-* ANSI JOINs on DELETEs
-* MERGE statement
-
-> [!TIP]
-> Try to think "CTAS first." Solving a problem by using CTAS is generally a good approach, even if you're writing more data as a result.
-
-## ANSI join replacement for update statements
-
-You might find that you have a complex update. The update joins more than two tables together by using ANSI join syntax to perform the UPDATE or DELETE.
-
-Imagine you had to update this table:
-
-```sql
-CREATE TABLE [dbo].[AnnualCategorySales]
-( [EnglishProductCategoryName] NVARCHAR(50) NOT NULL
-, [CalendarYear] SMALLINT NOT NULL
-, [TotalSalesAmount] MONEY NOT NULL
-)
-WITH
-(
-DISTRIBUTION = ROUND_ROBIN
-);
-```
-
-The original query might have looked something like this example:
-
-```sql
-UPDATE acs
-SET [TotalSalesAmount] = [fis].[TotalSalesAmount]
-FROM [dbo].[AnnualCategorySales] AS acs
-JOIN (
-SELECT [EnglishProductCategoryName]
-, [CalendarYear]
-, SUM([SalesAmount]) AS [TotalSalesAmount]
-FROM [dbo].[FactInternetSales] AS s
-JOIN [dbo].[DimDate] AS d ON s.[OrderDateKey] = d.[DateKey]
-JOIN [dbo].[DimProduct] AS p ON s.[ProductKey] = p.[ProductKey]
-JOIN [dbo].[DimProductSubCategory] AS u ON p.[ProductSubcategoryKey] = u.[ProductSubcategoryKey]
-JOIN [dbo].[DimProductCategory] AS c ON u.[ProductCategoryKey] = c.[ProductCategoryKey]
-WHERE [CalendarYear] = 2004
-GROUP BY
-[EnglishProductCategoryName]
-, [CalendarYear]
-) AS fis
-ON [acs].[EnglishProductCategoryName] = [fis].[EnglishProductCategoryName]
-AND [acs].[CalendarYear] = [fis].[CalendarYear];
-```
-
-Synapse SQL doesn't support ANSI joins in the `FROM` clause of an `UPDATE` statement, so you can't use the previous example without modifying it.
-
-You can use a combination of a CTAS and an implicit join to replace the previous example:
-
-```sql
--- Create an interim table
-CREATE TABLE CTAS_acs
-WITH (DISTRIBUTION = ROUND_ROBIN)
-AS
-SELECT ISNULL(CAST([EnglishProductCategoryName] AS NVARCHAR(50)),0) AS [EnglishProductCategoryName]
-, ISNULL(CAST([CalendarYear] AS SMALLINT),0) AS [CalendarYear]
-, ISNULL(CAST(SUM([SalesAmount]) AS MONEY),0) AS [TotalSalesAmount]
-FROM [dbo].[FactInternetSales] AS s
-JOIN [dbo].[DimDate] AS d ON s.[OrderDateKey] = d.[DateKey]
-JOIN [dbo].[DimProduct] AS p ON s.[ProductKey] = p.[ProductKey]
-JOIN [dbo].[DimProductSubCategory] AS u ON p.[ProductSubcategoryKey] = u.[ProductSubcategoryKey]
-JOIN [dbo].[DimProductCategory] AS c ON u.[ProductCategoryKey] = c.[ProductCategoryKey]
-WHERE [CalendarYear] = 2004
-GROUP BY [EnglishProductCategoryName]
-, [CalendarYear];
-
--- Use an implicit join to perform the update
-UPDATE AnnualCategorySales
-SET AnnualCategorySales.TotalSalesAmount = CTAS_ACS.TotalSalesAmount
-FROM CTAS_acs
-WHERE CTAS_acs.[EnglishProductCategoryName] = AnnualCategorySales.[EnglishProductCategoryName]
-AND CTAS_acs.[CalendarYear] = AnnualCategorySales.[CalendarYear] ;
-
---Drop the interim table
-DROP TABLE CTAS_acs;
-```
-
-## ANSI join replacement for MERGE
-
-In Azure Synapse Analytics, [MERGE](/sql/t-sql/statements/merge-transact-sql?view=azure-sqldw-latest&preserve-view=true) (preview) with NOT MATCHED BY TARGET requires the target to be a HASH distributed table. Users can use the ANSI JOIN with [UPDATE](/sql/t-sql/queries/update-transact-sql?view=azure-sqldw-latest&preserve-view=true) or [DELETE](/sql/t-sql/statements/delete-transact-sql?view=azure-sqldw-latest&preserve-view=true) as a workaround to modify target table data based on the result from joining with another table. Here is an example.
-
-```sql
-CREATE TABLE dbo.Table1
-(ColA INT NOT NULL, ColB DECIMAL(10,3) NOT NULL);
-GO
-CREATE TABLE dbo.Table2
-(ColA INT NOT NULL, ColB DECIMAL(10,3) NOT NULL);
-GO
-INSERT INTO dbo.Table1 VALUES(1, 10.0);
-INSERT INTO dbo.Table2 VALUES(1, 0.0);
-GO
-UPDATE dbo.Table2
-SET dbo.Table2.ColB = dbo.Table2.ColB + dbo.Table1.ColB
-FROM dbo.Table2
-INNER JOIN dbo.Table1
-ON (dbo.Table2.ColA = dbo.Table1.ColA);
-GO
-SELECT ColA, ColB
-FROM dbo.Table2;
-
-```
-
 ## Explicitly state data type and nullability of output

 When migrating code, you might find you run across this type of coding pattern:
@@ -377,4 +272,4 @@ CTAS is one of the most important statements in Synapse SQL. Make sure you thoro

 ## Next steps

-For more development tips, see the [development overview](sql-data-warehouse-overview-develop.md).
+For more development tips, see the [development overview](sql-data-warehouse-overview-develop.md).
