Commit aaa2bd8

20240719 freshness pass
1 parent c8d9f28 commit aaa2bd8

6 files changed: +18 -22 lines changed

articles/synapse-analytics/index.yml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ metadata:

 author: WilliamDAssafMSFT
 ms.author: wiassaf
-ms.date: 08/09/2022
+ms.date: 07/19/2024

 # linkListType: architecture | concept | deploy | download | get-started | how-to-guide | learn | overview | quickstart | reference | tutorial | video | whats-new

3 binary files changed (content not shown): -19.2 KB, -28.9 KB, -14.6 KB

articles/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-what-is.md

Lines changed: 6 additions & 10 deletions
@@ -3,7 +3,7 @@ title: What is dedicated SQL pool (formerly SQL DW)?
 description: Dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics is the enterprise data warehousing functionality in Azure Synapse Analytics.
 author: WilliamDAssafMSFT
 ms.author: wiassaf
-ms.date: 02/21/2023
+ms.date: 07/19/2024
 ms.service: synapse-analytics
 ms.subservice: sql-dw
 ms.topic: overview
@@ -13,11 +13,7 @@ ms.topic: overview

 Azure Synapse Analytics is an analytics service that brings together enterprise data warehousing and Big Data analytics. Dedicated SQL pool (formerly SQL DW) refers to the enterprise data warehousing features that are available in Azure Synapse Analytics.

-
-
-![Dedicated SQL pool (formerly SQL DW) in relation to Azure Synapse](./media/sql-data-warehouse-overview-what-is/dedicated-sql-pool.png)
-
-
+:::image type="content" source="media/sql-data-warehouse-overview-what-is/dedicated-sql-pool.png" alt-text="Diagram of dedicated SQL pool (formerly SQL DW) in relation to Azure Synapse." lightbox="media/sql-data-warehouse-overview-what-is/dedicated-sql-pool.png":::

 Dedicated SQL pool (formerly SQL DW) represents a collection of analytic resources that are provisioned when using Synapse SQL. The size of a dedicated SQL pool (formerly SQL DW) is determined by Data Warehousing Units (DWU).

@@ -30,15 +26,15 @@ Once your dedicated SQL pool is created, you can import big data with simple [Po

 Data warehousing is a key component of a cloud-based, end-to-end big data solution.

-![Data warehouse solution](./media/sql-data-warehouse-overview-what-is/data-warehouse-solution.png)
+:::image type="content" source="media/sql-data-warehouse-overview-what-is/data-warehouse-solution.png" alt-text="Diagram of data warehouse solutions featuring zones for Ingest, Store, Prep & Train, and Model & Serve." lightbox="media/sql-data-warehouse-overview-what-is/data-warehouse-solution.png":::

 In a cloud data solution, data is ingested into big data stores from a variety of sources. Once in a big data store, Hadoop, Spark, and machine learning algorithms prepare and train the data. When the data is ready for complex analysis, dedicated SQL pool uses PolyBase to query the big data stores. PolyBase uses standard T-SQL queries to bring the data into dedicated SQL pool (formerly SQL DW) tables.

 Dedicated SQL pool (formerly SQL DW) stores data in relational tables with columnar storage. This format significantly reduces the data storage costs, and improves query performance. Once data is stored, you can run analytics at massive scale. Compared to traditional database systems, analysis queries finish in seconds instead of minutes, or hours instead of days.

 The analysis results can go to worldwide reporting databases or applications. Business analysts can then gain insights to make well-informed business decisions.

-## Next steps
+## Related content

 - Explore [Azure Synapse architecture](massively-parallel-processing-mpp-architecture.md)
 - Quickly [create a dedicated SQL pool](../quickstart-create-sql-pool-studio.md)
@@ -52,5 +48,5 @@ Or look at some of these other Azure Synapse resources:
 - Search [Blogs](https://azure.microsoft.com/blog/tag/azure-sql-data-warehouse/)
 - Submit a [Feature requests](https://feedback.azure.com/d365community/forum/9b9ba8e4-0825-ec11-b6e6-000d3a4f07b8)
 - [Create a support ticket](sql-data-warehouse-get-started-create-support-ticket.md)
-- Search [Microsoft Q&A question page](/answers/topics/azure-synapse-analytics.html)
-- Search [Stack Overflow forum](https://stackoverflow.com/questions/tagged/azure-sqldw)
+- [Microsoft Q&A question page](/answers/topics/azure-synapse-analytics.html)
+- [Stack Overflow forum](https://stackoverflow.com/questions/tagged/azure-sqldw)

articles/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute.md

Lines changed: 11 additions & 11 deletions
@@ -1,14 +1,15 @@
 ---
 title: Distributed tables design guidance
 description: Recommendations for designing hash-distributed and round-robin distributed tables using dedicated SQL pool.
-ms.service: synapse-analytics
-ms.topic: conceptual
-ms.subservice: sql-dw
-ms.date: 03/20/2023
 author: WilliamDAssafMSFT
 ms.author: wiassaf
 ms.reviewer: mariyaali
-ms.custom: azure-synapse
+ms.date: 07/19/2024
+ms.service: synapse-analytics
+ms.subservice: sql-dw
+ms.topic: conceptual
+ms.custom:
+- azure-synapse
 ---

 # Guidance for designing distributed tables using dedicated SQL pool in Azure Synapse Analytics
@@ -35,7 +36,7 @@ As part of table design, understand as much as possible about your data and how

 A hash-distributed table distributes table rows across the Compute nodes by using a deterministic hash function to assign each row to one [distribution](massively-parallel-processing-mpp-architecture.md#distributions).

-:::image type="content" source="./media/sql-data-warehouse-tables-distribute/hash-distributed-table.png" alt-text="Distributed table" lightbox="./media/sql-data-warehouse-tables-distribute/hash-distributed-table.png":::
+:::image type="content" source="media/sql-data-warehouse-tables-distribute/hash-distributed-table.png" alt-text="Diagram of a distributed table.":::

 Since identical values always hash to the same distribution, SQL Analytics has built-in knowledge of the row locations. In dedicated SQL pool this knowledge is used to minimize data movement during queries, which improves query performance.

@@ -121,7 +122,7 @@ For best performance, all of the distributions should have approximately the sam

 To balance the parallel processing, select a distribution column or set of columns that:

-- **Has many unique values.** The distribution column(s) can have duplicate values. All rows with the same value are assigned to the same distribution. Since there are 60 distributions, some distributions can have > 1 unique values while others may end with zero values.
+- **Has many unique values.** The distribution column(s) can have duplicate values. All rows with the same value are assigned to the same distribution. Since there are 60 distributions, some distributions can have > 1 unique values while others can end with zero values.
 - **Does not have NULLs, or has only a few NULLs.** For an extreme example, if all values in the distribution column(s) are NULL, all the rows are assigned to the same distribution. As a result, query processing is skewed to one distribution, and does not benefit from parallel processing.
 - **Is not a date column**. All data for the same date lands in the same distribution, or will cluster records by date. If several users are all filtering on the same date (such as today's date), then only 1 of the 60 distributions do all the processing work.
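Illustration only, not part of this commit: the distribution-column guidance in the hunk above can be sketched with a small simulation. It assumes a stand-in hash (SHA-256 modulo 60); the hash function that dedicated SQL pool actually uses is not given in this diff, and the names `distribution_for` and `rows_per_distribution` are hypothetical.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # the article states that there are 60 distributions


def distribution_for(value) -> int:
    """Assign a value to a distribution with a deterministic hash.

    SHA-256 is an assumption made for this sketch; it only models the
    property that identical values always land in the same distribution.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def rows_per_distribution(values) -> Counter:
    """Count how many rows each distribution receives."""
    return Counter(distribution_for(v) for v in values)


# A column with many unique values spreads rows over all distributions.
by_key = rows_per_distribution(range(60_000))

# A column where every row shares one value (one date, or all NULLs)
# sends every row to a single distribution.
by_date = rows_per_distribution(["2024-07-19"] * 60_000)

print("unique keys:", len(by_key), "distributions used")
print("single date:", len(by_date), "distribution used")
```

With the single shared date, all 60,000 rows map to one distribution, which is the skew the bullets warn about; the high-cardinality key uses every distribution with roughly equal row counts.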

@@ -153,7 +154,7 @@ DBCC PDW_SHOWSPACEUSED('dbo.FactInternetSales');
 To identify which tables have more than 10% data skew:

 1. Create the view `dbo.vTableSizes` that is shown in the [Tables overview](sql-data-warehouse-tables-overview.md#table-size-queries) article.
-2. Run the following query:
+1. Run the following query:

 ```sql
 select *
@@ -178,7 +179,7 @@ To avoid data movement during a join:
 - The tables involved in the join must be hash distributed on **one** of the columns participating in the join.
 - The data types of the join columns must match between both tables.
 - The columns must be joined with an equals operator.
-- The join type may not be a `CROSS JOIN`.
+- The join type cannot be a `CROSS JOIN`.

 To see if queries are experiencing data movement, you can look at the query plan.

@@ -233,8 +234,7 @@ RENAME OBJECT [dbo].[FactInternetSales] TO [FactInternetSales_ProductKey];
 RENAME OBJECT [dbo].[FactInternetSales_CustomerKey] TO [FactInternetSales];
 ```

-## Next steps
-
+## Related content
 To create a distributed table, use one of these statements:

 - [CREATE TABLE (dedicated SQL pool)](/sql/t-sql/statements/create-table-azure-sql-data-warehouse?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest&preserve-view=true)
