articles/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-what-is.md
Lines changed: 6 additions & 10 deletions
@@ -3,7 +3,7 @@ title: What is dedicated SQL pool (formerly SQL DW)?
 description: Dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics is the enterprise data warehousing functionality in Azure Synapse Analytics.
 author: WilliamDAssafMSFT
 ms.author: wiassaf
-ms.date: 02/21/2023
+ms.date: 07/19/2024
 ms.service: synapse-analytics
 ms.subservice: sql-dw
 ms.topic: overview
@@ -13,11 +13,7 @@ ms.topic: overview
 
 Azure Synapse Analytics is an analytics service that brings together enterprise data warehousing and Big Data analytics. Dedicated SQL pool (formerly SQL DW) refers to the enterprise data warehousing features that are available in Azure Synapse Analytics.
 
-
-
-
-
-
+:::image type="content" source="media/sql-data-warehouse-overview-what-is/dedicated-sql-pool.png" alt-text="Diagram of dedicated SQL pool (formerly SQL DW) in relation to Azure Synapse." lightbox="media/sql-data-warehouse-overview-what-is/dedicated-sql-pool.png":::
 
 Dedicated SQL pool (formerly SQL DW) represents a collection of analytic resources that are provisioned when using Synapse SQL. The size of a dedicated SQL pool (formerly SQL DW) is determined by Data Warehousing Units (DWU).
 
@@ -30,15 +26,15 @@ Once your dedicated SQL pool is created, you can import big data with simple [Po
 
 Data warehousing is a key component of a cloud-based, end-to-end big data solution.
 :::image type="content" source="media/sql-data-warehouse-overview-what-is/data-warehouse-solution.png" alt-text="Diagram of data warehouse solutions featuring zones for Ingest, Store, Prep & Train, and Model & Serve." lightbox="media/sql-data-warehouse-overview-what-is/data-warehouse-solution.png":::
 
 In a cloud data solution, data is ingested into big data stores from a variety of sources. Once in a big data store, Hadoop, Spark, and machine learning algorithms prepare and train the data. When the data is ready for complex analysis, dedicated SQL pool uses PolyBase to query the big data stores. PolyBase uses standard T-SQL queries to bring the data into dedicated SQL pool (formerly SQL DW) tables.
 
 Dedicated SQL pool (formerly SQL DW) stores data in relational tables with columnar storage. This format significantly reduces the data storage costs, and improves query performance. Once data is stored, you can run analytics at massive scale. Compared to traditional database systems, analysis queries finish in seconds instead of minutes, or hours instead of days.
 
 The analysis results can go to worldwide reporting databases or applications. Business analysts can then gain insights to make well-informed business decisions.

articles/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute.md
Lines changed: 11 additions & 11 deletions
@@ -1,14 +1,15 @@
 ---
 title: Distributed tables design guidance
 description: Recommendations for designing hash-distributed and round-robin distributed tables using dedicated SQL pool.
-ms.service: synapse-analytics
-ms.topic: conceptual
-ms.subservice: sql-dw
-ms.date: 03/20/2023
 author: WilliamDAssafMSFT
 ms.author: wiassaf
 ms.reviewer: mariyaali
-ms.custom: azure-synapse
+ms.date: 07/19/2024
+ms.service: synapse-analytics
+ms.subservice: sql-dw
+ms.topic: conceptual
+ms.custom:
+- azure-synapse
 ---
 
 # Guidance for designing distributed tables using dedicated SQL pool in Azure Synapse Analytics
@@ -35,7 +36,7 @@ As part of table design, understand as much as possible about your data and how
 
 A hash-distributed table distributes table rows across the Compute nodes by using a deterministic hash function to assign each row to one [distribution](massively-parallel-processing-mpp-architecture.md#distributions).
 :::image type="content" source="media/sql-data-warehouse-tables-distribute/hash-distributed-table.png" alt-text="Diagram of a distributed table.":::
 
 Since identical values always hash to the same distribution, SQL Analytics has built-in knowledge of the row locations. In dedicated SQL pool this knowledge is used to minimize data movement during queries, which improves query performance.
 
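The hunk above describes hash distribution as a deterministic hash function that assigns each row to one of the distributions. The Python sketch below illustrates that property under stated assumptions. CRC-32 (`zlib.crc32`) stands in for the engine's internal hash, which is not reproduced here. The `orders` and `customers` tables and the `distribution_for`, `place_rows`, and `is_colocated` helpers are hypothetical. The sketch shows that identical key values always map to the same distribution, so two tables hash-distributed on the same column keep their matching rows together.

```python
import zlib

# Dedicated SQL pool spreads each table across 60 distributions.
NUM_DISTRIBUTIONS = 60


def distribution_for(value) -> int:
    """Map a distribution-column value to a distribution ID in 0..59.

    Assumption: CRC-32 of repr(value) stands in for the engine's internal hash,
    which this sketch does not reproduce. The property it shares with the real
    hash is determinism: the same value always maps to the same distribution.
    Python's built-in hash() is avoided because str hashing is salted per process.
    """
    return zlib.crc32(repr(value).encode("utf-8")) % NUM_DISTRIBUTIONS


def place_rows(rows, column):
    """Group rows (dicts) by the distribution their `column` value hashes to."""
    placement = {}
    for row in rows:
        placement.setdefault(distribution_for(row[column]), []).append(row)
    return placement


def is_colocated(left_rows, right_rows, column) -> bool:
    """True if rows from both tables that share a `column` value share a distribution."""
    left_home = {}
    for dist_id, rows in place_rows(left_rows, column).items():
        for row in rows:
            left_home[row[column]] = dist_id
    # A right-hand row is co-located if its key is absent on the left
    # or lives in the same distribution there.
    return all(
        left_home.get(row[column], dist_id) == dist_id
        for dist_id, rows in place_rows(right_rows, column).items()
        for row in rows
    )


# Hypothetical tables that are both hash-distributed on customer_id.
orders = [
    {"order_id": 1, "customer_id": 7, "amount": 10.0},
    {"order_id": 2, "customer_id": 42, "amount": 5.5},
    {"order_id": 3, "customer_id": 7, "amount": 3.25},
]
customers = [
    {"customer_id": 7, "name": "Ana"},
    {"customer_id": 42, "name": "Raj"},
]

for cust in customers:
    print(cust["name"], "-> distribution", distribution_for(cust["customer_id"]))

# Matching rows are co-located, so a join on customer_id needs no data movement.
print(is_colocated(orders, customers, "customer_id"))  # True
```

Because the mapping depends only on the value, `is_colocated` holds for any pair of tables hashed on the same column. That guarantee is what lets the engine skip data movement for joins on the distribution key.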
@@ -121,7 +122,7 @@ For best performance, all of the distributions should have approximately the sam
 
 To balance the parallel processing, select a distribution column or set of columns that:
 
-- **Has many unique values.** The distribution column(s) can have duplicate values. All rows with the same value are assigned to the same distribution. Since there are 60 distributions, some distributions can have > 1 unique values while others may end with zero values.
+- **Has many unique values.** The distribution column(s) can have duplicate values. All rows with the same value are assigned to the same distribution. Since there are 60 distributions, some distributions can have > 1 unique values while others can end with zero values.
 - **Does not have NULLs, or has only a few NULLs.** For an extreme example, if all values in the distribution column(s) are NULL, all the rows are assigned to the same distribution. As a result, query processing is skewed to one distribution, and does not benefit from parallel processing.
 - **Is not a date column**. All data for the same date lands in the same distribution, which clusters records by date. If several users are all filtering on the same date (such as today's date), then only 1 of the 60 distributions does all the processing work.
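The three criteria in the list above can be checked roughly with a Python sketch. It makes the same assumption as before: CRC-32 (`zlib.crc32`) stands in for the engine's internal hash. The column names and generated data are hypothetical. For each candidate distribution column, the sketch reports how many of the 60 distributions receive rows and how many rows the fullest one holds. A high-cardinality key spreads widely, while a low-cardinality or all-NULL column piles rows into a few distributions. All rows for a single date share one distribution.

```python
import zlib
from collections import Counter
from datetime import date, timedelta

# Dedicated SQL pool spreads each table across 60 distributions.
NUM_DISTRIBUTIONS = 60


def distribution_for(value) -> int:
    # Assumption: CRC-32 of repr() stands in for the engine's internal hash.
    # Like any deterministic hash, it sends every NULL (None) to one distribution.
    return zlib.crc32(repr(value).encode("utf-8")) % NUM_DISTRIBUTIONS


def distribution_profile(values):
    """Return (distributions that hold rows, rows in the fullest distribution)."""
    counts = Counter(distribution_for(v) for v in values)
    return len(counts), max(counts.values())


ROWS = 6000
candidates = {
    # Many unique values: rows spread across most of the 60 distributions.
    "order_id": list(range(ROWS)),
    # Only three distinct values: at most 3 distributions receive any rows.
    "status": ["open", "shipped", "closed"] * (ROWS // 3),
    # All NULL: every row lands in a single distribution.
    "promo_code": [None] * ROWS,
}

for name, values in candidates.items():
    used, fullest = distribution_profile(values)
    print(f"{name:>10}: {used:2d} distributions used, fullest holds {fullest} rows")

# Hypothetical order_date column spanning 30 days. A query that filters on one
# day only touches that day's rows, and those rows all share one distribution.
start = date(2024, 7, 1)
order_dates = [start + timedelta(days=i % 30) for i in range(ROWS)]
todays_rows = [d for d in order_dates if d == date(2024, 7, 19)]
used, _ = distribution_profile(todays_rows)
print(f"rows for 2024-07-19 live in {used} distribution(s)")
```

The data fully determines the counts for the all-NULL, low-cardinality, and single-date cases. The exact spread of `order_id` depends on the stand-in hash and will differ from what dedicated SQL pool produces.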