-description: Learn how Azure Synapse Analytics (formerly SQL DW) combines massively parallel processing (MPP) with Azure storage to achieve high performance and scalability.
+description: Learn how Azure Synapse Analytics (formerly SQL DW) combines massively parallel processing (MPP) with Azure Storage to achieve high performance and scalability.
 services: sql-data-warehouse
 author: mlee3gsd
 manager: craigg
@@ -17,22 +17,22 @@ ms.reviewer: igorstan
 Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.

 Azure Synapse has four components:
-- SQL Analytics: Complete T-SQL based analytics
+- SQL Analytics: Complete T-SQL based analytics
 - SQL pool (pay per DWU provisioned) – Generally Available
 - SQL on-demand (pay per TB processed) – (Preview)
-- Spark: Deeply integrated Apache Spark (Preview)
-- Data Integration: Hybrid data integration (Preview)
-- Studio: unified user experience. (Preview)
+- Spark: Deeply integrated Apache Spark (Preview)
+- Data Integration: Hybrid data integration (Preview)
-[SQL Analytics](sql-data-warehouse-overview-what-is.md#sql-analytics-and-sql-pool-in-azure-synapse) leverages a scaleout architecture to distribute computational processing of data across multiple nodes. The unit of scale is an abstraction of compute power that is known as a [data warehouse unit](what-is-a-data-warehouse-unit-dwu-cdwu.md). Compute is separate from storage which enables you to scale compute independently of the data in your system.
+[SQL Analytics](sql-data-warehouse-overview-what-is.md#sql-analytics-and-sql-pool-in-azure-synapse) leverages a scale-out architecture to distribute computational processing of data across multiple nodes. The unit of scale is an abstraction of compute power that is known as a [data warehouse unit](what-is-a-data-warehouse-unit-dwu-cdwu.md). Compute is separate from storage, which enables you to scale compute independently of the data in your system.
-SQL Analytics uses a node-based architecture. Applications connect and issue T-SQL commands to a Control node, which is the single point of entry for SQL Analytics. The Control node runs the MPP engine which optimizes queries for parallel processing, and then passes operations to Compute nodes to do their work in parallel.
+SQL Analytics uses a node-based architecture. Applications connect and issue T-SQL commands to a Control node, which is the single point of entry for SQL Analytics. The Control node runs the MPP engine, which optimizes queries for parallel processing, and then passes operations to Compute nodes to do their work in parallel.

 The Compute nodes store all user data in Azure Storage and run the parallel queries. The Data Movement Service (DMS) is a system-level internal service that moves data across the nodes as necessary to run queries in parallel and return accurate results.
@@ -43,9 +43,9 @@ With decoupled storage and compute, when using SQL Analytics one can:
 * Pause compute capacity while leaving data intact, so you only pay for storage.
 * Resume compute capacity during operational hours.

-### Azure storage
+### Azure Storage

-SQL Analytics leverages Azure storage to keep your user data safe. Since your data is stored and managed by Azure storage, there is a separate charge for your storage consumption. The data itself is sharded into **distributions** to optimize the performance of the system. You can choose which sharding pattern to use to distribute the data when you define the table. These sharding patterns are supported:
+SQL Analytics leverages Azure Storage to keep your user data safe. Since your data is stored and managed by Azure Storage, there is a separate charge for your storage consumption. The data is sharded into **distributions** to optimize the performance of the system. You can choose which sharding pattern to use to distribute the data when you define the table. These sharding patterns are supported:

 * Hash
 * Round Robin
@@ -88,55 +88,27 @@ There are performance considerations for the selection of a distribution column,
 ## Round-robin distributed tables
 A round-robin table is the simplest table to create and delivers fast performance when used as a staging table for loads.

-A round-robin distributed table distributes data evenly across the table but without any further optimization. A distribution is first chosen at random and then buffers of rows are assigned to distributions sequentially. It is quick to load data into a round-robin table, but query performance can often be better with hash distributed tables. Joins on round-robin tables require reshuffling data and this takes additional time.
+A round-robin distributed table distributes data evenly across the table but without any further optimization. A distribution is first chosen at random and then buffers of rows are assigned to distributions sequentially. It is quick to load data into a round-robin table, but query performance can often be better with hash distributed tables. Joins on round-robin tables require reshuffling data, which takes additional time.
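
The round-robin assignment described in this paragraph (a random starting distribution, then buffers of rows handed out sequentially) can be sketched in Python. This is only an illustration under stated assumptions, not the service's implementation: the function name `round_robin_assign`, the `buffer_size` parameter, and the default of 60 distributions are hypothetical choices made for the example.

```python
import random


def round_robin_assign(num_rows, num_distributions=60, buffer_size=4, seed=None):
    """Map each distribution index to the row indices it receives.

    All parameters are illustrative assumptions, not documented settings.
    """
    rng = random.Random(seed)
    # The first distribution is chosen at random.
    start = rng.randrange(num_distributions)
    assignment = {d: [] for d in range(num_distributions)}
    # Buffers of rows are then assigned to distributions sequentially,
    # wrapping around once the last distribution has been reached.
    for buffer_index, first_row in enumerate(range(0, num_rows, buffer_size)):
        target = (start + buffer_index) % num_distributions
        last_row = min(first_row + buffer_size, num_rows)
        assignment[target].extend(range(first_row, last_row))
    return assignment


# Small example: 1000 rows, 4 distributions, buffers of 10 rows.
layout = round_robin_assign(1000, num_distributions=4, buffer_size=10, seed=0)
row_counts = {d: len(rows) for d, rows in layout.items()}
```

In this example each of the four distributions ends up with 250 rows, which mirrors the "evenly, but without further optimization" behavior: rows with the same join key can land on different distributions, which is why a join needs a reshuffle.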


 ## Replicated Tables
 A replicated table provides the fastest query performance for small tables.

-A table that is replicated caches a full copy of the table on each compute node. Consequently, replicating a table removes the need to transfer data among compute nodes before a join or aggregation. Replicated tables are best utilized with small tables. Extra storage is required and there is additional overhead that is incurred when writing data which make large tables impractical.
+A table that is replicated caches a full copy of the table on each compute node. Consequently, replicating a table removes the need to transfer data among compute nodes before a join or aggregation. Replicated tables are best utilized with small tables. Extra storage is required and there is additional overhead that is incurred when writing data, which make large tables impractical.

-The diagram below shows a replicated table which is cached on the first distribution on each compute node.
+The diagram below shows a replicated table that is cached on the first distribution on each compute node.
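
The trade-off for replicated tables can be sketched in Python: because every compute node holds a full copy of the small table, each node can join its own slice of a larger table locally, at the cost of storing one copy per node. This is a toy model under stated assumptions. The function `join_on_replicated_dim`, the table names, and the sample rows are all hypothetical and are not taken from the product.

```python
def join_on_replicated_dim(fact_slices, dim_table):
    """Join each node's fact rows against that node's full copy of dim_table."""
    joined = []
    for node_rows in fact_slices:
        # Every compute node caches the whole dimension table, so the lookup
        # below never needs rows from another node.
        local_dim = dict(dim_table)
        for order_id, region_id in node_rows:
            joined.append((order_id, local_dim[region_id]))
    # Storage cost of replication: one full copy per compute node.
    dim_rows_stored = len(dim_table) * len(fact_slices)
    return joined, dim_rows_stored


# Hypothetical data: a two-row dimension table and fact rows on two nodes.
dim_region = {1: "East", 2: "West"}
fact_slices = [
    [(101, 1), (102, 2)],  # (order_id, region_id) rows held by node 0
    [(103, 2), (104, 1)],  # rows held by node 1
]
joined, dim_rows_stored = join_on_replicated_dim(fact_slices, dim_region)
```

Here the two-row table is stored four times in total (two rows on each of two nodes). The same multiplication applied to a large table is what makes replication impractical there.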
-Now that you know a bit about Azure Synapse, learn how to quickly [create a SQL pool][create a SQL pool] and [load sample data][load sample data]. If you are new to Azure, you may find the [Azure glossary][Azure glossary] helpful as you encounter new terminology. Or look at some of these other Azure Synapse Resources.
+Now that you know a bit about Azure Synapse, learn how to quickly [create a SQL pool](./sql-data-warehouse-get-started-provision.md) and [load sample data](./sql-data-warehouse-load-sample-databases.md). If you are new to Azure, you may find the [Azure glossary](../azure-glossary-cloud-terminology.md) helpful as you encounter new terminology. Or look at some of these other Azure Synapse Resources.