**File:** `articles/synapse-analytics/get-started-analyze-sql-pool.md` (31 additions, 6 deletions)
@@ -4,29 +4,38 @@ description: In this tutorial, use the NYC Taxi sample data to explore SQL pool'
author: whhender
ms.author: whhender
ms.reviewer: whhender, wiassaf
ms.date: 12/11/2024
ms.service: azure-synapse-analytics
ms.subservice: sql
ms.topic: tutorial
ms.custom: engagement-fy23
---
# Tutorial: Analyze data with dedicated SQL pools
In this tutorial, use the NYC Taxi data to explore a dedicated SQL pool's capabilities.
> [!div class="checklist"]
> * Deploy a dedicated SQL pool
> * Load data into the pool
> * Explore the data you've loaded

## Prerequisites

* This tutorial assumes you've completed the steps in the rest of the quickstarts. Specifically, it uses the `contosodatalake` resource created in the [Create a Synapse workspace quickstart](get-started-create-workspace.md#place-sample-data-into-the-primary-storage-account).
## Create a dedicated SQL pool
1. In Synapse Studio, on the left-side pane, select **Manage** > **SQL pools** under **Analytics pools**.
1. Select **New**.
1. For **Dedicated SQL pool name**, select `SQLPOOL1`.
1. For **Performance level**, choose **DW100C**.
1. Select **Review + create** > **Create**. Your dedicated SQL pool will be ready in a few minutes.

Your dedicated SQL pool is associated with a SQL database that's also called `SQLPOOL1`.

1. Navigate to **Data** > **Workspace**.
1. You should see a database named **SQLPOOL1**. If you don't see it, select **Refresh**.

A dedicated SQL pool consumes billable resources as long as it's active. You can pause the pool later to reduce costs.
@@ -83,13 +92,20 @@ A dedicated SQL pool consumes billable resources as long as it's active. You can
    ,IDENTITY_INSERT = 'OFF'
   )
   ```

   > [!TIP]
   > If you get an error that reads `Login failed for user '<token-identified principal>'`, you need to set your Microsoft Entra ID admin:
   > 1. In the Azure portal, search for your Synapse workspace.
   > 1. Under **Settings**, select **Microsoft Entra ID**.
   > 1. Select **Set admin** and set a Microsoft Entra ID admin.

1. Select the **Run** button to execute the script.
1. This script finishes in less than 60 seconds. It loads 2 million rows of NYC Taxi data into a table called `dbo.NYCTaxiTripSmall`.
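Once the script completes, a quick sanity check can confirm the load. This is a hedged sketch run against the `SQLPOOL1` database; only the table name comes from the tutorial:

```sql
-- Sanity check after the COPY load; expect roughly 2 million rows.
SELECT COUNT_BIG(*) AS RowsLoaded
FROM dbo.NYCTaxiTripSmall;
```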
## Explore the NYC Taxi data in the dedicated SQL pool
1. In Synapse Studio, go to the **Data** hub.
1. Go to **SQLPOOL1** > **Tables**. (If you don't see it in the menu, refresh the page.)
1. Right-click the **dbo.NYCTaxiTripSmall** table and select **New SQL Script** > **Select TOP 100 Rows**.
1. Wait while a new SQL script is created and runs.
1. At the top of the SQL script, **Connect to** is automatically set to the SQL pool called **SQLPOOL1**.
@@ -110,7 +126,16 @@ A dedicated SQL pool consumes billable resources as long as it's active. You can
This query creates a table `dbo.PassengerCountStats` with aggregate data from the `trip_distance` field, then queries the new table. The data shows how the total trip distances and average trip distance relate to the number of passengers.

1. In the SQL script result window, change the **View** to **Chart** to see a visualization of the results as a line chart. Change **Category column** to `PassengerCount`.
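The aggregation described above is elided in this excerpt. A rough, hedged sketch of its shape follows; the `WITH` distribution clause and the exact column names are assumptions, not the tutorial's actual script:

```sql
-- Hypothetical sketch of the PassengerCountStats aggregation.
-- PassengerCount and trip_distance are assumed column names.
CREATE TABLE dbo.PassengerCountStats
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT
    PassengerCount,
    SUM(trip_distance) AS SumTripDistance,
    AVG(trip_distance) AS AvgTripDistance
FROM dbo.NYCTaxiTripSmall
GROUP BY PassengerCount;

SELECT * FROM dbo.PassengerCountStats
ORDER BY PassengerCount;
```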
## Clean up

Pause your dedicated SQL pool to reduce costs.

1. Navigate to **Manage** in your Synapse workspace.
1. Select **SQL pools**.
1. Hover over **SQLPOOL1** and select the **Pause** button.
**File:** `articles/synapse-analytics/spark/apache-spark-pool-configurations.md` (4 additions, 4 deletions)
@@ -1,14 +1,14 @@
---
title: Apache Spark pool concepts
description: Introduction to Apache Spark pool sizes and configurations in Azure Synapse Analytics.
ms.topic: concept-article
ms.service: azure-synapse-analytics
ms.subservice: spark
ms.custom: references_regions
author: guyhay
ms.author: guyhay
ms.reviewer: whhender
ms.date: 12/06/2024
---
# Apache Spark pool configurations in Azure Synapse Analytics
@@ -53,7 +53,7 @@ Autoscale for Apache Spark pools allows automatic scale up and down of compute r
Apache Spark pools now support elastic pool storage. Elastic pool storage allows the Spark engine to monitor worker node temporary storage and attach extra disks if needed. Apache Spark pools utilize temporary disk storage while the pool is instantiated. Spark jobs write shuffle map outputs, shuffle data, and spilled data to local VM disks. Examples of operations that could utilize local disk are sort, cache, and persist. When temporary VM disk space runs out, Spark jobs could fail with an "Out of Disk Space" error (`java.io.IOException: No space left on device`). With "Out of Disk Space" errors, much of the burden to prevent jobs from failing shifts to the customer, who must reconfigure the Spark jobs (for example, tweak the number of partitions) or clusters (for example, add more nodes to the cluster). These errors might not be consistent, and the user might end up experimenting heavily by running production jobs. This process can be expensive for the user in multiple dimensions:

* Wasted time. Customers are required to experiment heavily with job configurations via trial and error and are expected to understand Spark's internal metrics to make the correct decision.
* Wasted resources. Since production jobs can process varying amounts of data, Spark jobs can fail nondeterministically if resources aren't over-provisioned. For instance, consider the problem of data skew, which could result in a few nodes requiring more disk space than others. Currently in Synapse, each node in a cluster gets the same size of disk space, and increasing disk space across all nodes isn't an ideal solution and leads to tremendous waste.
* Slowdown in job execution. In the hypothetical scenario where we solve the problem by autoscaling nodes (assuming costs aren't an issue to the end customer), adding a compute node is still expensive (takes a few minutes) as opposed to adding storage (takes a few seconds).

No action is required from you, and you should see fewer job failures as a result.
@@ -65,7 +65,7 @@ No action is required by you, plus you should see fewer job failures as a result
The automatic pause feature releases resources after a set idle period, reducing the overall cost of an Apache Spark pool. You can set the number of minutes of idle time once this feature is enabled. The automatic pause feature is independent of the autoscale feature: resources can be paused whether autoscale is enabled or disabled. This setting can be altered after pool creation, although active sessions will need to be restarted.
**File:** `articles/synapse-analytics/sql/create-use-external-tables.md` (14 additions, 15 deletions)
@@ -3,9 +3,9 @@ title: Create and use external tables in Synapse SQL pool
description: In this section, you'll learn how to create and use external tables in Synapse SQL pool.
author: vvasic-msft
ms.service: azure-synapse-analytics
ms.topic: how-to
ms.subservice: sql
ms.date: 12/11/2024
ms.author: vvasic
ms.reviewer: whhender, wiassaf
---
@@ -78,14 +78,12 @@ The queries in this article will be executed on your sample database and use the
## External table on a file

You can create external tables that access data on an Azure storage account that allows access to users with some Microsoft Entra identity or SAS key. You can create external tables the same way you create regular SQL Server external tables.

The following query creates an external table that reads the *population.csv* file from the SynapseSQL demo Azure storage account, which is referenced using the `sqlondemanddemo` data source and protected with a database scoped credential called `sqlondemand`. The data source and database scoped credential are created in the [setup script](https://github.com/Azure-Samples/Synapse/blob/master/SQL/Samples/LdwSample/SampleDB.sql).

> [!NOTE]
> Change the first line in the query, i.e., [mydbname], so you're using the database you created.

You can specify the pattern that the files must satisfy in order to be referenced by the external table. The pattern is required only for Parquet and CSV tables. If you're using Delta Lake format, you need to specify just a root folder, and the external table will automatically find the pattern.
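The article's full query is elided in this diff. As a hedged sketch of the shape it takes (the column names and the file-format name are illustrative assumptions; the data source and credential come from the setup script mentioned above):

```sql
USE [mydbname];
GO

-- Hypothetical sketch: external table over population.csv.
-- Column names and the FILE_FORMAT name are assumptions.
CREATE EXTERNAL TABLE populationExternalTable (
    country_code VARCHAR(5),
    country_name VARCHAR(100),
    [year] SMALLINT,
    [population] BIGINT
)
WITH (
    LOCATION = 'csv/population/population.csv',
    DATA_SOURCE = sqlondemanddemo,
    FILE_FORMAT = QuotedCsvWithHeaderFormat
);
```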
> [!NOTE]
> The table is created on a partitioned folder structure, but you can't take advantage of partition elimination. If you want to get better performance by skipping the files that don't satisfy some criterion (like a specific year or month in this case), use [views on external data](create-use-views.md#partitioned-views).

## External table on appendable files

The files that are referenced by an external table shouldn't be changed while the query is running. In a long-running query, SQL pool could retry reads, read parts of the files, or even read the file multiple times. Changes to the file content would cause wrong results. Therefore, the SQL pool fails the query if it detects that the modification time of any file changed during the query execution.

In some scenarios, you might want to create a table on files that are constantly appended. To avoid query failures due to constantly appended files, you can specify that the external table should ignore potentially inconsistent reads using the `TABLE_OPTIONS` setting.
@@ -155,7 +153,7 @@

```sql
WITH (
    ...
);
```

The `ALLOW_INCONSISTENT_READS` read option disables the file modification time check during the query lifecycle and reads whatever is available in the files that are referenced by the external table. In appendable files, the existing content isn't updated; only new rows are added. Therefore, the probability of wrong results is minimized compared to updateable files. This option might enable you to read frequently appended files without handling the errors.

This option is available only in external tables created on CSV file format.
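As a hedged sketch of an appendable-file table using this setting (the table, column, data source, and file-format names are assumptions; only `TABLE_OPTIONS` and `ALLOW_INCONSISTENT_READS` come from the text above):

```sql
-- Hypothetical sketch: CSV external table that tolerates appends mid-query.
-- Names here are illustrative assumptions.
CREATE EXTERNAL TABLE logsExternalTable (
    event_time DATETIME2,
    message NVARCHAR(4000)
)
WITH (
    LOCATION = 'csv/logs/*.csv',
    DATA_SOURCE = sqlondemanddemo,
    FILE_FORMAT = QuotedCsvFormat,
    TABLE_OPTIONS = N'{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}'
);
```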
External tables can't be created on a partitioned folder. Review the other known issues on the [Synapse serverless SQL pool self-help page](resources-self-help-sql-on-demand.md#delta-lake).

### Delta tables on partitioned folders

External tables in serverless SQL pools don't support partitioning on Delta Lake format. Use [Delta partitioned views](create-use-views.md#delta-lake-partitioned-views) instead of tables if you have partitioned Delta Lake data sets.

> [!IMPORTANT]
> Don't create external tables on partitioned Delta Lake folders, even if you see that they might work in some cases. Using unsupported features like external tables on partitioned Delta folders might cause issues or instability of the serverless pool. Azure support won't be able to resolve any issue that involves tables on partitioned folders. You'll be asked to transition to [Delta partitioned views](create-use-views.md#delta-lake-partitioned-views) and rewrite your code to use only the supported feature before proceeding with issue resolution.
@@ -216,6 +214,7 @@ ORDER BY
Performance of this query might vary depending on region. Your workspace might not be placed in the same region as the Azure storage accounts used in these samples. For production workloads, place your Synapse workspace and Azure storage in the same region.

## Next step

> [!div class="nextstepaction"]
> [Store query results to the storage](../sql/create-external-table-as-select.md)
**File:** `articles/synapse-analytics/sql/create-use-views.md` (6 additions, 12 deletions)
@@ -3,9 +3,9 @@ title: Create and use views in serverless SQL pool
description: In this section, you'll learn how to create and use views to wrap serverless SQL pool queries. Views will allow you to reuse those queries. Views are also needed if you want to use tools, such as Power BI, in conjunction with serverless SQL pool.
author: azaricstefan
ms.service: azure-synapse-analytics
ms.topic: how-to
ms.subservice: sql
ms.date: 12/06/2024
ms.author: stefanazaric
ms.reviewer: whhender, wiassaf
---
@@ -53,7 +53,7 @@ The view uses an `EXTERNAL DATA SOURCE` with a root URL of your storage, as a `D
### Delta Lake views

If you're creating views on top of a Delta Lake folder, you need to specify the location of the root folder after the `BULK` option instead of specifying the file path.

> [!div class="mx-imgBorder"]
>
@@ -100,7 +100,7 @@ When using JOINs in SQL queries, declare the filter predicate as NVARCHAR to red
### Delta Lake partitioned views

If you're creating partitioned views on top of Delta Lake storage, you can specify just a root Delta Lake folder and don't need to explicitly expose the partitioning columns using the `FILEPATH` function:

```sql
CREATE OR ALTER VIEW YellowTaxiView
```
@@ -124,7 +124,7 @@ For more information, review [Synapse serverless SQL pool self-help page](resour
## JSON views

Views are a good choice if you need to do some extra processing on top of the result set that is fetched from the files. One example might be parsing JSON files, where we need to apply the JSON functions to extract the values from the JSON documents:

```sql
CREATE OR ALTER VIEW CovidCases
```
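The view body is elided in this excerpt. A hedged sketch of the JSON-parsing pattern it describes (the storage URL, view name, and JSON property names are assumptions) could look like:

```sql
-- Hypothetical sketch: view that extracts values with JSON_VALUE.
-- The FIELDQUOTE/FIELDTERMINATOR trick reads each JSON document as one column.
CREATE OR ALTER VIEW CovidCasesSketch
AS
SELECT
    JSON_VALUE(doc, '$.date_rep') AS date_reported,
    JSON_VALUE(doc, '$.countries_and_territories') AS country,
    CAST(JSON_VALUE(doc, '$.cases') AS INT) AS cases
FROM OPENROWSET(
        BULK 'https://<storage-account>.blob.core.windows.net/json/covid/*.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR = '0x0b'
    ) WITH (doc NVARCHAR(MAX)) AS rows;
```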
@@ -191,12 +191,6 @@ ORDER BY
When you query the view, you might encounter errors or unexpected results. This probably means that the view references columns or objects that were modified or no longer exist. You need to manually adjust the view definition to align with the underlying schema changes.

## Related content

For information on how to query different file types, refer to the [Query single CSV file](query-single-csv-file.md), [Query Parquet files](query-parquet-files.md), and [Query JSON files](query-json-files.md) articles.