
Commit 1c15942

Merge pull request #291858 from whhender/december-synapse-freshness
December synapse freshness part 1
2 parents f7c81de + 4b1e695 commit 1c15942

File tree: 5 files changed (+61 / -42 lines)


articles/synapse-analytics/get-started-analyze-sql-pool.md

Lines changed: 31 additions & 6 deletions
@@ -4,29 +4,38 @@ description: In this tutorial, use the NYC Taxi sample data to explore SQL pool'
 author: whhender
 ms.author: whhender
 ms.reviewer: whhender, wiassaf
-ms.date: 10/16/2023
+ms.date: 12/11/2024
 ms.service: azure-synapse-analytics
 ms.subservice: sql
 ms.topic: tutorial
 ms.custom: engagement-fy23
 ---
 
-# Analyze data with dedicated SQL pools
+# Tutorial: Analyze data with dedicated SQL pools
 
 In this tutorial, use the NYC Taxi data to explore a dedicated SQL pool's capabilities.
 
+> [!div class="checklist"]
+> * [Deploy a dedicated SQL pool]
+> * [Load data into the pool]
+> * [Explore the data you've loaded]
+
+## Prerequisites
+
+* This tutorial assumes you've completed the steps in the rest of the quickstarts. Specifically it uses the 'contosodatalake' resource created in [the Create a Synapse Workspace quickstart.](get-started-create-workspace.md#place-sample-data-into-the-primary-storage-account)
+
 ## Create a dedicated SQL pool
 
 1. In Synapse Studio, on the left-side pane, select **Manage** > **SQL pools** under **Analytics pools**.
 1. Select **New**.
 1. For **Dedicated SQL pool name** select `SQLPOOL1`.
 1. For **Performance level** choose **DW100C**.
-1. Select **Review + create** > **Create**. Your dedicated SQL pool will be ready in a few minutes.
+1. Select **Review + create** > **Create**. Your dedicated SQL pool will be ready in a few minutes.
 
 Your dedicated SQL pool is associated with a SQL database that's also called `SQLPOOL1`.
 
 1. Navigate to **Data** > **Workspace**.
-1. You should see a database named **SQLPOOL1**. If you do not see it, select **Refresh**.
+1. You should see a database named **SQLPOOL1**. If you don't see it, select **Refresh**.
 
 A dedicated SQL pool consumes billable resources as long as it's active. You can pause the pool later to reduce costs.
 
@@ -83,13 +92,20 @@ A dedicated SQL pool consumes billable resources as long as it's active. You can
     ,IDENTITY_INSERT = 'OFF'
    )
    ```
+
+   >[!TIP]
+   >If you get an error that reads `Login failed for user '<token-identified principal>'`, you need to set your Entra Id admin.
+   > 1. In the Azure Portal, search for your synapse workspace.
+   > 1. Under **Settings** select **Microsoft Entra ID**.
+   > 1. Select **Set admin** and set a Microsoft Entra ID admin.
+
 1. Select the **Run** button to execute the script.
 1. This script finishes in less than 60 seconds. It loads 2 million rows of NYC Taxi data into a table called `dbo.NYCTaxiTripSmall`.
 
 ## Explore the NYC Taxi data in the dedicated SQL pool
 
 1. In Synapse Studio, go to the **Data** hub.
-1. Go to **SQLPOOL1** > **Tables**.
+1. Go to **SQLPOOL1** > **Tables**. (If you don't see it in the menu, refresh the page.)
 1. Right-click the **dbo.NYCTaxiTripSmall** table and select **New SQL Script** > **Select TOP 100 Rows**.
 1. Wait while a new SQL script is created and runs.
 1. At the top of the SQL script **Connect to** is automatically set to the SQL pool called **SQLPOOL1**.
 
@@ -110,7 +126,16 @@ A dedicated SQL pool consumes billable resources as long as it's active. You can
 
    This query creates a table `dbo.PassengerCountStats` with aggregate data from the `trip_distance` field, then queries the new table. The data shows how the total trip distances and average trip distance relate to the number of passengers.
 1. In the SQL script result window, change the **View** to **Chart** to see a visualization of the results as a line chart. Change **Category column** to `PassengerCount`.
-
+
+## Clean up
+
+Pause your dedicated SQL Pool to reduce costs.
+
+1. Navigate to **Manage** in your synapse workspace.
+1. Select **SQL pools**.
+1. Hover over SQLPOOL1 and select the **Pause** button.
+1. Confirm to pause.
+
 ## Next step
 
 > [!div class="nextstepaction"]
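For context, the aggregate step this tutorial's changed text describes (building `dbo.PassengerCountStats` from `trip_distance`) might look roughly like the sketch below. The column names `PassengerCount` and `TripDistanceMiles` are assumptions for illustration; they are not taken from this diff.

```sql
-- Hypothetical sketch: aggregate NYC Taxi trip distances by passenger count
-- in a dedicated SQL pool. Column names are assumed, not from the commit.
CREATE TABLE dbo.PassengerCountStats
WITH (DISTRIBUTION = ROUND_ROBIN)  -- CTAS requires a distribution choice
AS
SELECT
    PassengerCount,
    SUM(TripDistanceMiles) AS SumTripDistance,
    AVG(TripDistanceMiles) AS AvgTripDistance
FROM dbo.NYCTaxiTripSmall
GROUP BY PassengerCount;

SELECT * FROM dbo.PassengerCountStats
ORDER BY PassengerCount;
```

In dedicated SQL pools, `CREATE TABLE AS SELECT` must specify a distribution; `ROUND_ROBIN` is a common default for small demo tables.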

articles/synapse-analytics/spark/apache-spark-pool-configurations.md

Lines changed: 4 additions & 4 deletions
@@ -1,14 +1,14 @@
 ---
 title: Apache Spark pool concepts
 description: Introduction to Apache Spark pool sizes and configurations in Azure Synapse Analytics.
-ms.topic: conceptual
+ms.topic: concept-article
 ms.service: azure-synapse-analytics
 ms.subservice: spark
 ms.custom: references_regions
 author: guyhay
 ms.author: guyhay
 ms.reviewer: whhender
-ms.date: 09/07/2022
+ms.date: 12/06/2024
 ---
 
 # Apache Spark pool configurations in Azure Synapse Analytics
 
@@ -53,7 +53,7 @@ Autoscale for Apache Spark pools allows automatic scale up and down of compute r
 Apache Spark pools now support elastic pool storage. Elastic pool storage allows the Spark engine to monitor worker node temporary storage and attach extra disks if needed. Apache Spark pools utilize temporary disk storage while the pool is instantiated. Spark jobs write shuffle map outputs, shuffle data and spilled data to local VM disks. Examples of operations that could utilize local disk are sort, cache, and persist. When temporary VM disk space runs out, Spark jobs could fail due to “Out of Disk Space” error (java.io.IOException: No space left on device). With “Out of Disk Space” errors, much of the burden to prevent jobs from failing shifts to the customer to reconfigure the Spark jobs (for example, tweak the number of partitions) or clusters (for example, add more nodes to the cluster). These errors might not be consistent, and the user might end up experimenting heavily by running production jobs. This process can be expensive for the user in multiple dimensions:
 
 * Wasted time. Customers are required to experiment heavily with job configurations via trial and error and are expected to understand Spark’s internal metrics to make the correct decision.
-* Wasted resources. Since production jobs can process varying amount of data, Spark jobs can fail non-deterministically if resources aren't over-provisioned. For instance, consider the problem of data skew, which could result in a few nodes requiring more disk space than others. Currently in Synapse, each node in a cluster gets the same size of disk space and increasing disk space across all nodes isn't an ideal solution and leads to tremendous waste.
+* Wasted resources. Since production jobs can process varying amount of data, Spark jobs can fail nondeterministically if resources aren't over-provisioned. For instance, consider the problem of data skew, which could result in a few nodes requiring more disk space than others. Currently in Synapse, each node in a cluster gets the same size of disk space and increasing disk space across all nodes isn't an ideal solution and leads to tremendous waste.
 * Slowdown in job execution. In the hypothetical scenario where we solve the problem by autoscaling nodes (assuming costs aren't an issue to the end customer), adding a compute node is still expensive (takes a few minutes) as opposed to adding storage (takes a few seconds).
 
 No action is required by you, plus you should see fewer job failures as a result.
 
@@ -65,7 +65,7 @@ No action is required by you, plus you should see fewer job failures as a result
 
 The automatic pause feature releases resources after a set idle period, reducing the overall cost of an Apache Spark pool. The number of minutes of idle time can be set once this feature is enabled. The automatic pause feature is independent of the autoscale feature. Resources can be paused whether the autoscale is enabled or disabled. This setting can be altered after pool creation although active sessions will need to be restarted.
 
-## Next steps
+## Related content
 
 * [Azure Synapse Analytics](../index.yml)
 * [Apache Spark Documentation](https://spark.apache.org/docs/3.2.1/)

articles/synapse-analytics/sql/create-use-external-tables.md

Lines changed: 14 additions & 15 deletions
@@ -3,9 +3,9 @@ title: Create and use external tables in Synapse SQL pool
 description: In this section, you'll learn how to create and use external tables in Synapse SQL pool.
 author: vvasic-msft
 ms.service: azure-synapse-analytics
-ms.topic: overview
+ms.topic: how-to
 ms.subservice: sql
-ms.date: 02/02/2022
+ms.date: 12/11/2024
 ms.author: vvasic
 ms.reviewer: whhender, wiassaf
 ---
 
@@ -78,14 +78,12 @@ The queries in this article will be executed on your sample database and use the
 
 ## External table on a file
 
-You can create external tables that access data on an Azure storage account that allows access to users with some Microsoft Entra identity or SAS key. You can create external tables the same way you create regular SQL Server external tables. 
+You can create external tables that access data on an Azure storage account that allows access to users with some Microsoft Entra identity or SAS key. You can create external tables the same way you create regular SQL Server external tables.
 
-The following query creates an external table that reads *population.csv* file from SynapseSQL demo Azure storage account that is referenced using `sqlondemanddemo` data source and protected with database scoped credential called `sqlondemand`.
-
-Data source and database scoped credential are created in [setup script](https://github.com/Azure-Samples/Synapse/blob/master/SQL/Samples/LdwSample/SampleDB.sql).
+The following query creates an external table that reads *population.csv* file from SynapseSQL demo Azure storage account that is referenced using `sqlondemanddemo` data source and protected with database scoped credential called `sqlondemand`.
 
 > [!NOTE]
-> Change the first line in the query, i.e., [mydbname], so you're using the database you created. 
+> Change the first line in the query, i.e., [mydbname], so you're using the database you created.
 
 ```sql
 USE [mydbname];
 
@@ -128,15 +126,15 @@ CREATE EXTERNAL TABLE Taxi (
 );
 ```
 
-You can specify the pattern that the files must satisfy in order to be referenced by the external table. The pattern is required only for Parquet and CSV tables. If you are using Delta Lake format, you need to specify just a root folder, and the external table will automatically find the pattern.
+You can specify the pattern that the files must satisfy in order to be referenced by the external table. The pattern is required only for Parquet and CSV tables. If you're using Delta Lake format, you need to specify just a root folder, and the external table will automatically find the pattern.
 
 > [!NOTE]
 > The table is created on partitioned folder structure, but you cannot leverage some partition elimination. If you want to get better performance by skipping the files that do not satisfy some criterion (like specific year or month in this case), use [views on external data](create-use-views.md#partitioned-views).
 
 ## External table on appendable files
 
-The files that are referenced by an external table should not be changed while the query is running. In the long-running query, SQL pool may retry reads, read parts of the files, or even read the file multiple times. Changes of the file content would cause wrong results. Therefore, the SQL pool fails the query if detects that the modification time of any file is changed during the query execution.
-In some scenarios you might want to create a table on the files that are constantly appended. To avoid the query failures due to constantly appended files, you can specify that the external table should ignore potentially inconsistent reads using the `TABLE_OPTIONS` setting.
+The files that are referenced by an external table shouldn't be changed while the query is running. In the long-running query, SQL pool could retry reads, read parts of the files, or even read the file multiple times. Changes of the file content would cause wrong results. Therefore, the SQL pool fails the query if detects that the modification time of any file is changed during the query execution.
+In some scenarios, you might want to create a table on the files that are constantly appended. To avoid the query failures due to constantly appended files, you can specify that the external table should ignore potentially inconsistent reads using the `TABLE_OPTIONS` setting.
 
 
 ```sql
 
@@ -155,7 +153,7 @@ WITH (
 );
 ```
 
-The `ALLOW_INCONSISTENT_READS` read option will disable file modification time check during the query lifecycle and read whatever is available in the files that are referenced by the external table. In appendable files, the existing content is not updated, and only new rows are added. Therefore, the probability of wrong results is minimized compared to the updateable files. This option might enable you to read the frequently appended files without handling the errors.
+The `ALLOW_INCONSISTENT_READS` read option will disable file modification time check during the query lifecycle and read whatever is available in the files that are referenced by the external table. In appendable files, the existing content isn't updated, and only new rows are added. Therefore, the probability of wrong results is minimized compared to the updateable files. This option might enable you to read the frequently appended files without handling the errors.
 
 This option is available only in the external tables created on CSV file format.
 
@@ -183,11 +181,11 @@ CREATE EXTERNAL TABLE Covid (
 );
 ```
 
-External tables cannot be created on a partitioned folder. Review the other known issues on [Synapse serverless SQL pool self-help page](resources-self-help-sql-on-demand.md#delta-lake).
+External tables can't be created on a partitioned folder. Review the other known issues on [Synapse serverless SQL pool self-help page](resources-self-help-sql-on-demand.md#delta-lake).
 
 ### Delta tables on partitioned folders
 
-External tables in serverless SQL pools do not support partitioning on Delta Lake format. Use [Delta partitioned views](create-use-views.md#delta-lake-partitioned-views) instead of tables if you have partitioned Delta Lake data sets.
+External tables in serverless SQL pools don't support partitioning on Delta Lake format. Use [Delta partitioned views](create-use-views.md#delta-lake-partitioned-views) instead of tables if you have partitioned Delta Lake data sets.
 
 > [!IMPORTANT]
 > Do not create external tables on partitioned Delta Lake folders even if you see that they might work in some cases. Using unsupported features like external tables on partitioned delta folders might cause issues or instability of the serverless pool. Azure support will not be able to resolve any issue if it is using tables on partitioned folders. You would be asked to transition to [Delta partitioned views](create-use-views.md#delta-lake-partitioned-views) and rewrite your code to use only the supported feature before proceeding with issue resolution.
 
@@ -216,6 +214,7 @@ ORDER BY
 
 Performance of this query might vary depending on region. Your workspace might not be placed in the same region as the Azure storage accounts used in these samples. For production workloads, place your Synapse workspace and Azure storage in the same region.
 
-## Next steps
+## Next step
 
-For information on how to store results of a query to storage, refer to [Store query results to the storage](../sql/create-external-table-as-select.md) article.
+> [!div class="nextstepaction"]
+> [Store query results to the storage](../sql/create-external-table-as-select.md)
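A minimal sketch of the `TABLE_OPTIONS` setting this diff discusses, assuming the `sqlondemanddemo` data source from the doc and a hypothetical CSV layout and file format name (serverless SQL pool syntax):

```sql
-- Hedged sketch: external table that tolerates files being appended mid-query.
-- The location, column list, and file format name are assumptions for
-- illustration; only the TABLE_OPTIONS value mirrors the documented option.
CREATE EXTERNAL TABLE CsvAppendOnly (
    EventDate date,
    EventCount int
)
WITH (
    LOCATION = 'csv/appendable/*.csv',
    DATA_SOURCE = sqlondemanddemo,
    FILE_FORMAT = QuotedCsvWithHeaderFormat,
    TABLE_OPTIONS = N'{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}'
);
```

As the changed text notes, this option applies only to external tables created on the CSV file format.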

articles/synapse-analytics/sql/create-use-views.md

Lines changed: 6 additions & 12 deletions
@@ -3,9 +3,9 @@ title: Create and use views in serverless SQL pool
 description: In this section, you'll learn how to create and use views to wrap serverless SQL pool queries. Views will allow you to reuse those queries. Views are also needed if you want to use tools, such as Power BI, in conjunction with serverless SQL pool.
 author: azaricstefan
 ms.service: azure-synapse-analytics
-ms.topic: overview
+ms.topic: how-to
 ms.subservice: sql
-ms.date: 05/20/2020
+ms.date: 12/06/2024
 ms.author: stefanazaric
 ms.reviewer: whhender, wiassaf
 ---
 
@@ -53,7 +53,7 @@ The view uses an `EXTERNAL DATA SOURCE` with a root URL of your storage, as a `D
 
 ### Delta Lake views
 
-If you are creating the views on top of Delta Lake folder, you need to specify the location to the root folder after the `BULK` option instead of specifying the file path.
+If you're creating the views on top of Delta Lake folder, you need to specify the location to the root folder after the `BULK` option instead of specifying the file path.
 
 > [!div class="mx-imgBorder"]
 >![ECDC COVID-19 Delta Lake folder](./media/shared/covid-delta-lake-studio.png)
 
@@ -100,7 +100,7 @@ When using JOINs in SQL queries, declare the filter predicate as NVARCHAR to red
 
 ### Delta Lake partitioned views
 
-If you are creating the partitioned views on top of Delta Lake storage, you can specify just a root Delta Lake folder and don't need to explicitly expose the partitioning columns using the `FILEPATH` function:
+If you're creating the partitioned views on top of Delta Lake storage, you can specify just a root Delta Lake folder and don't need to explicitly expose the partitioning columns using the `FILEPATH` function:
 
 ```sql
 CREATE OR ALTER VIEW YellowTaxiView
 
@@ -124,7 +124,7 @@ For more information, review [Synapse serverless SQL pool self-help page](resour
 
 ## JSON views
 
-The views are the good choice if you need to do some additional processing on top of the result set that is fetched from the files. One example might be parsing JSON files where we need to apply the JSON functions to extract the values from the JSON documents:
+The views are the good choice if you need to do some extra processing on top of the result set that is fetched from the files. One example might be parsing JSON files where we need to apply the JSON functions to extract the values from the JSON documents:
 
 ```sql
 CREATE OR ALTER VIEW CovidCases
 
@@ -191,12 +191,6 @@ ORDER BY
 
 When you query the view, you may encounter errors or unexpected results. This probably means that the view references columns or objects that were modified or no longer exist. You need to manually adjust the view definition to align with the underlying schema changes.
 
-## Next steps
+## Related content
 
 For information on how to query different file types, refer to the [Query single CSV file](query-single-csv-file.md), [Query Parquet files](query-parquet-files.md), and [Query JSON files](query-json-files.md) articles.
-
-- [What's new in Azure Synapse Analytics?](../whats-new.md).
-- [Best practices for serverless SQL pool in Azure Synapse Analytics](best-practices-serverless-sql-pool.md)
-- [Troubleshoot serverless SQL pool in Azure Synapse Analytics](resources-self-help-sql-on-demand.md)
-- [Troubleshoot a slow query on a dedicated SQL Pool](/troubleshoot/azure/synapse-analytics/dedicated-sql/troubleshoot-dsql-perf-slow-query)
-- [Synapse Studio troubleshooting](../troubleshoot/troubleshoot-synapse-studio.md)
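The Delta Lake partitioned-view pattern this file's hunks refer to (a view over a root Delta Lake folder, with no explicit `FILEPATH` partition columns) can be sketched as follows; the storage URL is a placeholder assumption, not taken from this diff:

```sql
-- Hedged sketch of a Delta Lake partitioned view in a serverless SQL pool.
-- The storage account URL and container path below are placeholders.
CREATE OR ALTER VIEW YellowTaxiView
AS SELECT *
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/data/yellow/',
    FORMAT = 'DELTA'  -- point BULK at the root Delta folder, not a file path
) AS nyc;
```

Because the Delta log already describes the partition layout, serverless SQL pool can perform partition elimination when the view is filtered on partitioning columns.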
