**articles/synapse-analytics/get-started-analyze-sql-on-demand.md** (10 additions & 11 deletions)
````diff
@@ -3,12 +3,12 @@ title: 'Tutorial: Get started analyze data with a serverless SQL pool'
 description: In this tutorial, you'll learn how to analyze data with a serverless SQL pool using data located in Spark databases.
 author: saveenr
 ms.author: saveenr
-ms.reviewer: sngun
+ms.reviewer: sngun, wiassaf
 ms.service: synapse-analytics
 ms.subservice: sql
-ms.custom: ignite-2022
+ms.custom:
 ms.topic: tutorial
-ms.date: 11/18/2022
+ms.date: 02/15/2023
 ---

 # Analyze data with a serverless SQL pool
````
````diff
@@ -39,7 +39,7 @@ Every workspace comes with a pre-configured serverless SQL pool called **Built-i
     FORMAT='PARQUET'
 ) AS [result]
 ```
-1. Click **Run**.
+1. Select **Run**.

 Data exploration is just a simplified scenario where you can understand the basic characteristics of your data. Learn more about data exploration and analysis in this [tutorial](sql/tutorial-data-analyst.md).

````
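The hunk above captures only the tail of the exploration query that the **Run** step executes. For context, a minimal sketch of a query of that shape, assuming a hypothetical storage path:

```sql
-- Minimal sketch of the kind of exploration query this step runs.
-- The storage URL below is a hypothetical placeholder, not the tutorial's actual path.
SELECT TOP 100 *
FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet',
        FORMAT = 'PARQUET'
    ) AS [result];
```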
````diff
@@ -60,16 +60,15 @@ However, as you continue data exploration, you might want to create some utility
 ```

 > [!IMPORTANT]
-> Use a collation with `_UTF8` suffix to ensure that UTF-8 text is properly converted to `VARCHAR` columns. `Latin1_General_100_BIN2_UTF8` provides
-> the best performance in the queries that read data from Parquet files and Azure Cosmos DB containers.
+> Use a collation with `_UTF8` suffix to ensure that UTF-8 text is properly converted to `VARCHAR` columns. `Latin1_General_100_BIN2_UTF8` provides the best performance in the queries that read data from Parquet files and Azure Cosmos DB containers. For more information on changing collations, refer to [Collation types supported for Synapse SQL](sql/reference-collation-types.md).

-1. Switch from master to `DataExplorationDB` using the following command. You can also use the UI control **use database** to switch your current database:
+1. Switch the database context from `master` to `DataExplorationDB` using the following command. You can also use the UI control **use database** to switch your current database:

 ```sql
 USE DataExplorationDB
 ```

-1. From the 'DataExplorationDB', create utility objects such as credentials and data sources.
+1. From `DataExplorationDB`, create utility objects such as credentials and data sources.

 ```sql
 CREATE EXTERNAL DATA SOURCE ContosoLake
````
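The code block whose closing fence opens this hunk isn't captured in the view. Given the IMPORTANT note that follows it, it presumably creates the utility database with a UTF-8 collation; a sketch of such a statement, not the article's verbatim snippet:

```sql
-- Sketch: create the utility database with a UTF-8 collation,
-- per the IMPORTANT note in the hunk above.
CREATE DATABASE DataExplorationDB
    COLLATE Latin1_General_100_BIN2_UTF8;
```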
````diff
@@ -79,13 +78,13 @@ However, as you continue data exploration, you might want to create some utility
 > [!NOTE]
 > An external data source can be created without a credential. If a credential does not exist, the caller's identity will be used to access the external data source.

-1. Optionally, use the newly created 'DataExplorationDB' database to create a login for a user in DataExplorationDB that will access external data:
+1. Optionally, use the newly created `DataExplorationDB` database to create a login for a user in `DataExplorationDB` that will access external data:

 ```sql
 CREATE LOGIN data_explorer WITH PASSWORD = 'My Very Strong Password 1234!';
 ```

-   Next, create a database user in 'DataExplorationDB' for the above login and grant the `ADMINISTER DATABASE BULK OPERATIONS` permission.
+   Next, create a database user in `DataExplorationDB` for the above login and grant the `ADMINISTER DATABASE BULK OPERATIONS` permission.

 ```sql
 CREATE USER data_explorer FOR LOGIN data_explorer;
````
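The hunk is cut off at the `CREATE USER` line; the grant that the surrounding text names would look like this (a sketch, not the article's verbatim snippet):

```sql
-- Sketch: create the database user for the login and grant the
-- bulk-operations permission named in the text above.
CREATE USER data_explorer FOR LOGIN data_explorer;
GO
GRANT ADMINISTER DATABASE BULK OPERATIONS TO data_explorer;
GO
```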
````diff
@@ -109,7 +108,7 @@ However, as you continue data exploration, you might want to create some utility

 1. **Publish** your changes to the workspace.

-Data exploration database is just a simple placeholder where you can store your utility objects. Synapse SQL pool enables you to do much more and create a Logical Data Warehouse - a relational layer built on top of Azure data sources. Learn more about building Logical Data Warehouse in this [tutorial](sql/tutorial-data-analyst.md).
+Data exploration database is just a simple placeholder where you can store your utility objects. Synapse SQL pool enables you to do much more and create a Logical Data Warehouse - a relational layer built on top of Azure data sources. Learn more about [building a logical data warehouse in this tutorial](sql/tutorial-data-analyst.md).
````
**articles/synapse-analytics/sql-data-warehouse/sql-data-warehouse-reference-collation-types.md** (17 additions & 6 deletions)
````diff
@@ -4,27 +4,29 @@ description: Collation types supported for dedicated SQL pool (formerly SQL DW)
 ms.service: synapse-analytics
 ms.subservice: sql
 ms.topic: conceptual
-ms.date: 12/04/2019
+ms.date: 02/15/2023
 author: WilliamDAssafMSFT
 ms.author: wiassaf
-ms.reviewer: sngun
-ms.custom: seo-lt-2019, azure-synapse
+ms.reviewer: sngun, kecona
+ms.custom: azure-synapse
 ---

 # Database collation support for dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics

 You can change the default database collation from the Azure portal when you create a new dedicated SQL pool (formerly SQL DW). This capability makes it even easier to create a new database using one of the 3800 supported database collations.

+This article applies to dedicated SQL pools (formerly SQL DW). For more information on dedicated SQL pools in Azure Synapse workspaces, see [Collation types supported for Synapse SQL](../sql/reference-collation-types.md).
+
 Collations provide the locale, code page, sort order and character sensitivity rules for character-based data types. Once chosen, all columns and expressions requiring collation information inherit the chosen collation from the database setting. The default inheritance can be overridden by explicitly stating a different collation for a character-based data type.

 > [!NOTE]
 > In Azure Synapse Analytics, query text (including variables, constants, etc.) is always handled using the database-level collation, and not the server-level collation as in other SQL Server offerings.

 ## Changing collation

-To change the default collation, update the Collation field in the provisioning experience.
+To change the default collation, update the **Collation** field in the provisioning experience.

-For example, if you wanted to change the default collation to case sensitive, you would simply rename the Collation from SQL_Latin1_General_CP1_CI_AS to SQL_Latin1_General_CP1_CS_AS.
+For example, if you wanted to change the default collation to case sensitive, change the collation from `SQL_Latin1_General_CP1_CI_AS` to `SQL_Latin1_General_CP1_CS_AS`.

 ## Collation support
````
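To illustrate the inheritance-override rule described in this hunk's context (a column or expression can state its own collation), a minimal sketch with hypothetical table and column names:

```sql
-- Sketch: overriding the database default collation for one column
-- and for one comparison. Table and column names are hypothetical.
CREATE TABLE dbo.Products
(
    ProductId   int NOT NULL,
    ProductCode varchar(20) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL  -- case-sensitive override
);

SELECT ProductCode
FROM dbo.Products
WHERE ProductCode = 'abc' COLLATE SQL_Latin1_General_CP1_CI_AS;  -- case-insensitive comparison
```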
````diff
@@ -47,4 +49,13 @@ To check the current collation for the database, you can run the following T-SQL
 SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS Collation;
 ```

-When passed 'Collation' as the property parameter, the DatabasePropertyEx function returns the current collation for the database specified. For more information, see [DatabasePropertyEx](/sql/t-sql/functions/databasepropertyex-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest&preserve-view=true).
+When passed 'Collation' as the property parameter, the DATABASEPROPERTYEX function returns the current collation for the database specified. For more information, see [DATABASEPROPERTYEX](/sql/t-sql/functions/databasepropertyex-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest&preserve-view=true).
+
+
+## Next steps
+
+Additional information on best practices for dedicated SQL pool and serverless SQL pool can be found in the following articles:
+
+- [Best Practices for dedicated SQL pool](./best-practices-dedicated-sql-pool.md)
+- [Best practices for serverless SQL pool](./best-practices-serverless-sql-pool.md)
````
**articles/synapse-analytics/sql/best-practices-serverless-sql-pool.md** (4 additions & 4 deletions)
````diff
@@ -5,7 +5,7 @@ author: filippopovic
 ms.author: fipopovi
 manager: craigg
 ms.reviewer: sngun, wiassaf
-ms.date: 09/01/2022
+ms.date: 02/15/2023
 ms.service: synapse-analytics
 ms.subservice: sql
 ms.topic: conceptual
````
````diff
@@ -24,7 +24,7 @@ Some generic guidelines are:
 - Make sure the storage and serverless SQL pool are in the same region. Storage examples include Azure Data Lake Storage and Azure Cosmos DB.
 - Try to [optimize storage layout](#prepare-files-for-querying) by using partitioning and keeping your files in the range between 100 MB and 10 GB.
 - If you're returning a large number of results, make sure you're using SQL Server Management Studio or Azure Data Studio and not Azure Synapse Studio. Azure Synapse Studio is a web tool that isn't designed for large result sets.
-- If you're filtering results by string column, try to use a `BIN2_UTF8` collation.
+- If you're filtering results by string column, try to use a `BIN2_UTF8` collation. For more information on changing collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md).
 - Consider caching the results on the client side by using Power BI import mode or Azure Analysis Services, and periodically refresh them. Serverless SQL pools can't provide an interactive experience in Power BI Direct Query mode if you're using complex queries or processing a large amount of data.

 ## Client applications and network connections
````
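A sketch of what the `BIN2_UTF8` guideline looks like in practice, with a hypothetical storage path and column name:

```sql
-- Sketch: force a BIN2_UTF8 collation on a string column used in a filter,
-- so the predicate can be pushed down to the Parquet reader.
-- The storage URL and column names are hypothetical placeholders.
SELECT *
FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/data/files/*.parquet',
        FORMAT = 'PARQUET'
    )
    WITH (
        [state] varchar(50) COLLATE Latin1_General_100_BIN2_UTF8
    ) AS [rows]
WHERE [state] = 'WA';
```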
````diff
@@ -91,7 +91,7 @@ The data types you use in your query affect performance and concurrency. You can
 - Use the smallest data size that can accommodate the largest possible value.
 - If the maximum character value length is 30 characters, use a character data type of length 30.
 - If all character column values are of a fixed size, use **char** or **nchar**. Otherwise, use **varchar** or **nvarchar**.
-- If the maximum integer column value is 500, use **smallint** because it's the smallest data type that can accommodate this value. You can find integer data type ranges in [this article](/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=azure-sqldw-latest&preserve-view=true).
+- If the maximum integer column value is 500, use **smallint** because it's the smallest data type that can accommodate this value. For more information, see [integer data type ranges](/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=azure-sqldw-latest&preserve-view=true).
 - If possible, use **varchar** and **char** instead of **nvarchar** and **nchar**.
 - Use the **varchar** type with some UTF8 collation if you're reading data from Parquet, Azure Cosmos DB, Delta Lake, or CSV with UTF-8 encoding.
 - Use the **varchar** type without UTF8 collation if you're reading data from CSV non-Unicode files (for example, ASCII).
````
````diff
@@ -103,7 +103,7 @@ The data types you use in your query affect performance and concurrency. You can

 [Schema inference](query-parquet-files.md#automatic-schema-inference) helps you quickly write queries and explore data without knowing file schemas. The cost of this convenience is that inferred data types might be larger than the actual data types. This discrepancy happens when there isn't enough information in the source files to make sure the appropriate data type is used. For example, Parquet files don't contain metadata about maximum character column length. So serverless SQL pool infers it as varchar(8000).

-You can use [sp_describe_first_result_set](/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15&preserve-view=true) to check the resulting data types of your query.
+You can use the system stored procedure [sp_describe_first_result_set](/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15&preserve-view=true) to check the resulting data types of your query.

 The following example shows how you can optimize inferred data types. This procedure is used to show the inferred data types:
````
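The example itself is cut off in this capture; a minimal sketch of the kind of call the paragraph describes, with a hypothetical storage path:

```sql
-- Sketch: inspect the data types a serverless SQL pool infers for a query.
-- The storage URL is a hypothetical placeholder.
EXEC sp_describe_first_result_set N'
    SELECT *
    FROM OPENROWSET(
            BULK ''https://contosolake.dfs.core.windows.net/data/files/*.parquet'',
            FORMAT = ''PARQUET''
        ) AS [rows]';
```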
The remaining hunks update a fourth file; its file header is missing from this view.

````diff
 |[Folder partition elimination](#folder-partition-elimination)| No | Partition elimination is available only in the partitioned tables created on Parquet or CSV formats that are synchronized from Apache Spark pools. You might create external tables on Parquet partitioned folders, but the partitioning columns will be inaccessible and ignored, while the partition elimination will not be applied. Do not create [external tables on Delta Lake folders](create-use-external-tables.md#delta-tables-on-partitioned-folders) because they are not supported. Use [Delta partitioned views](create-use-views.md#delta-lake-partitioned-views) if you need to query partitioned Delta Lake data. |
-|[File elimination](#file-elimination) (predicate pushdown) | No | Yes in serverless SQL pool. For the string pushdown, you need to use `Latin1_General_100_BIN2_UTF8` collation on the `VARCHAR` columns to enable pushdown. |
+|[File elimination](#file-elimination) (predicate pushdown) | No | Yes in serverless SQL pool. For the string pushdown, you need to use `Latin1_General_100_BIN2_UTF8` collation on the `VARCHAR` columns to enable pushdown. For more information on collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md).|
 | Custom format for location | No | Yes, using wildcards like `/year=*/month=*/day=*` for Parquet or CSV formats. Custom folder paths are not available in Delta Lake. In the serverless SQL pool you can also use recursive wildcards `/logs/**` to reference Parquet or CSV files in any sub-folder beneath the referenced folder. |
 | Recursive folder scan | Yes | Yes. In serverless SQL pools, `/**` must be specified at the end of the location path. In dedicated pools, the folders are always scanned recursively. |
````
````diff
@@ -60,14 +59,14 @@ You can create external tables in Synapse SQL pools via the following steps:

 ### Folder partition elimination

-The native external tables in Synapse pools are able to ignore the files placed in the folders that are not relevant for the queries. If your files are stored in a folder hierarchy (for example - **/year=2020/month=03/day=16**) and the values for **year**, **month**, and **day** are exposed as the columns, the queries that contain filters like `year=2020` will read the files only from the subfolders placed within the **year=2020** folder. The files and folders placed in other folders (**year=2021** or **year=2022**) will be ignored in this query. This elimination is known as **partition elimination**.
+The native external tables in Synapse pools are able to ignore the files placed in the folders that are not relevant for the queries. If your files are stored in a folder hierarchy (for example - `/year=2020/month=03/day=16`) and the values for `year`, `month`, and `day` are exposed as the columns, the queries that contain filters like `year=2020` will read the files only from the subfolders placed within the `year=2020` folder. The files and folders placed in other folders (`year=2021` or `year=2022`) will be ignored in this query. This elimination is known as **partition elimination**.

 The folder partition elimination is available in the native external tables that are synchronized from the Synapse Spark pools. If you have a partitioned data set and you would like to leverage partition elimination with the external tables that you create, use [the partitioned views](create-use-views.md#partitioned-views) instead of the external tables.

 ### File elimination

 Some data formats such as Parquet and Delta contain file statistics for each column (for example, min/max values for each column). The queries that filter data will not read the files where the required column values do not exist. The query will first explore min/max values for the columns used in the query predicate to find the files that do not contain the required data. These files will be ignored and eliminated from the query plan.

-This technique is also known as filter predicate pushdown and it can improve the performance of your queries. Filter pushdown is available in the serverless SQL pools on Parquet and Delta formats. To leverage filter pushdown for the string types, use the VARCHAR type with the `Latin1_General_100_BIN2_UTF8` collation.
+This technique is also known as filter predicate pushdown and it can improve the performance of your queries. Filter pushdown is available in the serverless SQL pools on Parquet and Delta formats. To leverage filter pushdown for the string types, use the VARCHAR type with the `Latin1_General_100_BIN2_UTF8` collation. For more information on collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md).

 ### Security
````
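The folder-partition-elimination hunk above recommends partitioned views over external tables for partitioned data. A sketch of that pattern, with a hypothetical storage path and column list:

```sql
-- Sketch: a partitioned view over a year=/month=/day= folder hierarchy,
-- the pattern the hunk above recommends for partition elimination.
-- The storage URL and column names are hypothetical placeholders.
CREATE VIEW dbo.TripsPartitioned AS
SELECT
    [rows].*,
    [rows].filepath(1) AS [year],   -- value matched by the first * wildcard
    [rows].filepath(2) AS [month],
    [rows].filepath(3) AS [day]
FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/data/trips/year=*/month=*/day=*/*.parquet',
        FORMAT = 'PARQUET'
    ) AS [rows];
GO
-- A filter on the exposed partition columns lets serverless SQL pool skip folders:
SELECT COUNT(*) FROM dbo.TripsPartitioned WHERE [year] = '2020';
```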
73
72
@@ -76,11 +75,9 @@ External tables access underlying Azure storage using the database scoped creden
76
75
- Data source without credential enables external tables to access publicly available files on Azure storage.
77
76
- Data source can have a credential that enables external tables to access only the files on Azure storage using SAS token or workspace Managed Identity - For examples, see [the Develop storage files storage access control](develop-storage-files-storage-access-control.md#examples) article.
78
77
79
-
80
-
81
78
## CREATE EXTERNAL DATA SOURCE
82
79
83
-
External data sources are used to connect to storage accounts. The complete documentation is outlined [here](/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&preserve-view=true).
80
+
External data sources are used to connect to storage accounts. For more information, see [CREATE EXTERNAL DATA SOURCE](/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&preserve-view=true).
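A sketch of the two access patterns the hunk lists (public storage versus a credential): the data source, credential, and storage URL names below are hypothetical placeholders.

```sql
-- Sketch: an external data source with a database scoped credential.
-- Requires a database master key to already exist in the database.
CREATE DATABASE SCOPED CREDENTIAL [SasCredential]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token without the leading ?>';

CREATE EXTERNAL DATA SOURCE ContosoLake
WITH (
    LOCATION = 'https://contosolake.dfs.core.windows.net/data',
    CREDENTIAL = [SasCredential]   -- omit CREDENTIAL for publicly available storage
);
```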