
Commit e9982f3

Merge pull request #227460 from WilliamDAssafMSFT/20230215-collation
20230215 collation update and linkage improvements
2 parents (c433f97 + 3621742), commit e9982f3

7 files changed: +92 -194 lines

articles/synapse-analytics/get-started-analyze-sql-on-demand.md

Lines changed: 10 additions & 11 deletions
@@ -3,12 +3,12 @@ title: 'Tutorial: Get started analyze data with a serverless SQL pool'
 description: In this tutorial, you'll learn how to analyze data with a serverless SQL pool using data located in Spark databases.
 author: saveenr
 ms.author: saveenr
-ms.reviewer: sngun
+ms.reviewer: sngun, wiassaf
 ms.service: synapse-analytics
 ms.subservice: sql
-ms.custom: ignite-2022
+ms.custom:
 ms.topic: tutorial
-ms.date: 11/18/2022
+ms.date: 02/15/2023
 ---

 # Analyze data with a serverless SQL pool
@@ -39,7 +39,7 @@ Every workspace comes with a pre-configured serverless SQL pool called **Built-in**
 FORMAT='PARQUET'
 ) AS [result]
 ```
-1. Click **Run**.
+1. Select **Run**.

 Data exploration is just a simplified scenario where you can understand the basic characteristics of your data. Learn more about data exploration and analysis in this [tutorial](sql/tutorial-data-analyst.md).

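The hunk shows only the tail of the exploration query. For context, a minimal sketch of the kind of `OPENROWSET` query this step runs; the storage URL and file name are hypothetical placeholders, not part of the diff:

```sql
-- Sketch only: hypothetical Parquet file in the workspace's default storage.
SELECT TOP 100 *
FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet',
        FORMAT = 'PARQUET'
    ) AS [result]
```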
@@ -60,16 +60,15 @@ However, as you continue data exploration, you might want to create some utility
 ```

 > [!IMPORTANT]
-> Use a collation with `_UTF8` suffix to ensure that UTF-8 text is properly converted to `VARCHAR` columns. `Latin1_General_100_BIN2_UTF8` provides
-> the best performance in the queries that read data from Parquet files and Azure Cosmos DB containers.
+> Use a collation with the `_UTF8` suffix to ensure that UTF-8 text is properly converted to `VARCHAR` columns. `Latin1_General_100_BIN2_UTF8` provides the best performance in queries that read data from Parquet files and Azure Cosmos DB containers. For more information on changing collations, refer to [Collation types supported for Synapse SQL](sql/reference-collation-types.md).

-1. Switch from master to `DataExplorationDB` using the following command. You can also use the UI control **use database** to switch your current database:
+1. Switch the database context from `master` to `DataExplorationDB` using the following command. You can also use the UI control **use database** to switch your current database:

 ```sql
 USE DataExplorationDB
 ```

-1. From the 'DataExplorationDB', create utility objects such as credentials and data sources.
+1. From `DataExplorationDB`, create utility objects such as credentials and data sources.

 ```sql
 CREATE EXTERNAL DATA SOURCE ContosoLake
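The hunk truncates the utility-object setup. A minimal sketch of what this step builds, assuming a hypothetical storage account name (`contosolake`):

```sql
-- Create the exploration database with a UTF-8 collation, per the IMPORTANT note above.
CREATE DATABASE DataExplorationDB
    COLLATE Latin1_General_100_BIN2_UTF8;
GO
USE DataExplorationDB;
GO
-- Utility object: an external data source pointing at the lake.
-- No credential is given here, so the caller's identity is used for access.
CREATE EXTERNAL DATA SOURCE ContosoLake
WITH ( LOCATION = 'https://contosolake.dfs.core.windows.net' );
```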
@@ -79,13 +78,13 @@ However, as you continue data exploration, you might want to create some utility
 > [!NOTE]
 > An external data source can be created without a credential. If a credential does not exist, the caller's identity will be used to access the external data source.

-1. Optionally, use the newly created 'DataExplorationDB' database to create a login for a user in DataExplorationDB that will access external data:
+1. Optionally, use the newly created `DataExplorationDB` database to create a login for a user in `DataExplorationDB` that will access external data:

 ```sql
 CREATE LOGIN data_explorer WITH PASSWORD = 'My Very Strong Password 1234!';
 ```

-Next create a database user in 'DataExplorationDB' for the above login and grant the `ADMINISTER DATABASE BULK OPERATIONS` permission.
+Next, create a database user in `DataExplorationDB` for the above login and grant the `ADMINISTER DATABASE BULK OPERATIONS` permission.

 ```sql
 CREATE USER data_explorer FOR LOGIN data_explorer;
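The hunk ends before the grant itself; a minimal sketch of the permission the step above describes:

```sql
-- Allow data_explorer to run bulk operations such as OPENROWSET over external files.
GRANT ADMINISTER DATABASE BULK OPERATIONS TO data_explorer;
```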
@@ -109,7 +108,7 @@ However, as you continue data exploration, you might want to create some utility

 1. **Publish** your changes to the workspace.

-Data exploration database is just a simple placeholder where you can store your utility objects. Synapse SQL pool enables you to do much more and create a Logical Data Warehouse - a relational layer built on top of Azure data sources. Learn more about building Logical Data Warehouse in this [tutorial](sql/tutorial-data-analyst.md).
+The data exploration database is just a simple placeholder where you can store your utility objects. Synapse SQL pool enables you to do much more and create a logical data warehouse, a relational layer built on top of Azure data sources. Learn more about [building a logical data warehouse in this tutorial](sql/tutorial-data-analyst.md).

 ## Next steps

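The logical data warehouse mentioned in the new text is typically built from views over files in the lake. A minimal sketch, assuming the hypothetical `DataExplorationDB` objects above and a hypothetical storage path:

```sql
USE DataExplorationDB;
GO
-- A relational layer over the lake: a view that exposes raw Parquet files as a table.
CREATE VIEW dbo.Population AS
SELECT *
FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/data/population/*.parquet',
        FORMAT = 'PARQUET'
    ) AS [rows];
```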
articles/synapse-analytics/sql-data-warehouse/sql-data-warehouse-reference-collation-types.md

Lines changed: 30 additions & 80 deletions
@@ -4,100 +4,42 @@ description: Collation types supported for dedicated SQL pool (formerly SQL DW)
 ms.service: synapse-analytics
 ms.subservice: sql
 ms.topic: conceptual
-ms.date: 12/04/2019
+ms.date: 02/15/2023
 author: WilliamDAssafMSFT
 ms.author: wiassaf
-ms.reviewer: sngun
-ms.custom: seo-lt-2019, azure-synapse
+ms.reviewer: sngun, kecona
+ms.custom: azure-synapse
 ---

 # Database collation support for dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics

 You can change the default database collation from the Azure portal when you create a new dedicated SQL pool (formerly SQL DW). This capability makes it even easier to create a new database using one of the 3800 supported database collations.

+This article applies to dedicated SQL pools (formerly SQL DW). For more information on dedicated SQL pools in Azure Synapse workspaces, see [Collation types supported for Synapse SQL](../sql/reference-collation-types.md).
+
 Collations provide the locale, code page, sort order and character sensitivity rules for character-based data types. Once chosen, all columns and expressions requiring collation information inherit the chosen collation from the database setting. The default inheritance can be overridden by explicitly stating a different collation for a character-based data type.

 > [!NOTE]
 > In Azure Synapse Analytics, query text (including variables, constants, etc.) is always handled using the database-level collation, and not the server-level collation as in other SQL Server offerings.

 ## Changing collation

-To change the default collation, update to the Collation field in the provisioning experience.
-
-For example, if you wanted to change the default collation to case sensitive, you would simply rename the Collation from SQL_Latin1_General_CP1_CI_AS to SQL_Latin1_General_CP1_CS_AS.
-
-## List of unsupported collation types
-
-* Japanese_Bushu_Kakusu_140_BIN
-* Japanese_Bushu_Kakusu_140_BIN2
-* Japanese_Bushu_Kakusu_140_CI_AI_VSS
-* Japanese_Bushu_Kakusu_140_CI_AI_WS_VSS
-* Japanese_Bushu_Kakusu_140_CI_AI_KS_VSS
-* Japanese_Bushu_Kakusu_140_CI_AI_KS_WS_VSS
-* Japanese_Bushu_Kakusu_140_CI_AS_VSS
-* Japanese_Bushu_Kakusu_140_CI_AS_WS_VSS
-* Japanese_Bushu_Kakusu_140_CI_AS_KS_VSS
-* Japanese_Bushu_Kakusu_140_CI_AS_KS_WS_VSS
-* Japanese_Bushu_Kakusu_140_CS_AI_VSS
-* Japanese_Bushu_Kakusu_140_CS_AI_WS_VSS
-* Japanese_Bushu_Kakusu_140_CS_AI_KS_VSS
-* Japanese_Bushu_Kakusu_140_CS_AI_KS_WS_VSS
-* Japanese_Bushu_Kakusu_140_CS_AS_VSS
-* Japanese_Bushu_Kakusu_140_CS_AS_WS_VSS
-* Japanese_Bushu_Kakusu_140_CS_AS_KS_VSS
-* Japanese_Bushu_Kakusu_140_CS_AS_KS_WS_VSS
-* Japanese_Bushu_Kakusu_140_CI_AI
-* Japanese_Bushu_Kakusu_140_CI_AI_WS
-* Japanese_Bushu_Kakusu_140_CI_AI_KS
-* Japanese_Bushu_Kakusu_140_CI_AI_KS_WS
-* Japanese_Bushu_Kakusu_140_CI_AS
-* Japanese_Bushu_Kakusu_140_CI_AS_WS
-* Japanese_Bushu_Kakusu_140_CI_AS_KS
-* Japanese_Bushu_Kakusu_140_CI_AS_KS_WS
-* Japanese_Bushu_Kakusu_140_CS_AI
-* Japanese_Bushu_Kakusu_140_CS_AI_WS
-* Japanese_Bushu_Kakusu_140_CS_AI_KS
-* Japanese_Bushu_Kakusu_140_CS_AI_KS_WS
-* Japanese_Bushu_Kakusu_140_CS_AS
-* Japanese_Bushu_Kakusu_140_CS_AS_WS
-* Japanese_Bushu_Kakusu_140_CS_AS_KS
-* Japanese_Bushu_Kakusu_140_CS_AS_KS_WS
-* Japanese_XJIS_140_BIN
-* Japanese_XJIS_140_BIN2
-* Japanese_XJIS_140_CI_AI_VSS
-* Japanese_XJIS_140_CI_AI_WS_VSS
-* Japanese_XJIS_140_CI_AI_KS_VSS
-* Japanese_XJIS_140_CI_AI_KS_WS_VSS
-* Japanese_XJIS_140_CI_AS_VSS
-* Japanese_XJIS_140_CI_AS_WS_VSS
-* Japanese_XJIS_140_CI_AS_KS_VSS
-* Japanese_XJIS_140_CI_AS_KS_WS_VSS
-* Japanese_XJIS_140_CS_AI_VSS
-* Japanese_XJIS_140_CS_AI_WS_VSS
-* Japanese_XJIS_140_CS_AI_KS_VSS
-* Japanese_XJIS_140_CS_AI_KS_WS_VSS
-* Japanese_XJIS_140_CS_AS_VSS
-* Japanese_XJIS_140_CS_AS_WS_VSS
-* Japanese_XJIS_140_CS_AS_KS_VSS
-* Japanese_XJIS_140_CS_AS_KS_WS_VSS
-* Japanese_XJIS_140_CI_AI
-* Japanese_XJIS_140_CI_AI_WS
-* Japanese_XJIS_140_CI_AI_KS
-* Japanese_XJIS_140_CI_AI_KS_WS
-* Japanese_XJIS_140_CI_AS
-* Japanese_XJIS_140_CI_AS_WS
-* Japanese_XJIS_140_CI_AS_KS
-* Japanese_XJIS_140_CI_AS_KS_WS
-* Japanese_XJIS_140_CS_AI
-* Japanese_XJIS_140_CS_AI_WS
-* Japanese_XJIS_140_CS_AI_KS
-* Japanese_XJIS_140_CS_AI_KS_WS
-* Japanese_XJIS_140_CS_AS
-* Japanese_XJIS_140_CS_AS_WS
-* Japanese_XJIS_140_CS_AS_KS
-* Japanese_XJIS_140_CS_AS_KS_WS
-* SQL_EBCDIC1141_CP1_CS_AS
-* SQL_EBCDIC277_2_CP1_CS_AS
+To change the default collation, update the **Collation** field in the provisioning experience.
+
+For example, if you wanted to change the default collation to case sensitive, change the collation from `SQL_Latin1_General_CP1_CI_AS` to `SQL_Latin1_General_CP1_CS_AS`.
+
+## Collation support
+
+The following table shows which collation types are supported by which service.
+
+| Collation Type | Serverless SQL Pool | Dedicated SQL Pool - Database & Column Level | Dedicated SQL Pool - External Table (Native Support) | Dedicated SQL Pool - External Table (Hadoop/PolyBase) |
+|:--:|:--:|:--:|:--:|:--:|
+| Non-UTF-8 Collations | Yes | Yes | Yes | Yes |
+| UTF-8 | Yes | Yes | No | No |
+| Japanese_Bushu_Kakusu_140_* | Yes | Yes | No | No |
+| Japanese_XJIS_140_* | Yes | Yes | No | No |
+| SQL_EBCDIC1141_CP1_CS_AS | No | No | No | No |
+| SQL_EBCDIC277_2_CP1_CS_AS | No | No | No | No |

 ## Checking the current collation

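As the article text notes, the database default can be overridden per column or per expression. A minimal sketch, assuming hypothetical table and column names:

```sql
-- Column-level override: case-sensitive comparisons for this column only.
CREATE TABLE dbo.Customers
(
    CustomerId   INT NOT NULL,
    CustomerCode VARCHAR(20) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
);

-- Expression-level override inside a query.
SELECT CustomerCode
FROM dbo.Customers
WHERE CustomerCode = 'ABC' COLLATE SQL_Latin1_General_CP1_CS_AS;
```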
@@ -107,4 +49,12 @@ To check the current collation for the database, you can run the following T-SQL
 SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS Collation;
 ```

-When passed 'Collation' as the property parameter, the DatabasePropertyEx function returns the current collation for the database specified. For more information, see [DatabasePropertyEx](/sql/t-sql/functions/databasepropertyex-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest&preserve-view=true).
+When passed 'Collation' as the property parameter, the DATABASEPROPERTYEX function returns the current collation for the database specified. For more information, see [DATABASEPROPERTYEX](/sql/t-sql/functions/databasepropertyex-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest&preserve-view=true).
+
+## Next steps
+
+For more information on best practices for dedicated SQL pool and serverless SQL pool, see the following articles:
+
+- [Best practices for dedicated SQL pool](../sql/best-practices-dedicated-sql-pool.md)
+- [Best practices for serverless SQL pool](../sql/best-practices-serverless-sql-pool.md)
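Beyond the database-level check shown above, column-level collations can be inspected from the catalog views. A minimal sketch, assuming the hypothetical `dbo.Customers` table from earlier:

```sql
-- Column-level collations; collation_name is NULL for non-character columns.
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.Customers');
```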

articles/synapse-analytics/sql/best-practices-serverless-sql-pool.md

Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,7 @@ author: filippopovic
 ms.author: fipopovi
 manager: craigg
 ms.reviewer: sngun, wiassaf
-ms.date: 09/01/2022
+ms.date: 02/15/2023
 ms.service: synapse-analytics
 ms.subservice: sql
 ms.topic: conceptual
@@ -24,7 +24,7 @@ Some generic guidelines are:
 - Make sure the storage and serverless SQL pool are in the same region. Storage examples include Azure Data Lake Storage and Azure Cosmos DB.
 - Try to [optimize storage layout](#prepare-files-for-querying) by using partitioning and keeping your files in the range between 100 MB and 10 GB.
 - If you're returning a large number of results, make sure you're using SQL Server Management Studio or Azure Data Studio and not Azure Synapse Studio. Azure Synapse Studio is a web tool that isn't designed for large result sets.
-- If you're filtering results by string column, try to use a `BIN2_UTF8` collation.
+- If you're filtering results by a string column, try to use a `BIN2_UTF8` collation. For more information on changing collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md).
 - Consider caching the results on the client side by using Power BI import mode or Azure Analysis Services, and periodically refresh them. Serverless SQL pools can't provide an interactive experience in Power BI Direct Query mode if you're using complex queries or processing a large amount of data.

 ## Client applications and network connections
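The `BIN2_UTF8` guideline above is typically applied by declaring the collation in the `WITH` schema of `OPENROWSET`, so string predicates can be pushed down to the files. A minimal sketch, assuming a hypothetical storage path and columns:

```sql
SELECT [population]
FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/data/population/*.parquet',
        FORMAT = 'PARQUET'
    )
    WITH (
        -- BIN2_UTF8 collation on the column used in the string filter
        [stateName]  VARCHAR(50) COLLATE Latin1_General_100_BIN2_UTF8,
        [population] BIGINT
    ) AS [rows]
WHERE [stateName] = 'Washington';
```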
@@ -91,7 +91,7 @@ The data types you use in your query affect performance and concurrency. You can
 - Use the smallest data size that can accommodate the largest possible value.
 - If the maximum character value length is 30 characters, use a character data type of length 30.
 - If all character column values are of a fixed size, use **char** or **nchar**. Otherwise, use **varchar** or **nvarchar**.
-- If the maximum integer column value is 500, use **smallint** because it's the smallest data type that can accommodate this value. You can find integer data type ranges in [this article](/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=azure-sqldw-latest&preserve-view=true).
+- If the maximum integer column value is 500, use **smallint** because it's the smallest data type that can accommodate this value. For more information, see [integer data type ranges](/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=azure-sqldw-latest&preserve-view=true).
 - If possible, use **varchar** and **char** instead of **nvarchar** and **nchar**.
 - Use the **varchar** type with some UTF8 collation if you're reading data from Parquet, Azure Cosmos DB, Delta Lake, or CSV with UTF-8 encoding.
 - Use the **varchar** type without UTF8 collation if you're reading data from CSV non-Unicode files (for example, ASCII).
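These sizing rules are applied through an explicit `WITH` schema rather than relying on inference. A minimal sketch, assuming a hypothetical CSV source and columns:

```sql
SELECT *
FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/data/files/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    )
    WITH (
        -- right-sized types instead of inferred defaults such as varchar(8000)
        [country] VARCHAR(30),
        [year]    SMALLINT
    ) AS [rows];
```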
@@ -103,7 +103,7 @@ The data types you use in your query affect performance and concurrency. You can

 [Schema inference](query-parquet-files.md#automatic-schema-inference) helps you quickly write queries and explore data without knowing file schemas. The cost of this convenience is that inferred data types might be larger than the actual data types. This discrepancy happens when there isn't enough information in the source files to make sure the appropriate data type is used. For example, Parquet files don't contain metadata about maximum character column length. So serverless SQL pool infers it as varchar(8000).

-You can use [sp_describe_first_results_set](/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15&preserve-view=true) to check the resulting data types of your query.
+You can use the system stored procedure [sp_describe_first_result_set](/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15&preserve-view=true) to check the resulting data types of your query.

 The following example shows how you can optimize inferred data types. This procedure is used to show the inferred data types:

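The hunk ends before the example itself; a minimal sketch of such a check, assuming a hypothetical Parquet path:

```sql
-- Report the inferred column names, types, and lengths of the query's result set.
EXEC sp_describe_first_result_set N'
    SELECT *
    FROM OPENROWSET(
            BULK ''https://contosolake.dfs.core.windows.net/data/files/*.parquet'',
            FORMAT = ''PARQUET''
        ) AS [rows]';
```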