
Commit ff40fbf

20230215 synapse supported collations update, improved linkage
1 parent aa8f190 commit ff40fbf

5 files changed: +45 -33 lines changed

articles/synapse-analytics/get-started-analyze-sql-on-demand.md

Lines changed: 10 additions & 11 deletions
@@ -3,12 +3,12 @@ title: 'Tutorial: Get started analyze data with a serverless SQL pool'
 description: In this tutorial, you'll learn how to analyze data with a serverless SQL pool using data located in Spark databases.
 author: saveenr
 ms.author: saveenr
-ms.reviewer: sngun
+ms.reviewer: sngun, wiassaf
 ms.service: synapse-analytics
 ms.subservice: sql
-ms.custom: ignite-2022
+ms.custom:
 ms.topic: tutorial
-ms.date: 11/18/2022
+ms.date: 02/15/2023
 ---
 
 # Analyze data with a serverless SQL pool
@@ -39,7 +39,7 @@ Every workspace comes with a pre-configured serverless SQL pool called **Built-i
         FORMAT='PARQUET'
     ) AS [result]
     ```
-1. Click **Run**.
+1. Select **Run**.
 
    Data exploration is just a simplified scenario where you can understand the basic characteristics of your data. Learn more about data exploration and analysis in this [tutorial](sql/tutorial-data-analyst.md).
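The hunk above shows only the tail of the tutorial's query. As a hedged sketch, the full `OPENROWSET` statement it belongs to looks roughly like this; the storage account and file path are illustrative placeholders, not taken from this commit:

```sql
-- Hedged sketch: query a Parquet file with the serverless SQL pool.
-- The account and file path below are placeholders.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet',
    FORMAT = 'PARQUET'
) AS [result];
```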

@@ -60,16 +60,15 @@ However, as you continue data exploration, you might want to create some utility
     ```
 
     > [!IMPORTANT]
-    > Use a collation with `_UTF8` suffix to ensure that UTF-8 text is properly converted to `VARCHAR` columns. `Latin1_General_100_BIN2_UTF8` provides
-    > the best performance in the queries that read data from Parquet files and Azure Cosmos DB containers.
+    > Use a collation with the `_UTF8` suffix to ensure that UTF-8 text is properly converted to `VARCHAR` columns. `Latin1_General_100_BIN2_UTF8` provides the best performance in queries that read data from Parquet files and Azure Cosmos DB containers. For more information on changing collations, refer to [Collation types supported for Synapse SQL](sql/reference-collation-types.md).
 
-1. Switch from master to `DataExplorationDB` using the following command. You can also use the UI control **use database** to switch your current database:
+1. Switch the database context from `master` to `DataExplorationDB` using the following command. You can also use the UI control **use database** to switch your current database:
 
    ```sql
    USE DataExplorationDB
    ```
 
-1. From the 'DataExplorationDB', create utility objects such as credentials and data sources.
+1. From `DataExplorationDB`, create utility objects such as credentials and data sources.
 
    ```sql
    CREATE EXTERNAL DATA SOURCE ContosoLake
@@ -79,13 +78,13 @@ However, as you continue data exploration, you might want to create some utility
    > [!NOTE]
    > An external data source can be created without a credential. If a credential does not exist, the caller's identity will be used to access the external data source.
 
-1. Optionally, use the newly created 'DataExplorationDB' database to create a login for a user in DataExplorationDB that will access external data:
+1. Optionally, use the newly created `DataExplorationDB` database to create a login for a user in `DataExplorationDB` that will access external data:
 
    ```sql
    CREATE LOGIN data_explorer WITH PASSWORD = 'My Very Strong Password 1234!';
    ```
 
-    Next create a database user in 'DataExplorationDB' for the above login and grant the `ADMINISTER DATABASE BULK OPERATIONS` permission.
+    Next, create a database user in `DataExplorationDB` for the above login and grant the `ADMINISTER DATABASE BULK OPERATIONS` permission.
 
    ```sql
    CREATE USER data_explorer FOR LOGIN data_explorer;
@@ -109,7 +108,7 @@ However, as you continue data exploration, you might want to create some utility
 
 1. **Publish** your changes to the workspace.
 
-    Data exploration database is just a simple placeholder where you can store your utility objects. Synapse SQL pool enables you to do much more and create a Logical Data Warehouse - a relational layer built on top of Azure data sources. Learn more about building Logical Data Warehouse in this [tutorial](sql/tutorial-data-analyst.md).
+    The data exploration database is just a simple placeholder where you can store your utility objects. Synapse SQL pool enables you to do much more and create a logical data warehouse, a relational layer built on top of Azure data sources. Learn more about [building a logical data warehouse in this tutorial](sql/tutorial-data-analyst.md).
 
 ## Next steps
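For context, the utility objects this hunk walks through add up to roughly the following T-SQL. This is a hedged consolidation, not code shown in full in this diff: the storage URL is a placeholder, and the final GRANT line reflects the permission the text names rather than a statement visible in the hunk.

```sql
-- Hedged sketch of the tutorial's utility objects (URL is a placeholder).
CREATE EXTERNAL DATA SOURCE ContosoLake
WITH (LOCATION = 'https://contosolake.dfs.core.windows.net');

CREATE LOGIN data_explorer WITH PASSWORD = 'My Very Strong Password 1234!';

-- Database user for the login, with the bulk-operations permission the text names.
CREATE USER data_explorer FOR LOGIN data_explorer;
GRANT ADMINISTER DATABASE BULK OPERATIONS TO data_explorer;
```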

articles/synapse-analytics/sql-data-warehouse/sql-data-warehouse-reference-collation-types.md

Lines changed: 17 additions & 6 deletions
@@ -4,27 +4,29 @@ description: Collation types supported for dedicated SQL pool (formerly SQL DW)
 ms.service: synapse-analytics
 ms.subservice: sql
 ms.topic: conceptual
-ms.date: 12/04/2019
+ms.date: 02/15/2023
 author: WilliamDAssafMSFT
 ms.author: wiassaf
-ms.reviewer: sngun
-ms.custom: seo-lt-2019, azure-synapse
+ms.reviewer: sngun, kecona
+ms.custom: azure-synapse
 ---
 
 # Database collation support for dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics
 
 You can change the default database collation from the Azure portal when you create a new dedicated SQL pool (formerly SQL DW). This capability makes it even easier to create a new database using one of the 3800 supported database collations.
 
+This article applies to dedicated SQL pools (formerly SQL DW). For more information on dedicated SQL pools in Azure Synapse workspaces, see [Collation types supported for Synapse SQL](../sql/reference-collation-types.md).
+
 Collations provide the locale, code page, sort order, and character sensitivity rules for character-based data types. Once chosen, all columns and expressions requiring collation information inherit the chosen collation from the database setting. The default inheritance can be overridden by explicitly stating a different collation for a character-based data type.
 
 > [!NOTE]
 > In Azure Synapse Analytics, query text (including variables, constants, etc.) is always handled using the database-level collation, and not the server-level collation as in other SQL Server offerings.
 
 ## Changing collation
 
-To change the default collation, update to the Collation field in the provisioning experience.
+To change the default collation, update the **Collation** field in the provisioning experience.
 
-For example, if you wanted to change the default collation to case sensitive, you would simply rename the Collation from SQL_Latin1_General_CP1_CI_AS to SQL_Latin1_General_CP1_CS_AS.
+For example, to change the default collation to case-sensitive, change the collation from `SQL_Latin1_General_CP1_CI_AS` to `SQL_Latin1_General_CP1_CS_AS`.
 
 ## Collation support
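The inheritance override mentioned above can be stated per column. A minimal, hypothetical illustration (the table and column names are not from this commit):

```sql
-- Hypothetical table: ProductName overrides the database default collation
-- with a case-sensitive collation.
CREATE TABLE dbo.Products
(
    ProductId   int NOT NULL,
    ProductName varchar(100) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
);
```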

@@ -47,4 +49,13 @@ To check the current collation for the database, you can run the following T-SQL
     SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS Collation;
     ```
 
-When passed 'Collation' as the property parameter, the DatabasePropertyEx function returns the current collation for the database specified. For more information, see [DatabasePropertyEx](/sql/t-sql/functions/databasepropertyex-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest&preserve-view=true).
+When passed 'Collation' as the property parameter, the DATABASEPROPERTYEX function returns the current collation for the specified database. For more information, see [DATABASEPROPERTYEX](/sql/t-sql/functions/databasepropertyex-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest&preserve-view=true).
+
+## Next steps
+
+For more information on best practices for dedicated SQL pool and serverless SQL pool, see the following articles:
+
+- [Best practices for dedicated SQL pool](./best-practices-dedicated-sql-pool.md)
+- [Best practices for serverless SQL pool](./best-practices-serverless-sql-pool.md)

articles/synapse-analytics/sql/best-practices-serverless-sql-pool.md

Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,7 @@ author: filippopovic
 ms.author: fipopovi
 manager: craigg
 ms.reviewer: sngun, wiassaf
-ms.date: 09/01/2022
+ms.date: 02/15/2023
 ms.service: synapse-analytics
 ms.subservice: sql
 ms.topic: conceptual
@@ -24,7 +24,7 @@ Some generic guidelines are:
 - Make sure the storage and serverless SQL pool are in the same region. Storage examples include Azure Data Lake Storage and Azure Cosmos DB.
 - Try to [optimize storage layout](#prepare-files-for-querying) by using partitioning and keeping your files in the range between 100 MB and 10 GB.
 - If you're returning a large number of results, make sure you're using SQL Server Management Studio or Azure Data Studio and not Azure Synapse Studio. Azure Synapse Studio is a web tool that isn't designed for large result sets.
-- If you're filtering results by string column, try to use a `BIN2_UTF8` collation.
+- If you're filtering results by a string column, try to use a `BIN2_UTF8` collation. For more information on changing collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md).
 - Consider caching the results on the client side by using Power BI import mode or Azure Analysis Services, and periodically refresh them. Serverless SQL pools can't provide an interactive experience in Power BI Direct Query mode if you're using complex queries or processing a large amount of data.
 
 ## Client applications and network connections
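As a hedged illustration of the `BIN2_UTF8` guidance in the hunk above (the path and column names are placeholders, not from this commit):

```sql
-- Hypothetical query: the explicit BIN2_UTF8 collation on the predicate
-- column lets the serverless pool push the string filter down to the files.
SELECT *
FROM OPENROWSET(
    BULK 'https://contosolake.dfs.core.windows.net/data/*.parquet',
    FORMAT = 'PARQUET'
)
WITH (
    [state] varchar(50) COLLATE Latin1_General_100_BIN2_UTF8
) AS [rows]
WHERE [state] = 'WA';
```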
@@ -91,7 +91,7 @@ The data types you use in your query affect performance and concurrency. You can
 - Use the smallest data size that can accommodate the largest possible value.
   - If the maximum character value length is 30 characters, use a character data type of length 30.
   - If all character column values are of a fixed size, use **char** or **nchar**. Otherwise, use **varchar** or **nvarchar**.
-  - If the maximum integer column value is 500, use **smallint** because it's the smallest data type that can accommodate this value. You can find integer data type ranges in [this article](/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=azure-sqldw-latest&preserve-view=true).
+  - If the maximum integer column value is 500, use **smallint** because it's the smallest data type that can accommodate this value. For more information, see [integer data type ranges](/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=azure-sqldw-latest&preserve-view=true).
 - If possible, use **varchar** and **char** instead of **nvarchar** and **nchar**.
 - Use the **varchar** type with some UTF8 collation if you're reading data from Parquet, Azure Cosmos DB, Delta Lake, or CSV with UTF-8 encoding.
 - Use the **varchar** type without UTF8 collation if you're reading data from CSV non-Unicode files (for example, ASCII).
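The sizing guidance above can be applied with an explicit `WITH` schema instead of relying on inference. A hedged sketch, with placeholder path and column names:

```sql
-- Hedged sketch: right-size columns that inference would otherwise widen
-- (for example, varchar(8000) for Parquet strings).
SELECT *
FROM OPENROWSET(
    BULK 'https://contosolake.dfs.core.windows.net/sales/*.parquet',
    FORMAT = 'PARQUET'
)
WITH (
    product_name varchar(30) COLLATE Latin1_General_100_BIN2_UTF8,
    quantity     smallint
) AS [rows];
```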
@@ -103,7 +103,7 @@ The data types you use in your query affect performance and concurrency. You can
 
 [Schema inference](query-parquet-files.md#automatic-schema-inference) helps you quickly write queries and explore data without knowing file schemas. The cost of this convenience is that inferred data types might be larger than the actual data types. This discrepancy happens when there isn't enough information in the source files to make sure the appropriate data type is used. For example, Parquet files don't contain metadata about maximum character column length, so serverless SQL pool infers it as varchar(8000).
 
-You can use [sp_describe_first_results_set](/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15&preserve-view=true) to check the resulting data types of your query.
+You can use the system stored procedure [sp_describe_first_result_set](/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15&preserve-view=true) to check the resulting data types of your query.
 
 The following example shows how you can optimize inferred data types. This procedure is used to show the inferred data types:
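The document's own example isn't included in this hunk. As a hedged sketch, checking inferred types looks roughly like this (the query text is a placeholder):

```sql
-- Hedged sketch: report the names and data types the query would return.
EXEC sp_describe_first_result_set N'
    SELECT *
    FROM OPENROWSET(
        BULK ''https://contosolake.dfs.core.windows.net/data/*.parquet'',
        FORMAT = ''PARQUET''
    ) AS [rows]';
```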

articles/synapse-analytics/sql/develop-tables-external-tables.md

Lines changed: 5 additions & 8 deletions
@@ -6,9 +6,8 @@ ms.author: jovanpop
 ms.service: synapse-analytics
 ms.topic: overview
 ms.subservice: sql
-ms.date: 02/15/2022
+ms.date: 02/15/2023
 ms.reviewer: wiassaf
-ms.custom: ignite-fall-2021
 ---
 
 # Use external tables with Synapse SQL
@@ -27,7 +26,7 @@ The key differences between Hadoop and native external tables are presented in t
 | Serverless SQL pool | Not available | Available |
 | Supported formats | Delimited/CSV, Parquet, ORC, Hive RC, and RC | Serverless SQL pool: Delimited/CSV, Parquet, and [Delta Lake](query-delta-lake-format.md)<br/>Dedicated SQL pool: Parquet (preview) |
 | [Folder partition elimination](#folder-partition-elimination) | No | Partition elimination is available only in the partitioned tables created on Parquet or CSV formats that are synchronized from Apache Spark pools. You might create external tables on Parquet partitioned folders, but the partitioning columns will be inaccessible and ignored, while the partition elimination will not be applied. Do not create [external tables on Delta Lake folders](create-use-external-tables.md#delta-tables-on-partitioned-folders) because they are not supported. Use [Delta partitioned views](create-use-views.md#delta-lake-partitioned-views) if you need to query partitioned Delta Lake data. |
-| [File elimination](#file-elimination) (predicate pushdown) | No | Yes in serverless SQL pool. For the string pushdown, you need to use `Latin1_General_100_BIN2_UTF8` collation on the `VARCHAR` columns to enable pushdown. |
+| [File elimination](#file-elimination) (predicate pushdown) | No | Yes in serverless SQL pool. For string pushdown, use the `Latin1_General_100_BIN2_UTF8` collation on `VARCHAR` columns to enable pushdown. For more information on collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md). |
 | Custom format for location | No | Yes, using wildcards like `/year=*/month=*/day=*` for Parquet or CSV formats. Custom folder paths are not available in Delta Lake. In the serverless SQL pool you can also use recursive wildcards `/logs/**` to reference Parquet or CSV files in any subfolder beneath the referenced folder. |
 | Recursive folder scan | Yes | Yes. In serverless SQL pools, `/**` must be specified at the end of the location path. In dedicated SQL pools, folders are always scanned recursively. |
 | Storage authentication | Storage Access Key (SAK), AAD passthrough, Managed identity, Custom application Azure AD identity | [Shared Access Signature (SAS)](develop-storage-files-storage-access-control.md?tabs=shared-access-signature), [AAD passthrough](develop-storage-files-storage-access-control.md?tabs=user-identity), [Managed identity](develop-storage-files-storage-access-control.md?tabs=managed-identity), [Custom application Azure AD identity](develop-storage-files-storage-access-control.md?tabs=service-principal). |
@@ -60,14 +59,14 @@ You can create external tables in Synapse SQL pools via the following steps:
 
 ### Folder partition elimination
 
-The native external tables in Synapse pools are able to ignore the files placed in the folders that are not relevant for the queries. If your files are stored in a folder hierarchy (for example - **/year=2020/month=03/day=16**) and the values for **year**, **month**, and **day** are exposed as the columns, the queries that contain filters like `year=2020` will read the files only from the subfolders placed within the **year=2020** folder. The files and folders placed in other folders (**year=2021** or **year=2022**) will be ignored in this query. This elimination is known as **partition elimination**.
+The native external tables in Synapse pools can ignore files placed in folders that are not relevant to a query. If your files are stored in a folder hierarchy (for example, `/year=2020/month=03/day=16`) and the values for `year`, `month`, and `day` are exposed as columns, queries that contain filters like `year=2020` read files only from the subfolders within the `year=2020` folder. Files and folders in other folders (`year=2021` or `year=2022`) are ignored in this query. This elimination is known as **partition elimination**.
 
 Folder partition elimination is available in the native external tables that are synchronized from the Synapse Spark pools. If you have a partitioned data set and want to use partition elimination with external tables that you create, use [partitioned views](create-use-views.md#partitioned-views) instead of external tables.
 
 ### File elimination
 
 Some data formats such as Parquet and Delta contain file statistics for each column (for example, min/max values for each column). Queries that filter data don't read files where the required column values don't exist. The query first explores the min/max values for the columns used in the query predicate to find the files that don't contain the required data. Those files are ignored and eliminated from the query plan.
-This technique is also known as filter predicate pushdown and it can improve the performance of your queries. Filter pushdown is available in the serverless SQL pools on Parquet and Delta formats. To leverage filter pushdown for the string types, use the VARCHAR type with the `Latin1_General_100_BIN2_UTF8` collation.
+This technique is also known as filter predicate pushdown, and it can improve the performance of your queries. Filter pushdown is available in serverless SQL pools for the Parquet and Delta formats. To leverage filter pushdown for string types, use the `VARCHAR` type with the `Latin1_General_100_BIN2_UTF8` collation. For more information on collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md).
 
 ### Security
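For context, the string-pushdown advice in this hunk plays out in an external table definition roughly as follows. This is a hedged sketch; the table, data source, and file format names are assumed to exist already and are not taken from this commit:

```sql
-- Hedged sketch: a native external table whose string column uses the
-- BIN2_UTF8 collation so that equality filters can be pushed down to
-- the Parquet file statistics.
CREATE EXTERNAL TABLE dbo.Sales
(
    region varchar(50) COLLATE Latin1_General_100_BIN2_UTF8,
    amount decimal(18, 2)
)
WITH (
    LOCATION = '/sales/**',
    DATA_SOURCE = ContosoLake,
    FILE_FORMAT = ParquetFormat
);

-- The region predicate below is a candidate for file elimination.
SELECT SUM(amount) AS total FROM dbo.Sales WHERE region = 'West';
```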
@@ -76,11 +75,9 @@ External tables access underlying Azure storage using the database scoped creden
 - Data source without credential enables external tables to access publicly available files on Azure storage.
 - Data source can have a credential that enables external tables to access only the files on Azure storage using SAS token or workspace Managed Identity. For examples, see the [Develop storage files storage access control](develop-storage-files-storage-access-control.md#examples) article.
 
-
-
 ## CREATE EXTERNAL DATA SOURCE
 
-External data sources are used to connect to storage accounts. The complete documentation is outlined [here](/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&preserve-view=true).
+External data sources are used to connect to storage accounts. For more information, see [CREATE EXTERNAL DATA SOURCE](/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&preserve-view=true).
 
 ### Syntax for CREATE EXTERNAL DATA SOURCE
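The hunk ends just before the syntax block. As a hedged, minimal example of the statement this section documents (the data source name and URL are placeholders):

```sql
-- Hypothetical serverless-pool data source over a Data Lake Storage account.
CREATE EXTERNAL DATA SOURCE MyAzureStorage
WITH (
    LOCATION = 'https://contosolake.dfs.core.windows.net/data'
);
```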
