
Commit ff40fbf

20230215 synapse supported collations update, improved linkage
1 parent aa8f190 commit ff40fbf

5 files changed: +45 -33 lines changed

articles/synapse-analytics/get-started-analyze-sql-on-demand.md

Lines changed: 10 additions & 11 deletions
@@ -3,12 +3,12 @@ title: 'Tutorial: Get started analyze data with a serverless SQL pool'
 description: In this tutorial, you'll learn how to analyze data with a serverless SQL pool using data located in Spark databases.
 author: saveenr
 ms.author: saveenr
-ms.reviewer: sngun
+ms.reviewer: sngun, wiassaf
 ms.service: synapse-analytics
 ms.subservice: sql
-ms.custom: ignite-2022
+ms.custom:
 ms.topic: tutorial
-ms.date: 11/18/2022
+ms.date: 02/15/2023
 ---
 
 # Analyze data with a serverless SQL pool
@@ -39,7 +39,7 @@ Every workspace comes with a pre-configured serverless SQL pool called **Built-i
         FORMAT='PARQUET'
     ) AS [result]
     ```
-1. Click **Run**.
+1. Select **Run**.
 
    Data exploration is just a simplified scenario where you can understand the basic characteristics of your data. Learn more about data exploration and analysis in this [tutorial](sql/tutorial-data-analyst.md).
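The hunk above shows only the tail of the tutorial's query. As a hedged sketch, the full `OPENROWSET` statement it belongs to looks roughly like this; the storage account and file path are illustrative placeholders, not taken from this commit:

```sql
-- Hedged sketch: query a Parquet file with the serverless SQL pool.
-- The account and file path below are placeholders.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet',
    FORMAT = 'PARQUET'
) AS [result];
```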

@@ -60,16 +60,15 @@ However, as you continue data exploration, you might want to create some utility
     ```
 
     > [!IMPORTANT]
-    > Use a collation with `_UTF8` suffix to ensure that UTF-8 text is properly converted to `VARCHAR` columns. `Latin1_General_100_BIN2_UTF8` provides
-    > the best performance in the queries that read data from Parquet files and Azure Cosmos DB containers.
+    > Use a collation with the `_UTF8` suffix to ensure that UTF-8 text is properly converted to `VARCHAR` columns. `Latin1_General_100_BIN2_UTF8` provides the best performance in queries that read data from Parquet files and Azure Cosmos DB containers. For more information on changing collations, refer to [Collation types supported for Synapse SQL](sql/reference-collation-types.md).
 
-1. Switch from master to `DataExplorationDB` using the following command. You can also use the UI control **use database** to switch your current database:
+1. Switch the database context from `master` to `DataExplorationDB` using the following command. You can also use the UI control **use database** to switch your current database:
 
    ```sql
    USE DataExplorationDB
    ```
 
-1. From the 'DataExplorationDB', create utility objects such as credentials and data sources.
+1. From `DataExplorationDB`, create utility objects such as credentials and data sources.
 
    ```sql
    CREATE EXTERNAL DATA SOURCE ContosoLake
@@ -79,13 +78,13 @@ However, as you continue data exploration, you might want to create some utility
    > [!NOTE]
    > An external data source can be created without a credential. If a credential does not exist, the caller's identity will be used to access the external data source.
 
-1. Optionally, use the newly created 'DataExplorationDB' database to create a login for a user in DataExplorationDB that will access external data:
+1. Optionally, use the newly created `DataExplorationDB` database to create a login for a user in `DataExplorationDB` that will access external data:
 
    ```sql
    CREATE LOGIN data_explorer WITH PASSWORD = 'My Very Strong Password 1234!';
    ```
 
-    Next create a database user in 'DataExplorationDB' for the above login and grant the `ADMINISTER DATABASE BULK OPERATIONS` permission.
+    Next, create a database user in `DataExplorationDB` for the above login and grant the `ADMINISTER DATABASE BULK OPERATIONS` permission.
 
    ```sql
    CREATE USER data_explorer FOR LOGIN data_explorer;
@@ -109,7 +108,7 @@ However, as you continue data exploration, you might want to create some utility
 
 1. **Publish** your changes to the workspace.
 
-    Data exploration database is just a simple placeholder where you can store your utility objects. Synapse SQL pool enables you to do much more and create a Logical Data Warehouse - a relational layer built on top of Azure data sources. Learn more about building Logical Data Warehouse in this [tutorial](sql/tutorial-data-analyst.md).
+    The data exploration database is just a simple placeholder where you can store your utility objects. Synapse SQL pool enables you to do much more and create a logical data warehouse, a relational layer built on top of Azure data sources. Learn more about [building a logical data warehouse in this tutorial](sql/tutorial-data-analyst.md).
 
 ## Next steps
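For context, the utility objects this hunk walks through add up to roughly the following T-SQL. This is a hedged consolidation, not code shown in full in this diff: the storage URL is a placeholder, and the final GRANT line reflects the permission the text names rather than a statement visible in the hunk.

```sql
-- Hedged sketch of the tutorial's utility objects (URL is a placeholder).
CREATE EXTERNAL DATA SOURCE ContosoLake
WITH (LOCATION = 'https://contosolake.dfs.core.windows.net');

CREATE LOGIN data_explorer WITH PASSWORD = 'My Very Strong Password 1234!';

-- Database user for the login, with the bulk-operations permission the text names.
CREATE USER data_explorer FOR LOGIN data_explorer;
GRANT ADMINISTER DATABASE BULK OPERATIONS TO data_explorer;
```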

articles/synapse-analytics/sql-data-warehouse/sql-data-warehouse-reference-collation-types.md

Lines changed: 17 additions & 6 deletions
@@ -4,27 +4,29 @@ description: Collation types supported for dedicated SQL pool (formerly SQL DW)
 ms.service: synapse-analytics
 ms.subservice: sql
 ms.topic: conceptual
-ms.date: 12/04/2019
+ms.date: 02/15/2023
 author: WilliamDAssafMSFT
 ms.author: wiassaf
-ms.reviewer: sngun
-ms.custom: seo-lt-2019, azure-synapse
+ms.reviewer: sngun, kecona
+ms.custom: azure-synapse
 ---
 
 # Database collation support for dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics
 
 You can change the default database collation from the Azure portal when you create a new dedicated SQL pool (formerly SQL DW). This capability makes it even easier to create a new database using one of the 3800 supported database collations.
 
+This article applies to dedicated SQL pools (formerly SQL DW). For more information on dedicated SQL pools in Azure Synapse workspaces, see [Collation types supported for Synapse SQL](../sql/reference-collation-types.md).
+
 Collations provide the locale, code page, sort order, and character sensitivity rules for character-based data types. Once chosen, all columns and expressions requiring collation information inherit the chosen collation from the database setting. The default inheritance can be overridden by explicitly stating a different collation for a character-based data type.
 
 > [!NOTE]
 > In Azure Synapse Analytics, query text (including variables, constants, etc.) is always handled using the database-level collation, and not the server-level collation as in other SQL Server offerings.
 
 ## Changing collation
 
-To change the default collation, update to the Collation field in the provisioning experience.
+To change the default collation, update the **Collation** field in the provisioning experience.
 
-For example, if you wanted to change the default collation to case sensitive, you would simply rename the Collation from SQL_Latin1_General_CP1_CI_AS to SQL_Latin1_General_CP1_CS_AS.
+For example, to change the default collation to case-sensitive, change the collation from `SQL_Latin1_General_CP1_CI_AS` to `SQL_Latin1_General_CP1_CS_AS`.
 
 ## Collation support
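The inheritance override mentioned above can be stated per column. A minimal, hypothetical illustration (the table and column names are not from this commit):

```sql
-- Hypothetical table: ProductName overrides the database default collation
-- with a case-sensitive collation.
CREATE TABLE dbo.Products
(
    ProductId   int NOT NULL,
    ProductName varchar(100) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
);
```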

@@ -47,4 +49,13 @@ To check the current collation for the database, you can run the following T-SQL
     SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS Collation;
     ```
 
-When passed 'Collation' as the property parameter, the DatabasePropertyEx function returns the current collation for the database specified. For more information, see [DatabasePropertyEx](/sql/t-sql/functions/databasepropertyex-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest&preserve-view=true).
+When passed 'Collation' as the property parameter, the DATABASEPROPERTYEX function returns the current collation for the specified database. For more information, see [DATABASEPROPERTYEX](/sql/t-sql/functions/databasepropertyex-transact-sql?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json&view=azure-sqldw-latest&preserve-view=true).
+
+## Next steps
+
+For more information on best practices for dedicated SQL pool and serverless SQL pool, see the following articles:
+
+- [Best practices for dedicated SQL pool](./best-practices-dedicated-sql-pool.md)
+- [Best practices for serverless SQL pool](./best-practices-serverless-sql-pool.md)

articles/synapse-analytics/sql/best-practices-serverless-sql-pool.md

Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,7 @@ author: filippopovic
 ms.author: fipopovi
 manager: craigg
 ms.reviewer: sngun, wiassaf
-ms.date: 09/01/2022
+ms.date: 02/15/2023
 ms.service: synapse-analytics
 ms.subservice: sql
 ms.topic: conceptual
@@ -24,7 +24,7 @@ Some generic guidelines are:
 - Make sure the storage and serverless SQL pool are in the same region. Storage examples include Azure Data Lake Storage and Azure Cosmos DB.
 - Try to [optimize storage layout](#prepare-files-for-querying) by using partitioning and keeping your files in the range between 100 MB and 10 GB.
 - If you're returning a large number of results, make sure you're using SQL Server Management Studio or Azure Data Studio and not Azure Synapse Studio. Azure Synapse Studio is a web tool that isn't designed for large result sets.
-- If you're filtering results by string column, try to use a `BIN2_UTF8` collation.
+- If you're filtering results by a string column, try to use a `BIN2_UTF8` collation. For more information on changing collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md).
 - Consider caching the results on the client side by using Power BI import mode or Azure Analysis Services, and periodically refresh them. Serverless SQL pools can't provide an interactive experience in Power BI Direct Query mode if you're using complex queries or processing a large amount of data.
 
 ## Client applications and network connections
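As a hedged illustration of the `BIN2_UTF8` guidance in the hunk above (the path and column names are placeholders, not from this commit):

```sql
-- Hypothetical query: the explicit BIN2_UTF8 collation on the predicate
-- column lets the serverless pool push the string filter down to the files.
SELECT *
FROM OPENROWSET(
    BULK 'https://contosolake.dfs.core.windows.net/data/*.parquet',
    FORMAT = 'PARQUET'
)
WITH (
    [state] varchar(50) COLLATE Latin1_General_100_BIN2_UTF8
) AS [rows]
WHERE [state] = 'WA';
```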
@@ -91,7 +91,7 @@ The data types you use in your query affect performance and concurrency. You can
 - Use the smallest data size that can accommodate the largest possible value.
   - If the maximum character value length is 30 characters, use a character data type of length 30.
   - If all character column values are of a fixed size, use **char** or **nchar**. Otherwise, use **varchar** or **nvarchar**.
-  - If the maximum integer column value is 500, use **smallint** because it's the smallest data type that can accommodate this value. You can find integer data type ranges in [this article](/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=azure-sqldw-latest&preserve-view=true).
+  - If the maximum integer column value is 500, use **smallint** because it's the smallest data type that can accommodate this value. For more information, see [integer data type ranges](/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=azure-sqldw-latest&preserve-view=true).
 - If possible, use **varchar** and **char** instead of **nvarchar** and **nchar**.
 - Use the **varchar** type with some UTF8 collation if you're reading data from Parquet, Azure Cosmos DB, Delta Lake, or CSV with UTF-8 encoding.
 - Use the **varchar** type without UTF8 collation if you're reading data from CSV non-Unicode files (for example, ASCII).
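The sizing guidance above can be applied with an explicit `WITH` schema instead of relying on inference. A hedged sketch, with placeholder path and column names:

```sql
-- Hedged sketch: right-size columns that inference would otherwise widen
-- (for example, varchar(8000) for Parquet strings).
SELECT *
FROM OPENROWSET(
    BULK 'https://contosolake.dfs.core.windows.net/sales/*.parquet',
    FORMAT = 'PARQUET'
)
WITH (
    product_name varchar(30) COLLATE Latin1_General_100_BIN2_UTF8,
    quantity     smallint
) AS [rows];
```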
@@ -103,7 +103,7 @@ The data types you use in your query affect performance and concurrency. You can
 
 [Schema inference](query-parquet-files.md#automatic-schema-inference) helps you quickly write queries and explore data without knowing file schemas. The cost of this convenience is that inferred data types might be larger than the actual data types. This discrepancy happens when there isn't enough information in the source files to make sure the appropriate data type is used. For example, Parquet files don't contain metadata about maximum character column length, so serverless SQL pool infers it as varchar(8000).
 
-You can use [sp_describe_first_results_set](/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15&preserve-view=true) to check the resulting data types of your query.
+You can use the system stored procedure [sp_describe_first_result_set](/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15&preserve-view=true) to check the resulting data types of your query.
 
 The following example shows how you can optimize inferred data types. This procedure is used to show the inferred data types:
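The document's own example isn't included in this hunk. As a hedged sketch, checking inferred types looks roughly like this (the query text is a placeholder):

```sql
-- Hedged sketch: report the names and data types the query would return.
EXEC sp_describe_first_result_set N'
    SELECT *
    FROM OPENROWSET(
        BULK ''https://contosolake.dfs.core.windows.net/data/*.parquet'',
        FORMAT = ''PARQUET''
    ) AS [rows]';
```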

articles/synapse-analytics/sql/develop-tables-external-tables.md

Lines changed: 5 additions & 8 deletions
@@ -6,9 +6,8 @@ ms.author: jovanpop
 ms.service: synapse-analytics
 ms.topic: overview
 ms.subservice: sql
-ms.date: 02/15/2022
+ms.date: 02/15/2023
 ms.reviewer: wiassaf
-ms.custom: ignite-fall-2021
 ---
 
 # Use external tables with Synapse SQL
@@ -27,7 +26,7 @@ The key differences between Hadoop and native external tables are presented in t
 | Serverless SQL pool | Not available | Available |
 | Supported formats | Delimited/CSV, Parquet, ORC, Hive RC, and RC | Serverless SQL pool: Delimited/CSV, Parquet, and [Delta Lake](query-delta-lake-format.md)<br/>Dedicated SQL pool: Parquet (preview) |
 | [Folder partition elimination](#folder-partition-elimination) | No | Partition elimination is available only in the partitioned tables created on Parquet or CSV formats that are synchronized from Apache Spark pools. You might create external tables on Parquet partitioned folders, but the partitioning columns will be inaccessible and ignored, while the partition elimination will not be applied. Do not create [external tables on Delta Lake folders](create-use-external-tables.md#delta-tables-on-partitioned-folders) because they are not supported. Use [Delta partitioned views](create-use-views.md#delta-lake-partitioned-views) if you need to query partitioned Delta Lake data. |
-| [File elimination](#file-elimination) (predicate pushdown) | No | Yes in serverless SQL pool. For the string pushdown, you need to use `Latin1_General_100_BIN2_UTF8` collation on the `VARCHAR` columns to enable pushdown. |
+| [File elimination](#file-elimination) (predicate pushdown) | No | Yes in serverless SQL pool. For string pushdown, use the `Latin1_General_100_BIN2_UTF8` collation on `VARCHAR` columns to enable pushdown. For more information on collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md). |
 | Custom format for location | No | Yes, using wildcards like `/year=*/month=*/day=*` for Parquet or CSV formats. Custom folder paths are not available in Delta Lake. In the serverless SQL pool you can also use recursive wildcards `/logs/**` to reference Parquet or CSV files in any subfolder beneath the referenced folder. |
 | Recursive folder scan | Yes | Yes. In serverless SQL pools, `/**` must be specified at the end of the location path. In dedicated SQL pools, folders are always scanned recursively. |
 | Storage authentication | Storage Access Key (SAK), AAD passthrough, Managed identity, Custom application Azure AD identity | [Shared Access Signature (SAS)](develop-storage-files-storage-access-control.md?tabs=shared-access-signature), [AAD passthrough](develop-storage-files-storage-access-control.md?tabs=user-identity), [Managed identity](develop-storage-files-storage-access-control.md?tabs=managed-identity), [Custom application Azure AD identity](develop-storage-files-storage-access-control.md?tabs=service-principal). |
@@ -60,14 +59,14 @@ You can create external tables in Synapse SQL pools via the following steps:
 
 ### Folder partition elimination
 
-The native external tables in Synapse pools are able to ignore the files placed in the folders that are not relevant for the queries. If your files are stored in a folder hierarchy (for example - **/year=2020/month=03/day=16**) and the values for **year**, **month**, and **day** are exposed as the columns, the queries that contain filters like `year=2020` will read the files only from the subfolders placed within the **year=2020** folder. The files and folders placed in other folders (**year=2021** or **year=2022**) will be ignored in this query. This elimination is known as **partition elimination**.
+The native external tables in Synapse pools can ignore files placed in folders that are not relevant to a query. If your files are stored in a folder hierarchy (for example, `/year=2020/month=03/day=16`) and the values for `year`, `month`, and `day` are exposed as columns, queries that contain filters like `year=2020` read files only from the subfolders within the `year=2020` folder. Files and folders in other folders (`year=2021` or `year=2022`) are ignored in this query. This elimination is known as **partition elimination**.
 
 Folder partition elimination is available in the native external tables that are synchronized from the Synapse Spark pools. If you have a partitioned data set and want to use partition elimination with external tables that you create, use [partitioned views](create-use-views.md#partitioned-views) instead of external tables.
 
 ### File elimination
 
 Some data formats such as Parquet and Delta contain file statistics for each column (for example, min/max values for each column). Queries that filter data don't read files where the required column values don't exist. The query first explores the min/max values for the columns used in the query predicate to find the files that don't contain the required data. Those files are ignored and eliminated from the query plan.
-This technique is also known as filter predicate pushdown and it can improve the performance of your queries. Filter pushdown is available in the serverless SQL pools on Parquet and Delta formats. To leverage filter pushdown for the string types, use the VARCHAR type with the `Latin1_General_100_BIN2_UTF8` collation.
+This technique is also known as filter predicate pushdown, and it can improve the performance of your queries. Filter pushdown is available in serverless SQL pools for the Parquet and Delta formats. To leverage filter pushdown for string types, use the `VARCHAR` type with the `Latin1_General_100_BIN2_UTF8` collation. For more information on collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md).
 
 ### Security
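For context, the string-pushdown advice in this hunk plays out in an external table definition roughly as follows. This is a hedged sketch; the table, data source, and file format names are assumed to exist already and are not taken from this commit:

```sql
-- Hedged sketch: a native external table whose string column uses the
-- BIN2_UTF8 collation so that equality filters can be pushed down to
-- the Parquet file statistics.
CREATE EXTERNAL TABLE dbo.Sales
(
    region varchar(50) COLLATE Latin1_General_100_BIN2_UTF8,
    amount decimal(18, 2)
)
WITH (
    LOCATION = '/sales/**',
    DATA_SOURCE = ContosoLake,
    FILE_FORMAT = ParquetFormat
);

-- The region predicate below is a candidate for file elimination.
SELECT SUM(amount) AS total FROM dbo.Sales WHERE region = 'West';
```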
@@ -76,11 +75,9 @@ External tables access underlying Azure storage using the database scoped creden
 - Data source without credential enables external tables to access publicly available files on Azure storage.
 - Data source can have a credential that enables external tables to access only the files on Azure storage using SAS token or workspace Managed Identity. For examples, see the [Develop storage files storage access control](develop-storage-files-storage-access-control.md#examples) article.
 
-
-
 ## CREATE EXTERNAL DATA SOURCE
 
-External data sources are used to connect to storage accounts. The complete documentation is outlined [here](/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&preserve-view=true).
+External data sources are used to connect to storage accounts. For more information, see [CREATE EXTERNAL DATA SOURCE](/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&preserve-view=true).
 
 ### Syntax for CREATE EXTERNAL DATA SOURCE
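The hunk ends just before the syntax block. As a hedged, minimal example of the statement this section documents (the data source name and URL are placeholders):

```sql
-- Hypothetical serverless-pool data source over a Data Lake Storage account.
CREATE EXTERNAL DATA SOURCE MyAzureStorage
WITH (
    LOCATION = 'https://contosolake.dfs.core.windows.net/data'
);
```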
