Commit 9ecd61d

20250227 edit pass
1 parent b6f8f2e commit 9ecd61d

1 file changed: +5 -6 lines changed

articles/synapse-analytics/troubleshoot/reading-utf8-text.md

Lines changed: 5 additions & 6 deletions
@@ -6,14 +6,14 @@ ms.author: wiassaf
 ms.topic: troubleshooting
 ms.service: azure-synapse-analytics
 ms.subservice: troubleshooting
-ms.date: 12/03/2020
+ms.date: 02/27/2025
 ---
 
 # Troubleshoot reading UTF-8 text from CSV or Parquet files using serverless SQL pool in Azure Synapse Analytics
 
 This article provides troubleshooting steps for reading UTF-8 text from CSV or Parquet files using serverless SQL pool in Azure Synapse Analytics.
 
-When UTF-8 text is read from a CSV or PARQUET file using serverless SQL pool, some special characters like ü and ö are incorrectly converted if the query returns VARCHAR columns with non-UTF8 collations. This is a known issue in SQL Server and Azure SQL. Non-UTF8 collation is the default in Synapse SQL so customer queries will be affected. Customers who use standard English characters and some subset of extended Latin characters may not notice the conversion errors. The incorrect conversion is explained in more detail in [Always use UTF-8 collations to read UTF-8 text in serverless SQL pool](https://techcommunity.microsoft.com/t5/azure-synapse-analytics/always-use-utf-8-collations-to-read-utf-8-text-in-serverless-sql/ba-p/1883633)
+When UTF-8 text is read from a CSV or PARQUET file using serverless SQL pool, some special characters like `ü` and `ö` are incorrectly converted if the query returns **varchar** columns with non-UTF8 collations. This is a known issue in SQL Server and Azure SQL. Non-UTF8 collation is the default in Synapse SQL so customer queries will be affected. Customers who use standard English characters and some subset of extended Latin characters might not notice the conversion errors. The incorrect conversion is explained in more detail in [Always use UTF-8 collations to read UTF-8 text in serverless SQL pool](https://techcommunity.microsoft.com/t5/azure-synapse-analytics/always-use-utf-8-collations-to-read-utf-8-text-in-serverless-sql/ba-p/1883633).
 
 ## Workaround
 
@@ -26,7 +26,7 @@ The workaround to this issue is to always use UTF-8 collation when reading UTF-8
 COLLATE Latin1_General_100_BIN2_UTF8;
 ```
 
-- You can explicitly define collation on VARCHAR column in OPENROWSET or external table:
+- You can explicitly define collation on **varchar** column in OPENROWSET or external table:
 
 ```sql
 select geo_id, cases = sum(cases)
@@ -37,10 +37,9 @@ The workaround to this issue is to always use UTF-8 collation when reading UTF-8
 group by geo_id
 ```
 
-- If you did not specify UTF8 collation on external tables that read UTF8 data, you need to re-create impacted external tables and set UTF8 collation on VARCHAR columns (metadata operation).
+- If you didn't specify UTF8 collation on external tables that read UTF8 data, you need to re-create impacted external tables and set UTF8 collation on **varchar** columns (metadata operation).
 
-
-## Next steps
+## Related content
 
 * [Query Parquet files with Synapse SQL](../sql/query-parquet-files.md)
 * [Query CSV files with Synapse SQL](../sql/query-single-csv-file.md)