articles/synapse-analytics/troubleshoot/reading-utf8-text.md
Lines changed: 5 additions & 6 deletions
@@ -6,14 +6,14 @@ ms.author: wiassaf
ms.topic: troubleshooting
ms.service: azure-synapse-analytics
ms.subservice: troubleshooting
-ms.date: 12/03/2020
+ms.date: 02/27/2025
---
# Troubleshoot reading UTF-8 text from CSV or Parquet files using serverless SQL pool in Azure Synapse Analytics
This article provides troubleshooting steps for reading UTF-8 text from CSV or Parquet files using serverless SQL pool in Azure Synapse Analytics.
-When UTF-8 text is read from a CSV or PARQUET file using serverless SQL pool, some special characters like ü and ö are incorrectly converted if the query returns VARCHAR columns with non-UTF8 collations. This is a known issue in SQL Server and Azure SQL. Non-UTF8 collation is the default in Synapse SQL so customer queries will be affected. Customers who use standard English characters and some subset of extended Latin characters may not notice the conversion errors. The incorrect conversion is explained in more detail in [Always use UTF-8 collations to read UTF-8 text in serverless SQL pool](https://techcommunity.microsoft.com/t5/azure-synapse-analytics/always-use-utf-8-collations-to-read-utf-8-text-in-serverless-sql/ba-p/1883633)
+When UTF-8 text is read from a CSV or PARQUET file using serverless SQL pool, some special characters like `ü` and `ö` are incorrectly converted if the query returns **varchar** columns with non-UTF8 collations. This is a known issue in SQL Server and Azure SQL. Non-UTF8 collation is the default in Synapse SQL so customer queries will be affected. Customers who use standard English characters and some subset of extended Latin characters might not notice the conversion errors. The incorrect conversion is explained in more detail in [Always use UTF-8 collations to read UTF-8 text in serverless SQL pool](https://techcommunity.microsoft.com/t5/azure-synapse-analytics/always-use-utf-8-collations-to-read-utf-8-text-in-serverless-sql/ba-p/1883633).
## Workaround
@@ -26,7 +26,7 @@ The workaround to this issue is to always use UTF-8 collation when reading UTF-8
COLLATE Latin1_General_100_BIN2_UTF8;
```
-- You can explicitly define collation on VARCHAR column in OPENROWSET or external table:
+- You can explicitly define collation on **varchar** column in OPENROWSET or external table:
```sql
select geo_id, cases =sum(cases)
@@ -37,10 +37,9 @@ The workaround to this issue is to always use UTF-8 collation when reading UTF-8
group by geo_id
```
-- If you did not specify UTF8 collation on external tables that read UTF8 data, you need to re-create impacted external tables and set UTF8 collation on VARCHAR columns (metadata operation).
+- If you didn't specify UTF8 collation on external tables that read UTF8 data, you need to re-create impacted external tables and set UTF8 collation on **varchar** columns (metadata operation).
-
-## Next steps
+## Related content
*[Query Parquet files with Synapse SQL](../sql/query-parquet-files.md)
*[Query CSV files with Synapse SQL](../sql/query-single-csv-file.md)
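
Editor's illustration (not part of the diff): the last workaround bullet, re-creating an impacted external table with a UTF8 collation on its **varchar** columns, might look roughly like the sketch below. The table name `dbo.population`, its columns, the `LOCATION` path, and the `my_data_source` and `my_csv_format` objects are all hypothetical placeholders and do not come from the article; substitute the definitions of your own impacted table.

```sql
-- Hypothetical names throughout: dbo.population, my_data_source, my_csv_format.
-- Dropping an external table removes only its metadata; the files in storage are untouched.
DROP EXTERNAL TABLE dbo.population;

-- Re-create the table with a UTF8 collation on every varchar column.
CREATE EXTERNAL TABLE dbo.population (
    country_code varchar(5)   COLLATE Latin1_General_100_BIN2_UTF8,
    country_name varchar(100) COLLATE Latin1_General_100_BIN2_UTF8,
    [year]       smallint,
    population   bigint
)
WITH (
    LOCATION    = 'csv/population/*.csv',  -- assumed path inside the data source
    DATA_SOURCE = my_data_source,          -- assumed existing external data source
    FILE_FORMAT = my_csv_format            -- assumed existing external file format
);
```

Only the **varchar** columns need the `COLLATE` clause; numeric columns are not affected by the conversion issue described in the article.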