@c8ef commented Dec 30, 2024

This change opens the cppreference HTML files with an explicit encoding="utf-8", which prevents the following error on systems whose default encoding is not UTF-8 (here, GBK):

UnicodeDecodeError: 'gbk' codec can't decode byte 0xb6 in position 12958: illegal multibyte sequence
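A minimal sketch of the failure and the fix, assuming a hypothetical page that contains an em dash and curly quotes (the report only gives the offending byte and its offset, not the page content). It writes UTF-8 HTML to a temporary file, then reads it back once as GBK, standing in for a bare open() under a Chinese-locale default, and once with the explicit encoding="utf-8" that this change adds. The names read_text, can_decode, demo and SAMPLE_HTML are illustrative only.

```python
import os
import tempfile

# Hypothetical page content: cppreference pages are UTF-8 HTML and may contain
# non-ASCII punctuation such as an em dash or curly quotes.
SAMPLE_HTML = "<p>std::vector \u2014 \u201ccontainer\u201d</p>\n"


def read_text(path, encoding):
    # A bare open(path) behaves like this call with the locale's default
    # encoding, e.g. GBK on a Windows system set to a Chinese locale.
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def can_decode(path, encoding):
    # False when the bytes on disk are not a valid sequence in `encoding`.
    try:
        read_text(path, encoding)
        return True
    except UnicodeDecodeError:
        return False


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        page = os.path.join(tmp, "page.html")
        with open(page, "w", encoding="utf-8") as f:
            f.write(SAMPLE_HTML)
        # Simulate the locale default: these UTF-8 bytes are not valid GBK.
        gbk_ok = can_decode(page, "gbk")
        # The explicit encoding round-trips the text on any platform.
        utf8_ok = read_text(page, "utf-8") == SAMPLE_HTML
        return gbk_ok, utf8_ok


gbk_ok, utf8_ok = demo()
print("gbk ok:", gbk_ok)      # gbk ok: False
print("utf-8 ok:", utf8_ok)   # utf-8 ok: True
```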

@llvmbot added the clang label (Clang issues not falling into any other category) Dec 30, 2024
@c8ef requested review from hokein and kadircet December 30, 2024 14:30
@llvmbot commented Dec 30, 2024

@llvm/pr-subscribers-clang

Author: None (c8ef)

Full diff: https://github.com/llvm/llvm-project/pull/121341.diff

1 File Affected:

  • (modified) clang/tools/include-mapping/cppreference_parser.py (+2-2)
diff --git a/clang/tools/include-mapping/cppreference_parser.py b/clang/tools/include-mapping/cppreference_parser.py
index 9101f3dbff0f94..f7da2ba8bb6d84 100644
--- a/clang/tools/include-mapping/cppreference_parser.py
+++ b/clang/tools/include-mapping/cppreference_parser.py
@@ -139,7 +139,7 @@ def _ParseIndexPage(index_page_html):
 
 
 def _ReadSymbolPage(path, name, qual_name):
-    with open(path) as f:
+    with open(path, encoding="utf-8") as f:
         return _ParseSymbolPage(f.read(), name, qual_name)
 
 
@@ -156,7 +156,7 @@ def _GetSymbols(pool, root_dir, index_page_name, namespace, variants_to_accept):
     #      contains the defined header.
     #   2. Parse the symbol page to get the defined header.
     index_page_path = os.path.join(root_dir, index_page_name)
-    with open(index_page_path, "r") as f:
+    with open(index_page_path, "r", encoding="utf-8") as f:
         # Read each symbol page in parallel.
         results = []  # (symbol_name, promise of [header...])
         for symbol_name, symbol_page_path, variant in _ParseIndexPage(f.read()):
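To check that no call site was missed, one option is a small static scan. The sketch below uses Python's ast module to flag open() calls that neither pass encoding= nor use binary mode. find_open_without_encoding is a hypothetical helper, not part of the LLVM tree, and it only recognizes calls spelled literally as open(...). It is applied to the _ReadSymbolPage call site from the diff, before and after the change.

```python
import ast


def find_open_without_encoding(source):
    """Return sorted line numbers of open() calls relying on the default encoding.

    Hypothetical helper: a 4th positional argument or an `encoding=` keyword
    counts as explicit; calls whose mode is a string literal containing "b"
    (binary mode) are skipped because they do no text decoding.
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            continue
        # open(file, mode, buffering, encoding, ...): encoding is 4th positional.
        if len(node.args) >= 4 or any(kw.arg == "encoding" for kw in node.keywords):
            continue
        # Find the mode, whether passed positionally or as mode=.
        if len(node.args) >= 2:
            mode = node.args[1]
        else:
            mode = next((kw.value for kw in node.keywords if kw.arg == "mode"), None)
        if (isinstance(mode, ast.Constant) and isinstance(mode.value, str)
                and "b" in mode.value):
            continue
        flagged.append(node.lineno)
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(flagged)


BEFORE = (
    "def _ReadSymbolPage(path, name, qual_name):\n"
    "    with open(path) as f:\n"
    "        return _ParseSymbolPage(f.read(), name, qual_name)\n"
)
AFTER = BEFORE.replace("open(path)", 'open(path, encoding="utf-8")')

print(find_open_without_encoding(BEFORE))  # [2]
print(find_open_without_encoding(AFTER))   # []
```

Python 3.10 and later can report the same problem at runtime: running a script with python -X warn_default_encoding emits an EncodingWarning for each text-mode open() call that omits the encoding (PEP 597).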

@c8ef merged commit f385542 into llvm:main Dec 31, 2024
10 checks passed
@c8ef deleted the include branch December 31, 2024 01:28