Bug Fix: win32 readFileContents() cannot read file names with unicode characters. (Fix utf8 to utf16) #159
Merged
nalgeon merged 3 commits into nalgeon:main on Feb 6, 2026
Owner
Thank you for the PR! Compilation on Windows is failing:
Owner
Thanks!
On Windows, running sqlite with the sqlean extension:
select fileio_read('hello世界.txt');
will fail silently, returning an empty string, regardless of whether hello世界.txt has contents.
Windows uses UTF-16 as its native file system encoding for all file names (versions NT, 2000, XP, 7, 10, 11). A file name with Unicode characters (e.g. 世界 in hello世界.txt) is handled as UTF-8 inside sqlean, while Windows expects UTF-16. This prevents sqlean from reading files on Windows that have Unicode characters in the file name. The Windows function _wfopen works with file names with or without Unicode characters.
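The approach described above can be sketched as follows. This is only an illustration, not the code merged in this PR: the helper name open_utf8_path is hypothetical, and the sketch assumes the path reaches the extension as a NUL-terminated UTF-8 string. On Windows it converts the path to UTF-16 with MultiByteToWideChar and opens it with _wfopen; on other platforms it falls back to plain fopen.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not taken from the PR): open a file whose name
 * is given as a NUL-terminated UTF-8 string. Returns NULL on failure. */
FILE *open_utf8_path(const char *utf8_path, const char *mode) {
#ifdef _WIN32
    /* With a NULL output buffer, MultiByteToWideChar returns the number of
     * wchar_t units needed, including the terminator (input length is -1). */
    int path_len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                       utf8_path, -1, NULL, 0);
    if (path_len <= 0) {
        return NULL; /* invalid UTF-8 or other conversion error */
    }

    wchar_t *wide_path = malloc((size_t)path_len * sizeof(wchar_t));
    if (wide_path == NULL) {
        return NULL;
    }
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8_path, -1, wide_path, path_len) <= 0) {
        free(wide_path);
        return NULL;
    }

    /* Mode strings such as "rb" are short ASCII, so widen them per byte. */
    wchar_t wide_mode[8];
    size_t i = 0;
    while (mode[i] != '\0' && i < 7) {
        wide_mode[i] = (wchar_t)(unsigned char)mode[i];
        i++;
    }
    wide_mode[i] = L'\0';

    FILE *file = _wfopen(wide_path, wide_mode);
    free(wide_path);
    return file;
#else
    /* Assumption: non-Windows platforms accept UTF-8 paths in fopen. */
    return fopen(utf8_path, mode);
#endif
}
```

A function such as readFileContents() could call open_utf8_path(path, "rb") where it would otherwise call fopen(path, "rb"). Whether the merged change is structured this way is not shown in this conversation.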