Description
The FURY v2 documentation build crashes on Windows before generating a single HTML page. This is a hard crash, not a warning: the entire Sphinx process aborts during the builder-inited phase with a UnicodeDecodeError. Any contributor on Windows who tries to build the docs locally hits this immediately and gets no output. The root cause is four open() calls in docs/source/ext/apigen.py that do not specify a file encoding, combined with the fact that at least two FURY source files contain non-ASCII UTF-8 bytes that Python cannot decode under the Windows default locale encoding.
Environment Where This Reproduces
- OS: Windows 11 (10.0.26200)
- Python: 3.14.2 (CPython)
- Sphinx: 8.2.3
- System locale encoding: cp1252 (Windows Western European default)
- Branch: v2
This does not reproduce on Linux or macOS because those systems default to UTF-8 as their locale encoding. It is a Windows-only crash.
Full Traceback
The Sphinx error log written to %TEMP%\sphinx-err-*.log contains the following:
File "...\fury\docs\source\ext\apigen.py", line 234, in _parse_module_with_import
with open(mod.__file__) as fi:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 14569: character maps to <undefined>
Sphinx wraps this as:
sphinx.errors.ExtensionError: 'charmap' codec can't decode byte 0x8f in position 14569: character maps to <undefined>
The build aborts at this point. No RST files are generated, no HTML is produced, and the build log contains zero warnings because the process never reaches the warning-emission stage.
Root Cause — Detailed Explanation
Python 3's open() function, when called without an encoding= argument, falls back to the locale's preferred encoding (locale.getpreferredencoding(False)) unless UTF-8 mode is enabled. On Windows with a standard Western European installation this returns cp1252 (Windows-1252). cp1252 is a single-byte encoding, and Python's cp1252 codec leaves five byte values unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D), so UTF-8 text that contains any of those bytes cannot be decoded with it.
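The behaviour described above can be checked directly. The following is a minimal sketch, not part of apigen.py; the helper name undefined_bytes is hypothetical and exists only for illustration. It prints the encoding that open() falls back to on the current machine and lists the byte values that a given codec cannot decode.

```python
import locale

# The encoding open() falls back to when encoding= is omitted
# (assuming UTF-8 mode is not enabled).
print(locale.getpreferredencoding(False))


def undefined_bytes(codec):
    """Return the single-byte values that `codec` cannot decode."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


# Byte values with no mapping in Python's cp1252 codec.
CP1252_GAPS = undefined_bytes("cp1252")
print([hex(b) for b in CP1252_GAPS])
```

On a machine affected by this issue, the first line printed should be cp1252, and 0x8f should appear in the second.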
apigen.py is a Sphinx extension that crawls the FURY package, imports each module, opens the corresponding .py source file to parse its docstrings, and then writes the generated API reference RST files. There are four open() calls in this file, none of which specify encoding=:
Line 207 — reads a Python source file by filesystem path:
f = open(filename)
Line 233 — reads a Python source file via the module's __file__ attribute:
with open(mod.__file__) as fi:
Line 479 — writes a generated RST output file:
fileobj = open(outfile, "w")
Line 539 — writes another generated RST index file:
idx = open(path, "w")
When apigen.py reaches fury/actor/planar.py during its module crawl, it opens the file using cp1252. The file contains the byte 0x8F at position 14569. In cp1252, byte 0x8F is undefined — it is not mapped to any character. Python raises UnicodeDecodeError and the entire Sphinx build crashes.
fury/actor/odf_slicer.py is a second confirmed triggering file. There may be others.
Why This Byte Exists
UTF-8 uses multi-byte sequences for non-ASCII characters. The byte 0x8F is a valid UTF-8 continuation byte, so here it is part of a multi-byte sequence encoding a non-ASCII character, most likely a special symbol or an accented character in a contributor name inside a docstring or comment. (It is not a non-breaking space, which encodes as 0xC2 0xA0.) This is valid UTF-8 and normal in Python source files, for which PEP 3120 makes UTF-8 the default encoding.
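To identify which character the offending byte belongs to, one option is to read the file as raw bytes and walk back from the reported position to the start of the UTF-8 sequence. This is a sketch under assumptions: char_at_byte_offset is a hypothetical helper, and "Ï" (U+00CF) is only a stand-in whose UTF-8 encoding happens to contain 0x8F; the actual character in planar.py has not been identified here.

```python
def char_at_byte_offset(data, offset):
    """Return the UTF-8 character that contains the byte at `offset`."""
    start = offset
    # Continuation bytes look like 0b10xxxxxx; step back to the lead byte.
    while start > 0 and (data[start] & 0xC0) == 0x80:
        start -= 1
    # A UTF-8 sequence is at most 4 bytes; ignore any cut-off trailing bytes.
    return data[start:start + 4].decode("utf-8", errors="ignore")[0]


# Stand-in example: U+00CF encodes to the two bytes 0xC3 0x8F.
raw = "\u00cf".encode("utf-8")
assert raw == b"\xc3\x8f"

# Decoding those bytes as cp1252 fails on the second byte.
try:
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    print(exc)
```

Applied to this issue, the call would be char_at_byte_offset(Path("fury/actor/planar.py").read_bytes(), 14569), with Path imported from pathlib and the command run from the repository root.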
Proof That the Fix Works
Running the same file-open logic with encoding='utf-8' across the entire FURY package produces zero UnicodeDecodeError instances. Every file in fury/ opens and reads cleanly when UTF-8 is explicitly specified.
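A check of this kind could look like the sketch below. It is not the exact script used for the claim above; the function name files_failing_decode is hypothetical, and it assumes the check is run from the repository root so that "fury" resolves to the package directory.

```python
from pathlib import Path


def files_failing_decode(root, encoding):
    """Return (path, byte offset) for each .py file that fails to decode."""
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            path.read_bytes().decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first undecodable byte.
            failures.append((path, exc.start))
    return failures


if __name__ == "__main__":
    # Files that break under the Windows default vs. under UTF-8.
    for encoding in ("cp1252", "utf-8"):
        failing = files_failing_decode("fury", encoding)
        print(encoding, len(failing), [str(p) for p, _ in failing])
```

Because it decodes raw bytes with an explicit codec, this scan gives the same answer on Linux, macOS, and Windows, which also makes it a way to find triggering files beyond the two already confirmed.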
The Fix
Add encoding='utf-8' to all four open() calls in apigen.py:
- Line 207: f = open(filename) becomes f = open(filename, encoding='utf-8')
- Line 233: with open(mod.__file__) as fi: becomes with open(mod.__file__, encoding='utf-8') as fi:
- Line 479: fileobj = open(outfile, "w") becomes fileobj = open(outfile, "w", encoding='utf-8')
- Line 539: idx = open(path, "w") becomes idx = open(path, "w", encoding='utf-8')
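The read and write patterns after the change can be sketched as follows. This is an illustration only, not the actual apigen.py code: read_source and write_rst are hypothetical helper names, and the real fix simply adds the encoding argument to the four existing calls.

```python
def read_source(path):
    """Read a Python source file as UTF-8, regardless of the OS locale."""
    with open(path, encoding="utf-8") as fi:
        return fi.read()


def write_rst(path, text):
    """Write generated RST as UTF-8, regardless of the OS locale."""
    with open(path, "w", encoding="utf-8") as fileobj:
        fileobj.write(text)
```

For the two read sites, the standard library's tokenize.open() is a possible alternative, since it detects a source file's declared encoding and defaults to UTF-8; passing encoding='utf-8' explicitly is the smaller change.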
This is the correct and complete fix for apigen.py because PEP 3120 makes UTF-8 the default encoding for Python source files, and Sphinx reads RST sources as UTF-8 by default. A different encoding would only be more correct for a source file that explicitly declares one (PEP 263).
I will submit a PR targeting the v2 branch with this fix.