BLD: Sphinx doc build crashes on Windows due to missing encoding in apigen.py

The FURY v2 documentation build crashes on Windows before generating a single HTML page. This is a hard failure, not a warning: the Sphinx process aborts during the `builder-inited` phase with a `UnicodeDecodeError`, so any contributor who builds the docs locally on Windows gets no output at all. The root cause is four `open()` calls in `docs/source/ext/apigen.py` that do not specify a file encoding, combined with at least two FURY source files that contain non-ASCII UTF-8 bytes which cannot be decoded under the default Windows locale encoding.

## Environment Where This Reproduces

- OS: Windows 11 (10.0.26200)
- Python: 3.14.2 (CPython)
- Sphinx: 8.2.3
- System locale encoding: cp1252 (Windows Western European default)
- Branch: v2

This does not reproduce on Linux or macOS, where the locale encoding is normally UTF-8. In practice it is a Windows-only crash.

## Full Traceback

The Sphinx error log written to `%TEMP%\sphinx-err-*.log` contains the following:

```
File "...\fury\docs\source\ext\apigen.py", line 234, in _parse_module_with_import
    with open(mod.__file__) as fi:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 14569: character maps to <undefined>
```

Sphinx wraps this as:

```
sphinx.errors.ExtensionError: 'charmap' codec can't decode byte 0x8f in position 14569: character maps to <undefined>
```

The build aborts at this point. No RST files are generated, no HTML is produced, and the build log contains zero warnings because the process never reaches the warning-emission stage.

## Root Cause — Detailed Explanation

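When `open()` is called without an `encoding=` argument, Python 3 falls back to the locale encoding (`locale.getencoding()` on Python 3.11 and later, `locale.getpreferredencoding(False)` on earlier versions) unless UTF-8 mode is enabled. On a standard Western European Windows installation this is `cp1252` (Windows-1252). `cp1252` is a single-byte encoding: it decodes every byte on its own rather than as part of a multi-byte sequence, and Python's codec leaves five byte values (`0x81`, `0x8D`, `0x8F`, `0x90`, `0x9D`) unmapped, so UTF-8 text containing any of those bytes cannot be decoded at all.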
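To check which encoding a bare `open()` would pick on a given machine, something like the following can be run. It is a small diagnostic sketch, not part of `apigen.py`; the helper name `default_text_encoding` is made up for illustration.

```python
import locale
import sys


def default_text_encoding() -> str:
    """Return the encoding a bare open() is expected to use for text files."""
    if sys.flags.utf8_mode:
        # UTF-8 mode (python -X utf8 or PYTHONUTF8=1) overrides the locale.
        return "utf-8"
    # False means: do not call setlocale(), just report the current setting.
    return locale.getpreferredencoding(False)


print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
print("Default text encoding:", default_text_encoding())
```

On the Windows setup described above this should report `cp1252`; on most Linux and macOS systems it reports a UTF-8 variant.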

`apigen.py` is a Sphinx extension that crawls the FURY package, imports each module, opens the corresponding `.py` source file to parse its docstrings, and then writes the generated API reference RST files. There are four `open()` calls in this file, none of which specify `encoding=`:

**Line 207** — reads a Python source file by filesystem path:

    f = open(filename)

**Line 233** — reads a Python source file via the module's `__file__` attribute:

    with open(mod.__file__) as fi:

**Line 479** — writes a generated RST output file:

    fileobj = open(outfile, "w")

**Line 539** — writes another generated RST index file:

    idx = open(path, "w")

When `apigen.py` reaches `fury/actor/planar.py` during its module crawl, the `open(mod.__file__)` call (listed above as line 233, reported as line 234 in the traceback) decodes the file as `cp1252`. The file contains the byte `0x8F` at position 14569, which `cp1252` does not map to any character, so Python raises `UnicodeDecodeError` and the Sphinx build aborts.

`fury/actor/odf_slicer.py` is a second confirmed triggering file. There may be others.
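The failure can be reproduced without Sphinx or FURY. The sketch below uses `Ï` (U+00CF) purely as an illustrative character whose UTF-8 encoding ends in `0x8F`; it is not necessarily the character present in `planar.py`, and `can_decode` is a hypothetical helper.

```python
def can_decode(raw: bytes, encoding: str) -> bool:
    """Return True if raw decodes under the given encoding without errors."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Illustrative character only: U+00CF encodes to the two bytes C3 8F in UTF-8.
sample = "\u00cf".encode("utf-8")

print(sample)                        # b'\xc3\x8f'
print(can_decode(sample, "utf-8"))   # True
print(can_decode(sample, "cp1252"))  # False: 0x8F is unmapped in cp1252
```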

## Why This Byte Exists

UTF-8 encodes every non-ASCII character as a multi-byte sequence. The byte `0x8F` is a valid UTF-8 continuation byte (continuation bytes span `0x80` to `0xBF`), so it appears here as the second or later byte of such a sequence, most likely a special symbol or an accented character in a docstring or comment. This is valid UTF-8 and entirely normal in Python source files: PEP 3120 makes UTF-8 the default source encoding for Python 3, and a file needs a PEP 263 coding declaration only if it uses a different one.
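To find out which character is actually involved, the byte offset from the error message can be mapped back to a line and a character. This is a rough sketch that assumes the reported position is an offset into the whole file and that the file is valid UTF-8; `char_at_byte_offset` is a hypothetical helper, not an existing API.

```python
import unicodedata


def char_at_byte_offset(data: bytes, offset: int) -> tuple[int, str, str]:
    """Return (line number, character, Unicode name) for a byte offset in UTF-8 data."""
    # Walk back over continuation bytes (10xxxxxx) to the lead byte.
    start = offset
    while start > 0 and (data[start] & 0xC0) == 0x80:
        start -= 1
    lead = data[start]
    # The lead byte's high bits give the sequence length.
    if lead < 0x80:
        length = 1
    elif lead >> 5 == 0b110:
        length = 2
    elif lead >> 4 == 0b1110:
        length = 3
    else:
        length = 4
    char = data[start:start + length].decode("utf-8")
    line = data.count(b"\n", 0, start) + 1
    return line, char, unicodedata.name(char, "<unnamed>")


# Illustrative data only; for the real case, read planar.py in binary mode
# and pass the position from the traceback.
demo = "x = 1\n# \u00cf\n".encode("utf-8")
print(char_at_byte_offset(demo, demo.index(b"\x8f")))
```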

## Proof That the Fix Works

Running the same file-open logic with `encoding='utf-8'` across the entire FURY package raises no `UnicodeDecodeError`. Every file in `fury/` opens and reads cleanly when UTF-8 is specified explicitly.
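A check along these lines can be repeated with a short script. The version below is a sketch of the idea rather than the exact script used; `find_undecodable` is a hypothetical helper name.

```python
from pathlib import Path


def find_undecodable(root: str, encoding: str) -> list[Path]:
    """Return the .py files under root that fail to decode with the given encoding."""
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            path.read_text(encoding=encoding)
        except UnicodeDecodeError:
            failures.append(path)
    return failures
```

Calling `find_undecodable("fury", "utf-8")` from the repository root should return an empty list, while `find_undecodable("fury", "cp1252")` should list the triggering files on any platform, since the encoding is passed explicitly.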

## The Fix

Add `encoding='utf-8'` to all four `open()` calls in `apigen.py`:

- Line 207: `f = open(filename)` becomes `f = open(filename, encoding='utf-8')`
- Line 233: `with open(mod.__file__) as fi:` becomes `with open(mod.__file__, encoding='utf-8') as fi:`
- Line 479: `fileobj = open(outfile, "w")` becomes `fileobj = open(outfile, "w", encoding='utf-8')`
- Line 539: `idx = open(path, "w")` becomes `idx = open(path, "w", encoding='utf-8')`

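This fix is sufficient for the reads because PEP 3120 makes UTF-8 the default encoding for Python source files, and every file in `fury/` decodes cleanly as UTF-8. It is also the right choice for the writes, since Sphinx reads source files as UTF-8 by default (its `source_encoding` setting). A different encoding would only be appropriate for a source file that declares one through a PEP 263 coding cookie.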
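One possible variation on the read side, in case a source file ever carries a PEP 263 coding declaration, is the standard library's `tokenize.open()`, which detects the encoding the same way the interpreter does. This is only a sketch of an alternative, not what the proposed PR does; `read_python_source` is a hypothetical helper.

```python
import tokenize


def read_python_source(path: str) -> str:
    """Read a .py file using the encoding Python itself would use for it."""
    # tokenize.open() checks for a BOM and a PEP 263 coding cookie and
    # falls back to UTF-8 when neither is present.
    with tokenize.open(path) as fi:
        return fi.read()
```

For FURY as it stands, passing `encoding='utf-8'` explicitly is simpler and behaves the same.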

I will submit a PR targeting the v2 branch with this fix.

BLD: Sphinx doc build crashes on Windows due to missing encoding in apigen.py #1139
