Skip to content

[Bug]: test_pybamm_import introduces global state mutation via sys.modules leading to CI instability #5402

@thegialeo

Description

@thegialeo

PyBaMM Version

latest

Python Version

3.12

Describe the bug

Background

While investigating CI instability in PR #5398, I temporarily skipped:

@pytest.mark.skip(
    reason="Temporarily skipped to check if this test is causing CI flakiness"
)
def test_pybamm_import(self):

After skipping this test, all CI tests passed consistently.

This strongly suggests that test_pybamm_import is introducing side effects that leak into the global Python runtime and affect unrelated tests.

Even attempts to isolate the test using:

@pytest.mark.forked

to run it in a subprocess did not prevent CI instability. This indicates that the test's global mutations may persist in ways that are not fully isolated by subprocess execution, possibly due to cached import-time behavior or shared resources.


Root Cause

The test currently manipulates sys.modules to simulate the absence of optional dependencies:

modules = {}
for module_name, module in sys.modules.items():
    base_module_name = module_name.split(".")[0]
    if base_module_name in present_optional_import_deps:
        modules[module_name] = module
        sys.modules[module_name] = None

It also removes pybamm from the import system:

for module_name in list(sys.modules.keys()):
    base_module_name = module_name.split(".")[0]
    if base_module_name == "pybamm":
        sys.modules.pop(module_name)

This approach mutates global interpreter state, which is shared across the entire pytest session.

Even though the test attempts restoration:

finally:
    for module_name, module in modules.items():
        sys.modules[module_name] = module

this is not guaranteed to be safe or complete. This is especially risky if tests are run in parallel, since other tests may attempt imports of optional dependencies (e.g. jax) before the finally block restores sys.modules.


Why Modifying sys.modules Is Problematic

Setup

conda create -n test python=3.12
pip install numpy
pip install pandas

1. It Breaks Import Invariants

sys.modules is not just a cache. It is part of Python’s import contract.

Setting entries to None introduces invalid states:

import sys
import importlib

import math
sys.modules["math"] = None

importlib.import_module("math")  # Raises ModuleNotFoundError

Even though math exists.


2. It Can Break Transitive Imports

A module imported after mutation may fail in unexpected ways:

import sys

sys.modules["numpy"] = None

import pandas  # May now fail depending on import timing

Failures can vary depending on test order, parallelism, and import timing.


3. Restoration Is Not Reliable

If any module imports an optional dependency during or after the mutation window, Python may cache partial state.

Example:

import sys
import types

# Step 1: simulate a module that optionally imports a fake module
module_name = "some_module_that_optionally_imports_fake"

mock_module = types.ModuleType(module_name)
code = """
try:
    import fake_module  # this module does NOT exist
except ImportError:
    fake_module = "fallback used"

print("Inside module:", fake_module)
"""
exec(code, mock_module.__dict__)
sys.modules[module_name] = mock_module

# Step 2: simulate fake_module being missing (optional, already missing)
sys.modules["fake_module"] = None

# Step 3: import our mock module (will see fake_module as missing, print "Inside module: fallback used")
import some_module_that_optionally_imports_fake

# Step 4: re-importing the same module
# The import-time logic is cached, this line does NOT rerun and will not print "print("Inside module:", fake_module)"
import some_module_that_optionally_imports_fake

Now some_module_that_optionally_imports_fake may have cached failure paths. Import-time logic may already have executed differently, and restoring sys.modules cannot undo this.


4. It Violates Test Isolation

Tests should not mutate interpreter-global import state. The current approach is effectively mid-session environment rewriting.


What the Test Is Trying to Validate

The intention is valid and important:

Importing pybamm should not require optional dependencies.

However, the mechanism used to test this is unsafe.


First ideas on Refactor Directions

I’ve been thinking about how we might preserve the test’s intent while avoiding unsafe mutation of sys.modules, and I’d like to open a discussion on some approaches we could explore using pytest-native tools:

  1. Potentially use pytest’s monkeypatch to patch imports locally

    • One idea is to avoid globally mutating sys.modules and instead temporarily override it only within the test’s scope using monkeypatch. (see https://docs.pytest.org/en/6.2.x/monkeypatch.html)

    • A possible pattern could look like this:

      def test_pybamm_import(monkeypatch):
          # Simulate missing optional dependencies
          monkeypatch.setitem(sys.modules, "optional_dep", None)
      
          # Import pybamm in this isolated context
          import importlib
          import pybamm  # Should not raise even if optional_dep is missing
    • This approach would automatically revert any changes after the test, which might help prevent interference with other tests.

  2. Possibly check importability without touching sys.modules

    • Another avenue could be using importlib.util.find_spec to check whether a module is importable, without actually importing optional dependencies:

      from importlib.util import find_spec
      assert find_spec("pybamm") is not None
    • This could potentially be combined with monkeypatch to simulate optional dependencies being absent.

Points for discussion:

  • How can we avoid global sys.modules mutation while still testing the same behavior?
  • What approaches ensure deterministic, isolated tests regardless of execution order?
  • How can we best verify that pybamm can import without optional dependencies, while minimizing CI flakiness?
  • Are there other strategies I haven’t considered that could achieve these goals more safely or cleanly?

I’m curious to hear everyone’s thoughts. Are there refinements, alternatives, or potential pitfalls I might have missed?

Steps to Reproduce

see above

Relevant log output

see above

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions