
Flaky ensure_interruptible_after() tests #40715

@orlitzky


Example (from CI):

```text
Error: Failed example:: Exception raised:
cysignals.signals.AlarmInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/miniconda/envs/sage-dev/lib/python3.11/site-packages/sage/doctest/forker.py", line 733, in _run
    self.compile_and_execute(example, compiler, test.globs)
  File "/usr/share/miniconda/envs/sage-dev/lib/python3.11/site-packages/sage/doctest/forker.py", line 1157, in compile_and_execute
    exec(compiled, globs)
  File "<doctest sage.matrix.matrix2.Matrix._charpoly_df[19]>", line 1, in <module>
    with ensure_interruptible_after(Integer(1)): m.charpoly()
  File "/usr/share/miniconda/envs/sage-dev/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/share/miniconda/envs/sage-dev/lib/python3.11/site-packages/sage/doctest/util.py", line 904, in ensure_interruptible_after
    raise RuntimeError(
RuntimeError: Function is not interruptible within 1.0000 seconds, only after 1.2124 seconds
```
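
To make the failure mode concrete, here is a minimal stdlib-only sketch of a wall-clock interrupt check in the same spirit. It is not Sage's implementation (that lives in `sage/doctest/util.py`). `interruptible_after_sketch` and the `AlarmInterrupt` class are hypothetical stand-ins, and the sketch assumes a POSIX system (it uses `SIGALRM` via `signal.setitimer`) and must run in the main thread. The point it illustrates is that pass/fail is decided by comparing wall-clock elapsed time against `seconds + max_wait_after_interrupt`, so any time the process spends descheduled counts against the test.

```python
import signal
import time
from contextlib import contextmanager


class AlarmInterrupt(Exception):
    """Hypothetical stand-in for cysignals.signals.AlarmInterrupt."""


def _raise_alarm(signum, frame):
    raise AlarmInterrupt()


@contextmanager
def interruptible_after_sketch(seconds, max_wait_after_interrupt):
    """Hypothetical sketch: interrupt the body after `seconds` of wall time,
    then fail if it took more than `max_wait_after_interrupt` extra to stop."""
    previous = signal.signal(signal.SIGALRM, _raise_alarm)
    start = time.monotonic()
    signal.setitimer(signal.ITIMER_REAL, seconds)  # one-shot wall-clock alarm
    try:
        yield
    except AlarmInterrupt:
        pass  # the body was interrupted, which is what we want
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
    # Wall time: this includes any period in which the scheduler had us asleep.
    elapsed = time.monotonic() - start
    if elapsed > seconds + max_wait_after_interrupt:
        raise RuntimeError(
            f"Function is not interruptible within {seconds:.4f} seconds, "
            f"only after {elapsed:.4f} seconds"
        )
```

A pure-Python busy loop checks for signals between bytecodes, so `with interruptible_after_sketch(0.05, max_wait_after_interrupt=1.0): while True: pass` stops promptly. The same check would fail if the process were simply not scheduled for a while after the alarm, even though the code itself is perfectly interruptible.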

I recently "fixed" two instances of this, and the discussion can be summarized as follows:

  1. ensure_interruptible_after() is not a priori reliable because it uses wall time. The CPU scheduler is free to put any task to sleep for any amount of time.
  2. Fixing that is not trivial. For example, just switching to CPU time is not guaranteed to help unless we have 100% accurate accounting of CPU time in all subprocesses. The old cputime() asks the subprocesses to account for their time, and some of them don't. The newer CPU timer used for --warn-long relies on psutil on non-Linux platforms, but psutil is standard only in the Sage distro and is not a dependency of the Sage library. We would also need to adjust the timing based on how fast the CPU is, or pad the numbers.
  3. In the meantime, we can minimize the CI failures by ensuring that the example we're going to Ctrl-C runs for a VERY long time, and then setting the max_wait_after_interrupt to something huge, like five seconds. The tests are usually there to ensure that sig_on() and sig_off() are wrapping some C code, so if we can interrupt something that would finish in ten minutes after only, say, four seconds... it's fine.
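
Points 1 and 3 can be illustrated with a small stdlib-only sketch. Here `time.sleep()` is a hypothetical stand-in for the scheduler putting the doctest process to sleep: wall time advances while this process's CPU time does not. `wall_and_cpu` and `within_budget` are illustrative names, not Sage functions, and the 0.2 s "tight" tolerance below is an assumption used only for comparison. Note also that `time.process_time()` counts only the current process, so CPU time spent in a subprocess is invisible to it, which is the accounting gap described in point 2.

```python
import time


def wall_and_cpu(stall_seconds):
    """Return (wall, cpu) seconds for a simulated scheduler stall.

    Hypothetical: time.sleep() stands in for the OS descheduling the process.
    """
    wall0, cpu0 = time.perf_counter(), time.process_time()
    time.sleep(stall_seconds)
    return time.perf_counter() - wall0, time.process_time() - cpu0


def within_budget(seconds, elapsed, max_wait_after_interrupt):
    """Wall-clock criterion: pass iff elapsed <= seconds + max_wait_after_interrupt."""
    return elapsed <= seconds + max_wait_after_interrupt


if __name__ == "__main__":
    wall, cpu = wall_and_cpu(0.25)
    # wall is at least 0.25 s, while cpu stays close to zero
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s")

    # The CI run above: a 1 s alarm, interrupted only after 1.2124 s of wall time.
    print(within_budget(1.0, 1.2124, 0.2))  # False with an (assumed) 0.2 s tolerance
    print(within_budget(1.0, 1.2124, 5.0))  # True with the 5 s suggested in point 3
```

The design trade-off in point 3 follows directly: a generous `max_wait_after_interrupt` absorbs scheduler stalls of a few seconds, while a body that runs for minutes still exposes a missing sig_on()/sig_off(), because an uninterruptible call would overshoot any reasonable budget by far more than that.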
