Skip to content

Small discrepancy observed in the future deref timeout period #1184

@ikappaki

Description

@ikappaki

Hi,

there appears to be a small discrepancy with the future deref timeout period on MS-Windows, where it times out a few milliseconds earlier than the actual period is considered.

This is not reproducible locally, but it appears to be happening intermittently on github CI when running the test further below, which is supposed to be testing that the time out waiting for a thread after the 100ms timeout period.

The two most recent CI failures suggest that the timeout occurred around 4 and 1 milliseconds before the 100ms timeout period had fully elapsed:

     timed deref of future :: deref timeout outside of [100ms, 500ms]
     range: 0.09710249999989173secs -- :pre 1348.2037947 :start 1348.2039843 :end 1348.3010868

     timed deref of future :: deref timeout outside of [100ms, 500ms]
     range: 0.09964290000016263secs -- :pre 1311.3282057 :start 1311.3284092 :end 1311.4280521

I can think of the following causes

  1. On MS-Windows, or specific MS-Windows versions like the one running on GitHub runners, there’s no strict guarantee that the timeout will occur exactly after the prescribed period, it could happen slightly earlier.
  2. There might be an issue with perf_counter.
  3. The test itself could be flawed.

I can't spot something wrong with points 2 or 3, so I suspect there’s a discrepancy in the timeout behavior at the Python thread level, particularly on the Windows Servers used by GitHub runners.

A straightforward solution would be to make the test more tolerant, for example: (< 0.095 diff 0.5).

The test:
test_core_fns.lpy

(testing "timed deref of future"
    (let [pre (time/perf-counter)
          fut (future (time/sleep 3))
          start (time/perf-counter)
          ;; block for 100ms
          ret (deref fut 100 :timed-out)
          end (time/perf-counter)
          diff (- end start)]
      (is (= :timed-out ret))
      (is (< 0.1 diff 0.5) (str "deref timeout outside of [100ms, 500ms] range: " diff "secs"
                              " -- :pre " pre " :start " start " :end " end))
      (is (= false (future-cancelled? fut)))
      (is (= false (future-done? fut)))
      ;; can't always cancel a sleep-ed Future
      (future-cancel fut))

Some links from an earlier investigation:

I've also had a quick look at the thread pool executor and it appears the timeout is eventually passed to a threading.conditional, so that should have worked as expected at that level I think (i.e. the conditional shouldn't have returned before the timeout has expired).

Python future conditional wait: https://github.com/python/cpython/blob/035f512046337e64a018d11fdaa3b21758625291/Lib/concurrent/futures/_base.py#L443

Python conditional wait implementation on win: https://github.com/python/cpython/blob/035f512046337e64a018d11fdaa3b21758625291/Python/condvar.h#L171

WaitForSingleObjectEx on win: https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobjectex

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions