Many of the flakes we see are timing-related. The more straightforward
kludges I've used here just add small sleeps between steps so they are
more clearly separated in time.
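A minimal sketch of that kludge, assuming a synchronous test and using only `std`. The async tests in the suite would presumably use their runtime's sleep instead. `send_request` and `fire_two_separated` are hypothetical names, and the gap value is illustrative.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an action under test (e.g. firing a request);
/// it only records when it happened.
fn send_request() -> Instant {
    Instant::now()
}

/// Fire two actions with a small sleep between them so their timestamps are
/// clearly separated, and return the observed gap.
fn fire_two_separated(gap: Duration) -> Duration {
    let first = send_request();
    // The kludge: an explicit pause. `thread::sleep` sleeps for *at least*
    // `gap`, so the second action cannot land right on top of the first.
    thread::sleep(gap);
    let second = send_request();
    second.duration_since(first)
}

fn main() {
    let observed = fire_two_separated(Duration::from_millis(50));
    println!("observed gap: {:?}", observed);
}
```

The sleep only sets a lower bound on the gap. A busy machine can still stretch it, which is why this does not fully fix the upper-bound problem described next.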
---
`test_endpoint_disable_on_repeated_failure` is a special case.
The test requires that 2 requests fire, with the 2nd arriving neither too
soon nor too late after the 1st.
In practice when this test fails, it's because the 2nd request fires too
late, after the "forgiveness" rule kicks in (if an endpoint fails and we
don't see it _fail again_ within 2x the `disabled_in` duration, then we
don't disable it).
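To make the upper bound concrete, here is a hypothetical model of the forgiveness rule as stated above. `is_forgiven` and its parameters are illustrative names, not the actual Svix code. Times are offsets from a common start. The real logic presumably has more to it (for example, whatever makes a 2nd request "too soon"), which is not modelled here.

```rust
use std::time::Duration;

/// Hypothetical model of the "forgiveness" rule: if an endpoint fails and
/// does not fail again within 2x `disabled_in`, it is not disabled.
fn is_forgiven(first_failure: Duration, second_failure: Duration, disabled_in: Duration) -> bool {
    // Gap between the two failures; saturating_sub guards against
    // out-of-order inputs by clamping to zero.
    let gap = second_failure.saturating_sub(first_failure);
    gap > disabled_in * 2
}

fn main() {
    let disabled_in = Duration::from_millis(100);
    // 2nd failure 150 ms after the 1st: within 2x, so not forgiven.
    println!("{}", is_forgiven(Duration::ZERO, Duration::from_millis(150), disabled_in)); // prints false
    // 2nd failure 250 ms after the 1st: too late, so the endpoint is forgiven
    // and a test expecting it to be disabled will fail.
    println!("{}", is_forgiven(Duration::ZERO, Duration::from_millis(250), disabled_in)); // prints true
}
```

With `disabled_in` at 100 ms, any scheduling delay that pushes the 2nd request past the 200 ms mark flips the outcome. That is the flaky failure mode described above.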
The poor timing could be due to contention on the db/queue, or simply to
the CPU being too busy. I tweaked the timing a little to try to smooth it
over, but setting `RUST_TEST_THREADS=1` seemed to help the most.
When I run the suite locally with `RUST_TEST_THREADS=1` set, I regularly
see deadlocks, so for the time being I've set it in CI rather than in
`run-tests.sh`.
To be fair, I also see deadlocks locally without `RUST_TEST_THREADS=1`
being set, but different ones.
These deadlocking tests commonly involve multiple calls to "start svix
server" functions. The deadlocks seem to be mitigated by carefully
dropping/aborting the server join handles one by one, or by rewriting the
test so that only one server is needed. A couple of these tests have been
rewritten, but there are going to be more out there.
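The "tear each server down before starting the next" pattern can be sketched with only `std`. This is a swapped-in analogue, not the suite's code. Std threads cannot be aborted the way async task join handles can, so each hypothetical `TestServer` polls a shutdown flag and is joined explicitly. `TestServer`, `start`, `stop`, and `run_servers_sequentially` are all hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle to a running test server: a background thread plus a
/// shutdown flag that the thread polls.
struct TestServer {
    shutdown: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl TestServer {
    /// Hypothetical stand-in for a "start svix server" helper.
    fn start() -> Self {
        let shutdown = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&shutdown);
        let handle = thread::spawn(move || {
            // Pretend to serve requests until asked to stop.
            while !flag.load(Ordering::Acquire) {
                thread::sleep(Duration::from_millis(5));
            }
        });
        TestServer { shutdown, handle: Some(handle) }
    }

    /// Signal the server to stop and wait for its thread to exit.
    fn stop(&mut self) {
        self.shutdown.store(true, Ordering::Release);
        if let Some(handle) = self.handle.take() {
            // A panicked server thread is ignored here to keep the sketch short.
            let _ = handle.join();
        }
    }
}

impl Drop for TestServer {
    // Make sure a server is never left running, even if a test forgets to
    // stop it. Calling stop() twice is harmless: the handle is taken once.
    fn drop(&mut self) {
        self.stop();
    }
}

/// Run `n` servers strictly one after another, fully tearing each down
/// before starting the next. Returns how many were stopped.
fn run_servers_sequentially(n: usize) -> usize {
    let mut stopped = 0;
    for _ in 0..n {
        let mut server = TestServer::start();
        // ... exercise the server here ...
        server.stop();
        stopped += 1;
    }
    stopped
}

fn main() {
    println!("stopped {} servers", run_servers_sequentially(3));
}
```

The point of the design is that no two servers are ever alive at once, so they cannot contend for shared resources. The `Drop` impl is a safety net rather than the primary mechanism.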