fix(datafusion-distributed): relax ttl map timing constraints #218

JSOD11 · 2025-11-11T01:41:20Z

Resolves #126: Fix flaky test

I believe the intention of the test is as follows:

Create a TTL map with ttl: 10ms and tick: 2ms (garbage collector runs every tick, entries expire after 5 ticks)
Spawn tokio tasks that perform various asynchronous insertions and deletions on the map
Wait until the tokio tasks complete
Assert that the map empties itself within 2x the TTL (20ms)

Say the final insertion of an entry into the map by an arbitrary tokio task occurs at time T. Then in an idealized setting we'd expect this entry to be removed by the ttl map garbage collector at time T + TTL, so it seems reasonable to assert that the map should be empty by T + 2 * TTL.

The issue is:

Our TTL here is very small, 10ms
The way we actually perform this check within the test is with a function called assert_eventually (defined just above the test) that performs a check every 10ms
The garbage collector here is running every 2ms independent of insertion time
Crowded CI machines will additionally suffer delays of a few ms

In the image, the blue lines represent the garbage collection operation, which in this configuration will occur every 2ms (size of tick). It's possible for the final entry to be inserted just after a gc cycle, meaning that when assert_eventually reaches T+10ms, the final entry has not expired. The entry is collected just afterwards, but looking at the way assert_eventually is written, it does not actually check the final bound, meaning it is only checking 1 * TTL in this configuration, leading to the flake.

Combining the above with crowded CI machines that may also arbitrarily delay operations by a few ms, this test becomes noticeably flaky.

To resolve this problem, we boost the window to 100ms, which should be more than enough to resolve flakiness and will not meaningfully increase CI time.

gabotechs

Really good explanation! +1

JSOD11 added 2 commits November 10, 2025 20:41

Relax timing constraints.

4ccaebb

Remove unnecessary changes. Add minor fixes.

af16581

JSOD11 force-pushed the jsod/fix-flake-11-10-25 branch from 83c1762 to af16581 Compare November 13, 2025 23:26

JSOD11 changed the title ~~Relax timing constraints~~ fix(datafusion-distributed): relax ttl map timing constraints Nov 13, 2025

JSOD11 marked this pull request as ready for review November 13, 2025 23:41

gabotechs approved these changes Nov 14, 2025

View reviewed changes

gabotechs merged commit dd853a7 into datafusion-contrib:main Nov 14, 2025
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(datafusion-distributed): relax ttl map timing constraints #218

fix(datafusion-distributed): relax ttl map timing constraints #218

Uh oh!

JSOD11 commented Nov 11, 2025 •

edited

Loading

Uh oh!

gabotechs left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix(datafusion-distributed): relax ttl map timing constraints #218

fix(datafusion-distributed): relax ttl map timing constraints #218

Uh oh!

Conversation

JSOD11 commented Nov 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gabotechs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JSOD11 commented Nov 11, 2025 •

edited

Loading