Contributor

ruchirK commented Nov 17, 2025

Fixes #221. Previously, we encountered a bug at the initial GC tick of `ttl_map`: `time` (the number of ticks since the map was initialized) starts at 0, but the logic that computes when to remove an inserted value is

`free_time = (time - 1) mod (number of ticks to ttl)`

In other words, we maintain a circular buffer of inserts with one slot per tick, and we delete an item once `buffer.len()` ticks have elapsed (`ttl = tick * buffer.len()`).
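To make the circular-buffer idea concrete, here is a minimal Rust sketch of such a timing wheel. `TtlWheel`, its methods, and the order of operations inside `gc` (clear the slot for the current tick, then advance `time`) are hypothetical assumptions for illustration, not the actual `TTLMap` code; the insert path places each key at slot `time.wrapping_sub(1) % buckets.len()`, i.e. the slot just behind the current tick.

```rust
use std::collections::HashSet;

/// Hypothetical timing wheel: one bucket per tick, so a key lives for
/// roughly `buckets.len()` ticks. Not the crate's actual `TTLMap`.
struct TtlWheel {
    time: u64,
    buckets: Vec<HashSet<String>>,
}

impl TtlWheel {
    fn new(num_buckets: usize, start_time: u64) -> Self {
        assert!(num_buckets > 0, "need at least one bucket");
        TtlWheel {
            time: start_time,
            buckets: vec![HashSet::new(); num_buckets],
        }
    }

    /// Put the key in the slot *behind* the current tick, so it is only
    /// cleared once GC has gone all the way around the wheel.
    fn insert(&mut self, key: &str) {
        let slot = self.time.wrapping_sub(1) % self.buckets.len() as u64;
        self.buckets[slot as usize].insert(key.to_string());
    }

    /// One GC tick (assumed order): clear the slot for the current tick, then advance.
    fn gc(&mut self) {
        let slot = self.time % self.buckets.len() as u64;
        self.buckets[slot as usize].clear();
        self.time += 1;
    }

    fn contains(&self, key: &str) -> bool {
        self.buckets.iter().any(|b| b.contains(key))
    }
}

fn main() {
    // Start the tick counter at 1 with a 3-slot wheel.
    let mut wheel = TtlWheel::new(3, 1);
    wheel.insert("a");
    wheel.gc();
    wheel.gc();
    assert!(wheel.contains("a")); // survives buckets.len() - 1 ticks
    wheel.gc();
    assert!(!wheel.contains("a")); // freed on the buckets.len()-th tick
    println!("ok");
}
```

With the counter starting at 1, a key inserted at the first tick survives `buckets.len() - 1` GC ticks and is dropped on the `buckets.len()`-th. Constructing the same wheel with `start_time = 0` and 3 buckets instead drops the key on the very first `gc` call, which is the early eviction described in this PR.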

This is all fine, except we implement the logic as:

`free_time = time.wrapping_sub(1) % buffer.len()`

which, for a `u64` `time`, computes:

`free_time = ((time - 1) mod 2^64) mod buffer.len()`

when what we really want is `free_time = (time - 1) mod buffer.len()`.
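A quick way to see the mismatch at `time = 0` is to compare the two formulas directly. In this sketch, `wrapped_slot` mirrors the `wrapping_sub` computation and `intended_slot` uses `i128::rem_euclid` to get the mathematical (non-negative) `(time - 1) mod buffer.len()`; both helper names are hypothetical.

```rust
/// Slot actually computed on u64: ((time - 1) mod 2^64) mod len.
fn wrapped_slot(time: u64, len: u64) -> u64 {
    time.wrapping_sub(1) % len
}

/// Slot we intended: (time - 1) mod len as a non-negative remainder.
/// Widening to i128 lets `time - 1` go to -1 without wrapping.
fn intended_slot(time: u64, len: u64) -> u64 {
    (time as i128 - 1).rem_euclid(len as i128) as u64
}

fn main() {
    // Compare both formulas at the first tick for a few buffer lengths.
    for len in 1..=8u64 {
        let got = wrapped_slot(0, len);
        let want = intended_slot(0, len);
        println!("len={len}: wrapping gives {got}, intended {want}");
    }
}
```

At `time = 0`, `wrapping_sub(1)` yields `2^64 - 1`, so the two results agree only when `buffer.len()` divides `2^64`, i.e. when it is a power of two. For a length of 3, the wrapped formula picks slot 0 while the intended slot is 2.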

Luckily, the two agree whenever `0 < time < 2^64`, since for those values `(time - 1) mod 2^64 = time - 1`. This commit therefore initializes `time` at 1 instead of 0.

We don't need to worry about the overflow case: even with a one-nanosecond tick, it would take ~584 years to overflow 64 bits, at which point we would surely have bigger problems than a momentarily incorrect TTL. As for the underflow case, the fix should mostly help unit tests more than anything else, since the bug only occurred at `time = 0`.
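Both claims in the paragraph above can be checked mechanically: once `time` starts at 1, `wrapping_sub(1)` never wraps, and a `u64` tick counter takes centuries to overflow. This is a sketch assuming a 1 ns tick (the pessimistic case mentioned above) and a 365.25-day year; `wrapped_slot` and `overflow_years_at_1ns` are hypothetical helpers.

```rust
/// Slot computed by `wrapping_sub(1) % len`, as in the insert path discussed here.
fn wrapped_slot(time: u64, len: u64) -> u64 {
    time.wrapping_sub(1) % len
}

/// Whole years until a u64 counter of 1 ns ticks overflows (365.25-day years).
fn overflow_years_at_1ns() -> u64 {
    const NANOS_PER_SEC: u64 = 1_000_000_000;
    const SECS_PER_YEAR: u64 = 31_557_600; // 365.25 * 24 * 3600
    u64::MAX / NANOS_PER_SEC / SECS_PER_YEAR
}

fn main() {
    for len in 1..=16u64 {
        // For every time >= 1 the subtraction cannot wrap, so the result
        // matches the plain (time - 1) % len we actually want.
        for time in 1..=1_000u64 {
            assert_eq!(wrapped_slot(time, len), (time - 1) % len);
        }
        // The top of the u64 range behaves the same way.
        assert_eq!(wrapped_slot(u64::MAX, len), (u64::MAX - 1) % len);
    }
    println!("overflow after ~{} years at 1 ns per tick", overflow_years_at_1ns());
}
```

The integer division gives 584 whole years, consistent with the ~584-year figure above.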

Collaborator

jayshrivastava left a comment

Nit: I think a one-sentence comment summarizing the PR description would be good here.

ruchirK force-pushed the fix-time-wrapping-sub branch from f6d9b25 to 09f9cd2 on November 17, 2025 19:44
Comment on lines -187 to +188
time: Arc::new(AtomicU64::new(0)),
// Explicitly initialize `time` to 1 to avoid underflow issues with circular buffer.
time: Arc::new(AtomicU64::new(1)),
Collaborator

👍 nice. Any chance to reproduce this in a test that fails in main and succeeds in this branch?

Contributor Author

Ah, that's a good question. Let me think on that for a minute.

Contributor Author

Done! PTAL

ruchirK force-pushed the fix-time-wrapping-sub branch from 09f9cd2 to 063c1b4 on November 19, 2025 00:18
Collaborator

gabotechs left a comment

Nice! This just needs a few clippy issues fixed, but LGTM.

ruchirK force-pushed the fix-time-wrapping-sub branch from 063c1b4 to 5e5f7c0 on November 19, 2025 16:06
gabotechs merged commit 28a278c into datafusion-contrib:main on Nov 19, 2025
4 checks passed
// Advance GC 3 times, which shouldn't free the first key.
for _ in 0..3 {
    TTLMap::<String, i32>::gc(ttl_map.time.clone(), &ttl_map.buckets);
}
Collaborator

nit: I think we can just truncate everything after this.

Development

Successfully merging this pull request may close these issues.

Fix TTL map wrapping sub

3 participants