fix: clock sequence race and add stress test #219

Open
atlet99 wants to merge 14 commits into gofrs:master from atlet99:fix/race_conditions

Conversation

@atlet99
Contributor

@atlet99 atlet99 commented Jul 4, 2025

Issue - #216

Signed-off-by: Abdurakhman R. <joha.shadibekov@gmail.com>
@kohenkatz kohenkatz linked an issue Jul 4, 2025 that may be closed by this pull request
@codecov

codecov bot commented Jul 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (875e708) to head (a8e1333).

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #219   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines          457       467   +10     
=========================================
+ Hits           457       467   +10     

☔ View full report in Codecov by Sentry.

@cameracker cameracker requested a review from Copilot September 25, 2025 00:43
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes a race condition in Version-1 UUID generation's clock sequence handling and adds comprehensive stress testing to verify the fix. The issue occurred when the 14-bit clock sequence counter overflowed without proper handling, potentially causing UUID collisions.

  • Implements proper 14-bit clock sequence wrapping with bitmask operation
  • Adds mandatory timestamp advancement when sequence wraps to zero
  • Introduces stress test with table-driven concurrent scenarios
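As a rough illustration of the wrap handling described above (a sketch with hypothetical names, not the library's actual code), a 14-bit clock sequence can be masked and checked for wrap like this:

```go
package main

import "fmt"

// clockSeqMask keeps the clock sequence within 14 bits (0..16383),
// the width defined for the field in V1 UUIDs.
const clockSeqMask = 0x3FFF

// nextClockSeq increments the sequence and wraps it to 14 bits.
// A wrap back to zero signals that the timestamp must advance
// before another UUID may be issued with the same node ID.
func nextClockSeq(seq uint16) (next uint16, wrapped bool) {
	next = (seq + 1) & clockSeqMask
	return next, next == 0
}

func main() {
	s, wrapped := nextClockSeq(0x3FFF) // at the 14-bit maximum
	fmt.Println(s, wrapped)            // wraps to 0; caller must wait for the clock
	s, wrapped = nextClockSeq(5)
	fmt.Println(s, wrapped) // ordinary increment, no wrap
}
```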

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
generator.go Fixed clock sequence race condition by adding proper 14-bit wrapping and timestamp advancement
race_v1_test.go Added comprehensive concurrent stress test to verify UUID uniqueness under high contention


@cameracker
Collaborator

Hi, I'd still like to see this merged in :) I don't want to obligate you to any contribution you don't have time for. Would it be OK if I picked this up and finished it?

cameracker and others added 2 commits March 8, 2026 12:33
…ected high CPU usage

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
break
}
// Sleep briefly to avoid busy-waiting and reduce CPU usage.
time.Sleep(time.Microsecond)
Collaborator


worst case this has a repeating wait of 10 microseconds.

Contributor Author


Yep, that makes sense!

Member


I found some references that say the minimum OS timer resolution on Linux can be 50+ microseconds, and as much as 15 ms on Windows. That could make this loop VERY slow. 😬

This fixes the clock sequence overflow issue, but what does it do to the benchmarks for V1, V6, and V7 values? Since we want to wait a tiny amount of time this seems like a good place for runtime.Gosched() (so we only yield the current time slice on the scheduler).

Contributor Author


@dylan-bourque I'm not sure this solution will give us much of a gain. This loop with time.Sleep(time.Microsecond) is called only when the clockSequence wraps (once every 16,384 UUIDs with the same timestamp), not on every NewV1/NewV6/NewV7.

I compared Sleep vs. runtime.Gosched() locally on the BenchmarkGenerator/NewV1|NewV6|NewV7 benchmarks (3 runs each, Apple M3 Pro).

Here are the benchmarks:

NewV1: Sleep ~44-45 ns/op, Gosched ~45 ns/op
NewV6: Sleep ~119 ns/op, Gosched ~119-120 ns/op
NewV7: Sleep ~145 ns/op, Gosched ~145 ns/op
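As a self-contained sketch of the sleep-based wait loop under discussion (hypothetical names, not the library's actual code), the idea is to poll a clock until it advances past the last emitted timestamp, yielding between polls instead of spinning:

```go
package main

import (
	"fmt"
	"time"
)

// waitForNextTick blocks until the supplied clock reads a value strictly
// greater than last, sleeping briefly between polls to avoid busy-waiting
// and the high CPU usage a tight spin loop would cause.
func waitForNextTick(now func() uint64, last uint64) uint64 {
	t := now()
	for t <= last {
		time.Sleep(time.Microsecond) // could also be runtime.Gosched()
		t = now()
	}
	return t
}

func main() {
	// Toy clock that always advances by one tick per call, so the
	// loop terminates deterministically in this example.
	var fake uint64 = 5
	clock := func() uint64 { fake++; return fake }
	fmt.Println(waitForNextTick(clock, 10)) // first reading strictly past 10
}
```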

cameracker
cameracker previously approved these changes Mar 8, 2026
@cameracker
Collaborator

Hi @atlet99 thank you very much for contributing this change. I plan to get this merged as soon as I can get another reviewer to look.

I took the liberty of changing the proposed logic to use a microsecond sleep instead of relying on runtime.Gosched(). Documentation suggested this was the safer approach.

@atlet99
Contributor Author

atlet99 commented Mar 9, 2026

@cameracker I'm glad my changes are helping. Unfortunately, I was busy for a while and couldn't respond to your message. In the future, I'll be happy to provide my assistance if required.

generator.go Outdated
// If the sequence wrapped (back to zero) we MUST wait for the
// timestamp to advance to preserve uniqueness (see RFC-9562 §6.1).
if g.clockSequence == 0 {
for {
Member


I think this loop could be tidied up a bit, but the logic looks correct.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What change do you propose? @dylan-bourque

Member


I was thinking of something like this

for ; timeNow <= g.lastTime; timeNow = now() {
    time.Sleep(time.Microsecond) // or runtime.Gosched(), see my comment below
}

where now() is a locally defined helper that wraps the if/else timestamp logic

now := func() uint64 {
    epoch := g.epochFunc()
    if useUnixTSMs {
        return uint64(epoch.UnixMilli())
    }
    return g.getEpoch(epoch)
}

I'm not 100% convinced I like that better, though.

@dylan-bourque
Member

dylan-bourque commented Mar 9, 2026 via email

kohenkatz
kohenkatz previously approved these changes Mar 10, 2026
Contributor

@kohenkatz kohenkatz left a comment


LGTM, but wait for @dylan-bourque too since he said he's going to look at it.

@dylan-bourque
Member

I've been playing with this locally to answer my own question about benchmarks and I'm still able to generate duplicate V1 values with this change at 2000 goroutines X 1000 values. 🤔

FWIW, I don't actually see much difference in the benchmarks (I had to write a new one), but the fix is not complete.

@atlet99
Contributor Author

atlet99 commented Mar 14, 2026

I've been playing with this locally to answer my own question about benchmarks and I'm still able to generate duplicate V1 values with this change at 2000 goroutines X 1000 values. 🤔

FWIW, I don't actually see much difference in the benchmarks (I had to write a new one), but the fix is not complete.

I think I also need to run some tests myself and figure out what's happening here.

@atlet99 atlet99 dismissed stale reviews from kohenkatz and cameracker via ed3abe0 March 14, 2026 16:58
@atlet99
Contributor Author

atlet99 commented Mar 14, 2026

@dylan-bourque you were right to be doubtful; I did make mistakes.
What I fixed in the latest changes:

  • completed the V1 duplicate/race fix in getClockSequence: the remaining issue was stale atTime values (captured before acquiring the mutex, so they could effectively move backward under heavy contention);
  • added a local now() helper and switched the wait logic to for ; timeNow <= g.lastTime; timeNow = now();
  • Added backward-time clamping: if timeNow < g.lastTime, we force timeNow = g.lastTime to prevent reusing old timestamp + clock_seq pairs after 14-bit sequence wrap.

I'm providing the changes for analysis and verification; IMHO, the 2000 x 1000 stress test for V1 now also passes completely. FYI, @cameracker @dylan-bourque @kohenkatz
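A minimal sketch of the backward-time clamping idea described above (hypothetical names, not the actual generator fields):

```go
package main

import "fmt"

// clampTime illustrates backward-time clamping: a caller-supplied timestamp
// captured before the lock was acquired may lag behind the last emitted one.
// Reusing such a stale timestamp after a 14-bit clock sequence wrap could
// repeat a timestamp + clock_seq pair, producing a duplicate UUID.
func clampTime(timeNow, lastTime uint64) uint64 {
	if timeNow < lastTime {
		return lastTime // never move backward past the latest emitted time
	}
	return timeNow
}

func main() {
	fmt.Println(clampTime(100, 250)) // stale caller time is clamped to 250
	fmt.Println(clampTime(300, 250)) // fresh time passes through unchanged
}
```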

@atlet99
Contributor Author

atlet99 commented Mar 14, 2026

UPD: some changes for testing

Added follow-up fixes and test coverage to close the remaining V1 duplication gap under heavy concurrency.

  • Updated getClockSequence to handle stale timestamps safely:
    • clamp backward timeNow to lastTime;
    • use a local now() helper in the wrap wait loop;
    • keep wrap handling RFC-consistent;
  • Added deterministic regression coverage:
    • AtSpecificTimeClockSequenceWrap
    • TestGetClockSequence/WrapUsesFreshEpoch
    • TestGetClockSequence/WrapUsesFreshUnixTSMs
  • Added forced-wrap benchmarks:
    • BenchmarkGenerator/ClockSequenceWrapUTC
    • BenchmarkGenerator/ClockSequenceWrapUnixTSMs
  • Added opt-in stress test (to avoid slowing CI):
    • TestV1UniqueConcurrentStress (runs only with UUID_STRESS_V1=1)
    • scenario: 2000 goroutines x 1000 UUIDs

P.S. To run the stress test: UUID_STRESS_V1=1 go test -run '^TestV1UniqueConcurrentStress$' ./...

}
mu.Lock()
if _, exists := seen[u]; exists {
dupCount++
Member


nit: should make this atomic.AddUint32() as well, if only for consistency and to avoid someone coming along later and asking why it isn't.

var (
wg sync.WaitGroup
mu sync.Mutex
seen = make(map[UUID]struct{}, goroutines*uuidsPerGor)
Member

@dylan-bourque dylan-bourque Mar 16, 2026


nit: in my testing, I made this map[UUID]int32 so that the failure output could report the actual duplicate values and how many of each were generated.

mu.Lock()
if cnt, exists := seen[u]; exists {
    seen[u] = cnt + 1
} else {
    seen[u] = 1
}
mu.Unlock()

and

for v, n := range seen {
    if n > 1 {
        t.Errorf("duplicate V1 UUID: %s appeared %d time(s)", v, n)
    }
}

definitely not necessary, but it makes the failure output more useful

// Calls can arrive with stale atTime values (captured before acquiring the
// lock). Clamp backwards timestamps to the latest emitted one to avoid
// reusing older timestamp + clock-sequence pairs after sequence wrap.
if timeNow < g.lastTime {
Member


Could move this inside the if timeNow <= g.lastTime block below, since that check will always also be true. Eliminating the branch might also make it slightly faster.

if timeNow <= g.lastTime {
    // Calls can arrive with stale atTime values (captured before acquiring the
    // lock). Clamp backwards timestamps to the latest emitted one to avoid
    // reusing older timestamp + clock-sequence pairs after sequence wrap.
    timeNow = g.lastTime
    ...
}



Development

Successfully merging this pull request may close these issues.

Potential race condition in clock sequence generation

5 participants