Skip to content

Conversation

@arnecls
Copy link
Contributor

@arnecls arnecls commented Feb 10, 2026

When a lot of requests for the same token come in at the same time, hitting a cold cache, each request will trigget a cache miss.

In severe cases, this can lead to rate limiting.

This change introduces a FIFO, lock based queue based on the token ID to tackle this problem. The locks may time out based on request context and return a 429 (overloaded, come back later).

@arnecls arnecls requested review from a team and mzjulian as code owners February 10, 2026 11:17
@arnecls arnecls self-assigned this Feb 10, 2026
@github-actions github-actions bot added 🐞 bug Something that is not working as intended 🔀 metadata-server Affects the metadata-server codebase and removed 🐞 bug Something that is not working as intended labels Feb 10, 2026
@arnecls arnecls added 🐞 bug Something that is not working as intended and removed 🔀 metadata-server Affects the metadata-server codebase labels Feb 10, 2026
@github-actions github-actions bot added the 🔀 metadata-server Affects the metadata-server codebase label Feb 10, 2026
@qgoulart qgoulart requested a review from Copilot February 10, 2026 11:24
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a per-token FIFO ticket lock to prevent parallel cache misses for the same token, reducing thundering-herd behavior and downstream rate limiting.

Changes:

  • Added a TicketLock (FIFO lock with context-aware cancellation) and a uint64 min-heap for tracking canceled tickets.
  • Updated access token handler to serialize same-token cache-miss fetches and return 429 on lock acquisition timeout.
  • Extended token cache with a per-token inflight lock map.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
internal/shared/ticketlock.go Adds FIFO ticket lock with canceled-ticket tracking to avoid parallel cache misses.
internal/shared/intheap.go Adds HeapUint64 to support min-heap operations for canceled tickets.
internal/shared/ticketlock_test.go Adds a basic unit test for ticket allocation, cancellation, and unlock behavior.
cmd/metadata-server/tokencache.go Adds inflight lock map and accessor to retrieve per-token locks.
cmd/metadata-server/tokenhandlers.go Uses per-token lock on cache miss and returns 429 when lock can’t be acquired.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@qgoulart qgoulart requested review from Copilot and qgoulart February 11, 2026 10:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@arnecls arnecls requested a review from Copilot February 11, 2026 13:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@arnecls arnecls requested a review from Copilot February 11, 2026 13:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

qgoulart
qgoulart previously approved these changes Feb 11, 2026
@github-actions
Copy link
Contributor

Merging this branch will not change overall coverage

Impacted Packages Coverage Δ 🤖
github.com/trivago/identity-metadata-server/cmd/metadata-server 0.00% (ø)
github.com/trivago/identity-metadata-server/internal/shared 0.00% (ø)

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/trivago/identity-metadata-server/cmd/metadata-server/tokencache.go 0.00% (ø) 0 0 0
github.com/trivago/identity-metadata-server/cmd/metadata-server/tokenhandlers.go 0.00% (ø) 0 0 0
github.com/trivago/identity-metadata-server/internal/shared/intheap.go 0.00% (ø) 0 0 0
github.com/trivago/identity-metadata-server/internal/shared/ticketlock.go 0.00% (ø) 0 0 0

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/trivago/identity-metadata-server/cmd/metadata-server/tokencache_test.go
  • github.com/trivago/identity-metadata-server/internal/shared/ticketlock_test.go

2 similar comments
@github-actions
Copy link
Contributor

Merging this branch will not change overall coverage

Impacted Packages Coverage Δ 🤖
github.com/trivago/identity-metadata-server/cmd/metadata-server 0.00% (ø)
github.com/trivago/identity-metadata-server/internal/shared 0.00% (ø)

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/trivago/identity-metadata-server/cmd/metadata-server/tokencache.go 0.00% (ø) 0 0 0
github.com/trivago/identity-metadata-server/cmd/metadata-server/tokenhandlers.go 0.00% (ø) 0 0 0
github.com/trivago/identity-metadata-server/internal/shared/intheap.go 0.00% (ø) 0 0 0
github.com/trivago/identity-metadata-server/internal/shared/ticketlock.go 0.00% (ø) 0 0 0

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/trivago/identity-metadata-server/cmd/metadata-server/tokencache_test.go
  • github.com/trivago/identity-metadata-server/internal/shared/ticketlock_test.go

@github-actions
Copy link
Contributor

Merging this branch will not change overall coverage

Impacted Packages Coverage Δ 🤖
github.com/trivago/identity-metadata-server/cmd/metadata-server 0.00% (ø)
github.com/trivago/identity-metadata-server/internal/shared 0.00% (ø)

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/trivago/identity-metadata-server/cmd/metadata-server/tokencache.go 0.00% (ø) 0 0 0
github.com/trivago/identity-metadata-server/cmd/metadata-server/tokenhandlers.go 0.00% (ø) 0 0 0
github.com/trivago/identity-metadata-server/internal/shared/intheap.go 0.00% (ø) 0 0 0
github.com/trivago/identity-metadata-server/internal/shared/ticketlock.go 0.00% (ø) 0 0 0

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/trivago/identity-metadata-server/cmd/metadata-server/tokencache_test.go
  • github.com/trivago/identity-metadata-server/internal/shared/ticketlock_test.go

@arnecls arnecls requested review from Copilot and qgoulart February 11, 2026 16:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@arnecls arnecls requested a review from Copilot February 12, 2026 10:52
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +78 to +79
defer l.ticketGuard.Unlock()
heap.Push(l.canceledTickets, ticket)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cancellation can deadlock the lock: there’s a race where activeTicket can become ticket right after the activeTicket == ticket check but before this ctx.Done() branch executes. In that case, returning 0 without advancing activeTicket leaves the ticket ‘active’ with no owner to call Unlock(). Consider handling ctx.Done() by re-checking activeTicket while holding ticketGuard and either (a) treating it as acquired if activeTicket == ticket, or (b) if aborting while activeTicket == ticket, advancing activeTicket (and skipping any consecutive canceled tickets) so other waiters can make progress.

Suggested change
defer l.ticketGuard.Unlock()
heap.Push(l.canceledTickets, ticket)
// Re-check if this ticket has become active while we were waiting
// for the context to be canceled. If so, treat the lock as acquired
// to avoid leaving an active ticket with no owner to call Unlock().
if atomic.LoadUint64(&l.activeTicket) == ticket {
l.ticketGuard.Unlock()
return ticket
}
heap.Push(l.canceledTickets, ticket)
l.ticketGuard.Unlock()

Copilot uses AI. Check for mistakes.
Comment on lines +90 to +108
for {
ticket := atomic.AddUint64(&l.activeTicket, 1)
nextCanceledTicket, hasCanceledTickets := l.canceledTickets.Peek()

switch {
// No canceled tickets, we can return
case !hasCanceledTickets:
return

// The last canceled ticket is the same as the current ticket.
// We need to try again with the next ticket (which might also be
// canceled).
case nextCanceledTicket == ticket:
heap.Pop(l.canceledTickets)

// There are canceled tickets, but the current ticket is smaller than
// the first canceled ticket.
default:
return
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If canceledTickets ever contains an entry less than the current ticket (e.g., due to the cancellation race described above), the default branch returns and the stale smallest canceled ticket will remain at the heap root forever. That prevents future cancellations from being skipped when their ticket number is reached. A robust fix is to pop and discard any canceled tickets < ticket (stale) before comparing for equality, then continue skipping while nextCanceledTicket == ticket.

Suggested change
for {
ticket := atomic.AddUint64(&l.activeTicket, 1)
nextCanceledTicket, hasCanceledTickets := l.canceledTickets.Peek()
switch {
// No canceled tickets, we can return
case !hasCanceledTickets:
return
// The last canceled ticket is the same as the current ticket.
// We need to try again with the next ticket (which might also be
// canceled).
case nextCanceledTicket == ticket:
heap.Pop(l.canceledTickets)
// There are canceled tickets, but the current ticket is smaller than
// the first canceled ticket.
default:
return
outer:
for {
ticket := atomic.AddUint64(&l.activeTicket, 1)
for {
nextCanceledTicket, hasCanceledTickets := l.canceledTickets.Peek()
switch {
// No canceled tickets, we can return
case !hasCanceledTickets:
return
// Discard any stale canceled tickets that are less than the current ticket.
case nextCanceledTicket < ticket:
heap.Pop(l.canceledTickets)
continue
// The next canceled ticket is the same as the current ticket.
// Skip this ticket and try again with the next one (which might also be canceled).
case nextCanceledTicket == ticket:
heap.Pop(l.canceledTickets)
continue outer
// There are canceled tickets, but the current ticket is smaller than
// the first (smallest) canceled ticket.
default:
return
}

Copilot uses AI. Check for mistakes.
Comment on lines +24 to +32
func NewTicketLock(granularity time.Duration) *TicketLock {
return &TicketLock{
nextTicket: 1,
activeTicket: 1,
pauseDuration: granularity,
canceledTickets: &HeapUint64{},
ticketGuard: &sync.Mutex{},
}
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LockWithContext uses time.NewTicker(l.pauseDuration), which panics if pauseDuration <= 0. Since NewTicketLock is a general-purpose constructor, it should defensively validate granularity (e.g., clamp to a minimum positive duration or return a well-defined behavior) to avoid runtime panics.

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +115
for {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
defer cancel()

ticket := lock.LockWithContext(ctx)
time.Sleep(5 * time.Millisecond)
if ticket != 0 {
order = append(order, ticket)
lock.Unlock()
return
}
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer cancel() is inside a potentially long-running loop; defers will accumulate until the goroutine returns, causing unnecessary resource retention. Call cancel() explicitly at the end of each iteration (after LockWithContext returns) instead of deferring it.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🐞 bug Something that is not working as intended 🔀 metadata-server Affects the metadata-server codebase

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants