@LiGaCu LiGaCu commented Jun 24, 2025

Problem

We need to prevent the thundering herd problem that occurs when a network or backend issue disconnects a large number of WebSocket clients at once and they all try to reconnect at the same time.

Solution

Introduce randomness (jitter) into the retry timing so that disconnected clients do not all send requests to re-establish their WebSocket connections simultaneously.

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@LiGaCu LiGaCu requested a review from a team as a code owner June 24, 2025 21:28
 this.reconnectAttempts++
-const delay = Math.min(1000 * Math.pow(2, this.reconnectAttempts), 30000)
+const baseDelay = Math.min(1000 * Math.pow(2, this.reconnectAttempts), 30000)
+const jitter = Math.random() * 5000 // jitter of 0 ~ 5000 milliseconds
Contributor

Just curious: where does this 5000 come from?

Contributor Author

It's just an initial value, chosen to spread any potential surge of traffic across 5 seconds.

We can tune this value in the future if needed.
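A minimal sketch of the resulting delay calculation follows. It assumes the two terms from the diff are simply added (`baseDelay + jitter`); the diff excerpt does not show how the final delay is rebuilt. `reconnectDelayMs` and its parameters are hypothetical names, and the jitter window is a parameter so the 5000 ms value can be tuned later.

```typescript
// Hypothetical helper mirroring the constants in this PR's diff.
// Assumption: the final delay is baseDelay + jitter.
const BASE_DELAY_MS = 1000 // base of the exponential backoff
const MAX_BACKOFF_MS = 30000 // cap on the exponential term
const DEFAULT_MAX_JITTER_MS = 5000 // "jitter of 0 ~ 5000 milliseconds"

function reconnectDelayMs(
    reconnectAttempts: number,
    maxJitterMs: number = DEFAULT_MAX_JITTER_MS,
    random: () => number = Math.random // injectable so tests can pin the jitter
): number {
    // Capped exponential backoff: 1000 * 2^n, never more than 30000 ms.
    const baseDelay = Math.min(BASE_DELAY_MS * Math.pow(2, reconnectAttempts), MAX_BACKOFF_MS)
    // Uniform jitter in [0, maxJitterMs), added on top of the backoff.
    const jitter = random() * maxJitterMs
    return baseDelay + jitter
}
```

Because the diff increments `reconnectAttempts` before computing the delay, the first retry would wait 2000 ms plus jitter, and from the fifth attempt onward the base is capped at 30000 ms, so retries land between 30 and 35 seconds. A common alternative is "full jitter", where the delay is drawn at random from the whole backoff window instead of being added on top; the additive form keeps a fixed 5-second spread regardless of the attempt count.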


@ramana-keerthi ramana-keerthi left a comment

It's probably less of a concern now, but we might want to add a similar jitter when we first open the connection if we start seeing high spikes at the start of the business day.
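If such spikes do show up, jitter on the initial connection could look roughly like the sketch below. It is hypothetical: `connectWithStartupJitter`, `sleep`, and reusing a 5000 ms window are assumptions, and the `connect` callback stands in for whatever actually opens the WebSocket.

```typescript
// Hypothetical sketch: wait a random amount before the *first* connection
// attempt, so clients or windows started together do not connect at once.
const DEFAULT_STARTUP_JITTER_MS = 5000 // assumption: same window as reconnect jitter

function sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms))
}

async function connectWithStartupJitter(
    connect: () => Promise<void>, // placeholder for the real WebSocket open logic
    maxJitterMs: number = DEFAULT_STARTUP_JITTER_MS,
    random: () => number = Math.random
): Promise<number> {
    const waitMs = random() * maxJitterMs // uniform in [0, maxJitterMs)
    await sleep(waitMs)
    await connect()
    return waitMs // returned so callers can log how long they waited
}
```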

Contributor Author

LiGaCu commented Jun 25, 2025

> It's probably less of a concern now, but we might want to add a similar jitter when we open the connection if we start to see high spikes during the start of the business day

That wouldn't help with traffic coming from different users, since jitter only spreads requests across the jitter interval. In other words, it helps with actions that happen simultaneously at second-level granularity, but not with traffic that stays consistently high for minutes or hours.
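The distinction can be checked with a small, illustrative simulation. `attemptTimesMs`, `peakPerSecond`, `burst`, and `sustained` are hypothetical helpers. The simulation counts how many reconnect attempts fall in each one-second bucket in two cases: a burst, where all clients drop at once, and sustained load, where one client drops every millisecond for a minute.

```typescript
// Illustrative simulation (hypothetical helpers, not part of the PR).

// When each client retries: its disconnect time + a fixed backoff + jitter.
function attemptTimesMs(
    disconnectTimesMs: number[],
    baseDelayMs: number,
    jitterMs: (clientIndex: number) => number
): number[] {
    return disconnectTimesMs.map((t, i) => t + baseDelayMs + jitterMs(i))
}

// Largest number of attempts landing in any single bucket (default: 1 second).
function peakPerSecond(timesMs: number[], bucketMs: number = 1000): number {
    const counts: number[] = []
    for (const t of timesMs) {
        const bucket = Math.floor(t / bucketMs)
        counts[bucket] = (counts[bucket] || 0) + 1
    }
    let peak = 0
    for (const c of counts) {
        if (c !== undefined && c > peak) peak = c // skip empty (sparse) buckets
    }
    return peak
}

// Burst: n clients all disconnect at t = 0.
function burst(n: number): number[] {
    const times: number[] = []
    for (let i = 0; i < n; i++) times.push(0)
    return times
}

// Sustained load: one client disconnects every millisecond.
function sustained(durationMs: number): number[] {
    const times: number[] = []
    for (let t = 0; t < durationMs; t++) times.push(t)
    return times
}

const uniformJitter = () => Math.random() * 5000 // 0 ~ 5000 ms, as in the PR
```

With `uniformJitter`, a burst of 1000 clients that would otherwise all retry in the same second is spread over five seconds, so the peak drops to about a fifth. The sustained scenario stays at roughly 1000 attempts per second with or without jitter, because shifting each attempt by up to 5 seconds does not change the long-run arrival rate.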

But I agree that something like that would help with surge traffic from the same user or machine, for example when opening the IDE automatically restores multiple previous windows. It would affect all AmazonQ functionality, and there is already a recent PR that introduces jitter when selecting a profile (and our WorkspaceContextServer waits until a profile is selected):

@LiGaCu LiGaCu merged commit 0542858 into aws:main Jun 25, 2025
6 checks passed
laileni-aws pushed a commit to laileni-aws/language-servers that referenced this pull request Jun 30, 2025