-
Notifications
You must be signed in to change notification settings - Fork 751
fix(auth): debounce getToken() function #6282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
hayemaxi
approved these changes
Dec 30, 2024
Problem: The Identity team noticed a large spike in token refreshes for specific users. One user would trigger refresh over 50 times within a few seconds. Solution: The telemetry showed that `getChatAuthState()` was being called many times in a short period. This eventually triggered the token refresh logic many times, if the token was expired. The solution is to add a debounce to `getToken()` which calls the refresh logic. - `debounce()` only accepts functions without any args, the refresh logic requires args - `getToken()` will also load from disk is the token is not expired, so debouncing here saves disk I/O as well. The current debounce interval is 100 milliseconds, which based on telemetry should be enough to capture the barrage of calls. Signed-off-by: nkomonen-amazon <[email protected]>
Problem: By default a sinon fake clock was installed on all tests. This caused the new debounce functionality on getToken to freeze since the clock was not progressed, multiple tests were failing. Solution: Only use the fake clock in tests that need it Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
ce410ea to
6bcf269
Compare
- Removed unnecessary fake clock. I guess this existed for historical reasons but is not needed anymore - Refactored one of the methods so that it could be stubbed Signed-off-by: nkomonen-amazon <[email protected]>
Contributor
Author
|
/retryBuilds |
We can now spy on the underlying getToken method to verify it is being debounced as expected. Also reduce the debounce interval to 50ms to reduce the delay but still catch a barrage of calls Signed-off-by: nkomonen-amazon <[email protected]>
Contributor
Author
|
/retryBuilds |
karanA-aws
pushed a commit
to karanA-aws/aws-toolkit-vscode
that referenced
this pull request
Jan 17, 2025
## Problem: The Identity team noticed a large spike in token refreshes for specific users. One user would trigger refresh over 50 times within a few seconds. Ticket: `P180886632` ## Solution: The telemetry showed that `getChatAuthState()` was being called many times in a short period. This eventually triggered the token refresh logic many times, if the token was expired. The solution is to add a debounce to `getToken()` which calls the refresh logic. - `debounce()` only accepts functions without any args, the refresh logic requires args - `getToken()` will also load from disk is the token is not expired, so debouncing here saves disk I/O as well. The current debounce interval is 100 milliseconds, which based on telemetry should be enough to capture the barrage of calls. With some manual testing it does not feel like UX is impacted in any noticeable way. --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: nkomonen-amazon <[email protected]>
kevluu-aws
pushed a commit
to kevluu-aws/aws-toolkit-vscode
that referenced
this pull request
Jan 23, 2025
## Problem: The Identity team noticed a large spike in token refreshes for specific users. One user would trigger refresh over 50 times within a few seconds. Ticket: `P180886632` ## Solution: The telemetry showed that `getChatAuthState()` was being called many times in a short period. This eventually triggered the token refresh logic many times, if the token was expired. The solution is to add a debounce to `getToken()` which calls the refresh logic. - `debounce()` only accepts functions without any args, the refresh logic requires args - `getToken()` will also load from disk is the token is not expired, so debouncing here saves disk I/O as well. The current debounce interval is 100 milliseconds, which based on telemetry should be enough to capture the barrage of calls. With some manual testing it does not feel like UX is impacted in any noticeable way. --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: nkomonen-amazon <[email protected]>
nkomonen-amazon
added a commit
that referenced
this pull request
Feb 7, 2025
## Problem: getChatAuthState() is called in many places by the Q features simultaneously, this eventually triggers multiple calls to getToken() and if needed refreshToken(). This resulted in refreshToken being spammed and the Identity team seeing spikes in token refreshes from clients. ## Solution: Throttle getChatAuthState(). Throttling w/ leading: true, allows us to instantly return a fresh result OR a cached result in the case we are throttled. Debounce on the other hand would cause callers to hang since they have to wait for debounce to timeout. Also, we put a debounce on getToken() before in #6282 but this did not work since a new SsoAccessToken instance is created each time the offending code flow triggered (we could look to cache the instance instead which would enable the getToken() debounce to be useful. ### Testing To test the difference after adding the throttle: - Add log statements to `getToken()` - Set an expired date in the SSO cache for both token expiration + client registration expiration - Use chat What would happen is that without throttle it would trigger getChatAuthState() many times, likely due to the connection becoming invalid and sending an event to all Q features, causing each of them to call getChatAuthState() at the same time. But when the throttle was added, the amount of these calls dropped to at most 2. Signed-off-by: nkomonen-amazon <[email protected]> --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: nkomonen-amazon <[email protected]>
s7ab059789
pushed a commit
to s7ab059789/aws-toolkit-vscode
that referenced
this pull request
Feb 19, 2025
## Problem: The Identity team noticed a large spike in token refreshes for specific users. One user would trigger refresh over 50 times within a few seconds. Ticket: `P180886632` ## Solution: The telemetry showed that `getChatAuthState()` was being called many times in a short period. This eventually triggered the token refresh logic many times, if the token was expired. The solution is to add a debounce to `getToken()` which calls the refresh logic. - `debounce()` only accepts functions without any args, the refresh logic requires args - `getToken()` will also load from disk is the token is not expired, so debouncing here saves disk I/O as well. The current debounce interval is 100 milliseconds, which based on telemetry should be enough to capture the barrage of calls. With some manual testing it does not feel like UX is impacted in any noticeable way. --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: nkomonen-amazon <[email protected]>
s7ab059789
pushed a commit
to s7ab059789/aws-toolkit-vscode
that referenced
this pull request
Feb 19, 2025
## Problem: getChatAuthState() is called in many places by the Q features simultaneously, this eventually triggers multiple calls to getToken() and if needed refreshToken(). This resulted in refreshToken being spammed and the Identity team seeing spikes in token refreshes from clients. ## Solution: Throttle getChatAuthState(). Throttling w/ leading: true, allows us to instantly return a fresh result OR a cached result in the case we are throttled. Debounce on the other hand would cause callers to hang since they have to wait for debounce to timeout. Also, we put a debounce on getToken() before in aws#6282 but this did not work since a new SsoAccessToken instance is created each time the offending code flow triggered (we could look to cache the instance instead which would enable the getToken() debounce to be useful. ### Testing To test the difference after adding the throttle: - Add log statements to `getToken()` - Set an expired date in the SSO cache for both token expiration + client registration expiration - Use chat What would happen is that without throttle it would trigger getChatAuthState() many times, likely due to the connection becoming invalid and sending an event to all Q features, causing each of them to call getChatAuthState() at the same time. But when the throttle was added, the amount of these calls dropped to at most 2. Signed-off-by: nkomonen-amazon <[email protected]> --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: nkomonen-amazon <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Problem:
The Identity team noticed a large spike in token refreshes for specific users. One user would trigger refresh over 50 times within a few seconds.
Ticket:
P180886632Solution:
The telemetry showed that
getChatAuthState()was being called many times in a short period. This eventually triggered the token refresh logic many times, if the token was expired.The solution is to add a debounce to
getToken()which calls the refresh logic.debounce()only accepts functions without any args, the refresh logic requires argsgetToken()will also load from disk is the token is not expired, so debouncing here saves disk I/O as well.The current debounce interval is 100 milliseconds, which based on telemetry should be enough to capture the barrage of calls. With some manual testing it does not feel like UX is impacted in any noticeable way.
feature/xbranches will not be squash-merged at release time.