@Hweinstock Hweinstock commented Feb 10, 2025

Problem

Follow up to: #6499

  • The debounce efforts were not aggressive enough.
  • The debugging efforts determined this is due to auth.listConnections being spammed, but unfortunately this function is called in many places, making it difficult to track. Therefore, more investigative work is required.

Solution

  • Move debounce up to getCredentialProviderNames and convert it to throttle (described below). Note that we can't move it up to auth.listConnections because it is not a "pure" function and has expected side effects. Moving this up allows us to throttle both emit calls to this metric more easily.
  • Put the function_call debugging telemetry inside the throttle to make it less noisy.
  • Add more withTelemetryHelper decorators and manual wrapping.
  • Increase aggressiveness in rate-limiting since aws_loadCredentials is still being emitted a lot.

Throttle Notes

  • throttle is similar to debounce, but it returns the last cached result immediately on call. See the tests and docstring for more information.
  • We want throttle over debounce because we require the function to resolve a result immediately. Delaying these results could degrade the UX when authenticating or using the toolkit as other pieces may be waiting on it.
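
To make the throttle behavior above concrete, here is a minimal sketch, not the implementation in this PR. The name `throttleSketch`, the injectable `now` clock, and the `provider-N` values are assumptions introduced only for illustration. Inside the window the wrapper returns the cached result immediately instead of delaying the caller.

```typescript
// Hypothetical sketch of a throttle that caches its last result.
// `throttleSketch` and the injectable `now` clock are illustrative assumptions.
function throttleSketch<Input extends any[], Output>(
    fn: (...args: Input) => Output,
    delayMs: number,
    now: () => number = Date.now
): (...args: Input) => Output {
    let lastRun: number | undefined
    let cached: Output | undefined

    return (...args: Input): Output => {
        const t = now()
        // Run `fn` on the first call, or once the window has elapsed.
        if (lastRun === undefined || t - lastRun >= delayMs) {
            lastRun = t
            cached = fn(...args)
        }
        // Otherwise hand back the last result right away, with no waiting.
        return cached as Output
    }
}

// Usage with a fake clock so the timing is deterministic.
let calls = 0
let fakeTime = 0
const listNames = throttleSketch(
    () => {
        calls += 1
        return [`provider-${calls}`]
    },
    1000,
    () => fakeTime
)

const first = listNames() // runs the wrapped function
fakeTime = 500
const second = listNames() // inside the window: cached result
fakeTime = 1000
const third = listNames() // window elapsed: runs again
```

Because the caller never waits, anything downstream of the wrapped function keeps its latency; the tradeoff is that results can be up to one window stale.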

Alternative Solution

  • Rather than throttle, continue using debounce and make the delay more aggressive.
    • pro: keeps telemetry code as the only code affected, whereas throttle does affect the behavior of the program.
    • con: will continue to spam function_call metrics. Which is now much noisier than aws_loadCredentials (but is temporary).

Future Work

  • Investigate a more ergonomic way to wrap functions in the function_call metric. Non-class methods can't use the decorator, but perhaps we can write something else.
  • Monitor telemetry following release to continue the investigation.
  • Found some dead code in the auth folder that could potentially be removed.
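
One possible shape for wrapping non-class functions is a higher-order function, sketched below. Everything here is hypothetical: `wrapWithFunctionCall`, the `FunctionCallEvent` fields, and the injected `emit` callback are stand-ins, not the toolkit's telemetry API or the fields of the real function_call metric.

```typescript
// Hypothetical event shape; the real function_call metric may carry different fields.
type FunctionCallEvent = { functionName: string; durationMs: number; succeeded: boolean }

// Hypothetical wrapper for plain functions, where a method decorator cannot be used.
function wrapWithFunctionCall<Input extends any[], Output>(
    functionName: string,
    fn: (...args: Input) => Output,
    emit: (event: FunctionCallEvent) => void,
    now: () => number = Date.now
): (...args: Input) => Output {
    return (...args: Input): Output => {
        const start = now()
        let succeeded = false
        try {
            const result = fn(...args)
            succeeded = true
            return result
        } finally {
            // Emit once per call, whether `fn` returned or threw.
            emit({ functionName, durationMs: now() - start, succeeded })
        }
    }
}

// Usage: events are collected in an array instead of being sent anywhere.
const events: FunctionCallEvent[] = []
const add = wrapWithFunctionCall('add', (a: number, b: number) => a + b, (e) => events.push(e))
const sum = add(2, 3)
```

For a function that returns a promise, this sketch only measures the time until the promise is returned, so an async-aware variant would need to emit after the promise settles.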

  • Treat all work as PUBLIC. Private feature/x branches will not be squash-merged at release time.
  • Your code changes must meet the guidelines in CONTRIBUTING.md.
  • License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Hweinstock Hweinstock marked this pull request as ready for review February 10, 2025 18:46
@Hweinstock Hweinstock requested review from a team as code owners February 10, 2025 18:46
}

void this.emitWithDebounce({
telemetry.aws_loadCredentials.emit({

@jpinkney-aws jpinkney-aws Feb 10, 2025

Does removing the debounce here mean we think this piece is no longer the issue? The debounce is moved higher up the stack.

Contributor Author

Yep, and it now behaves slightly differently.

@jpinkney-aws

TBH, I'm not too concerned about having to put debugging telemetry in other places. I'm more scared of events like:

con: will continue to spam function_call metrics. Which is now much noisier than aws_loadCredentials (but is temporary).

considering how many times aws_loadCredentials was called

@Hweinstock
Hweinstock commented Feb 10, 2025

Yeah, I agree. My thinking is that the first priority is to stop spamming telemetry right now. Then, add utilities for debugging the root issue. I don't think it's worth the tradeoff to put more burden on telemetry to potentially debug this faster.

*
* Multiple calls made during the throttle window will return the last returned result.
*/
export function throttle<Input extends any[], Output>(

We'll also want to indicate that function calls will be throttled regardless of differences in input arguments.

clock.uninstall()
})

it('prevents a function from executing more than once in the `delay` window', async function () {

I think we should add a test that asserts the first function call is executed and resolved before the first interval even completes. I.e., if we tick 1ms while the delay is 3ms, we should already get a result.

@justinmk3

This is a lot of change in a somewhat high-risk area.

  • Move debounce up to getCredentialProviderNames

That's the main change, right? And the other changes in this PR could be dropped.

@Hweinstock

There is not enough evidence to suggest that auth.listConnections being spammed indicates a bug. Therefore, we can target the source of the issue by simply debouncing aggressively and removing debugging metrics to limit load on telemetry. #6548

@Hweinstock Hweinstock closed this Feb 11, 2025
Hweinstock added a commit that referenced this pull request Feb 11, 2025
… remove debugging. (#6548)

## Problem
Alternative solution to
#6541

## Solution
- debounce very aggressively (once per day).
- remove debugging `function_call`. 

## Notes 
debounce ignores args, so regardless of the telemetry metadata, it will
only be emitted once per day. That is, regardless of the value of
`credentialsSourceId`, it is emitted once per day.
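
A minimal sketch of what ignoring args means in practice. The helper `oncePerWindow` is a simplified stand-in that fires on the first call and drops the rest of the window; it is not the toolkit's debounce, whose timing may differ. The fake clock and the `sharedCredentials` / `envVars` ids are hypothetical values used for illustration.

```typescript
const oneDayMs = 24 * 60 * 60 * 1000

// Simplified stand-in: at most one call per window, with no keying on arguments.
function oncePerWindow<Input extends any[]>(
    fn: (...args: Input) => void,
    windowMs: number,
    now: () => number = Date.now
): (...args: Input) => void {
    let lastRun: number | undefined
    return (...args: Input): void => {
        const t = now()
        if (lastRun !== undefined && t - lastRun < windowMs) {
            return // dropped, whatever the arguments are
        }
        lastRun = t
        fn(...args)
    }
}

const emitted: string[] = []
let fakeNow = 0
const emitLoadCredentials = oncePerWindow(
    (credentialsSourceId: string) => {
        emitted.push(credentialsSourceId)
    },
    oneDayMs,
    () => fakeNow
)

emitLoadCredentials('sharedCredentials') // first call of the day: emitted
fakeNow = 60 * 1000
emitLoadCredentials('envVars') // different id, same window: dropped
fakeNow = oneDayMs
emitLoadCredentials('envVars') // next window: emitted
```

The consequence is that the daily event records only whichever `credentialsSourceId` happened to arrive first in the window, so per-source counts cannot be read from this metric.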

---

- Treat all work as PUBLIC. Private `feature/x` branches will not be
squash-merged at release time.
- Your code changes must meet the guidelines in
[CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines).
- License: I confirm that my contribution is made under the terms of the
Apache 2.0 license.
s7ab059789 pushed a commit to s7ab059789/aws-toolkit-vscode that referenced this pull request Feb 19, 2025
… remove debugging. (aws#6548)
