-
Notifications
You must be signed in to change notification settings - Fork 749
telemetry(auth): Update metrics to better debug Auth dropoff #6625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Avoid any invalid logic involved with hasExistingConnections() Signed-off-by: nkomonen-amazon <[email protected]>
Problem: We were seeing some cases where the ClientId was seen as new, but ifFirstUse was reporting as not new. Solution: Validate that if we have a new ClientId, that isFirstUse MUST be true. Otherwise we emit an error, that should help us debug certain cases. Signed-off-by: nkomonen-amazon <[email protected]>
…th_userState Signed-off-by: nkomonen-amazon <[email protected]>
Before, on a users first use, it was technically possible for Q to indicate it was NOT the first use if somehow there were existing connections detected. A previous commit removed that case. But we want to now emit telemetry if we detected that we WOULD have run in to that case when not expected. Signed-off-by: nkomonen-amazon <[email protected]>
Problem: The inital problem is that when we focused chat, it returned succeeded even though it didn't have any idea if the UI was actually rendered to the user. Solution: This new solution will have the UI emit a telemetry metric once per session that indicates when the UI has loaded. It will also distinguish between the login and reauth page. The metric is `webview_load` and the `webviewName` field will distinguish between login and reauth NOTE: When Q chat itself has its UI ready we already emit a `webview_load` metric. So use all of these metrics to determine what the user actually saw Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
b37a64a to
b2b639a
Compare
|
/runIntegrationTests |
| } | ||
|
|
||
| private didCall: { login: boolean; reauth: boolean } = { login: false, reauth: false } | ||
| public setUiReady(state: 'login' | 'reauth') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this something we could just use the once() util for? That way you don't have to keep track of the extra state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to emit once for login and reauth separately which didn't allow me to use once(). But I think in a future PR we could do some memoize + once to address this
|
|
||
| describe('ExtensionUse.isFirstUse()', function () { | ||
| let instance: ExtensionUse | ||
| const notHasExistingConnections = () => false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be hasNoExistingConnections ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll follow up with the name change after. Don't want to have to run CI again unless theres something critical.
| } | ||
|
|
||
| if (isAmazonQ()) { | ||
| this.isFirstUseCurrentSession = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq: why is this always true for amazonQ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was previous logic where if we detected an auth connection in Toolkit that we would not consider it a first time user. But this doesn't make sense for Q.
Also if they get this far in the function it is guaranteed they are a first time user, since they would have otherwise returned earlier
| ] | ||
| }, | ||
| { | ||
| "name": "session_start", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? There is already a metric for starting the plugin. Adding similar, semantically-redundant metrics will make it harder to reason about the lifecycle and telemetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this metric already exists. So different question is can the existing metric be updated instead , in the common repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I plan to do this. It was just due to time constraints. I'll follow up after this PR
| "description": "When the Amazon Q sign in page is opened and focused.", | ||
| "metadata": [ | ||
| { | ||
| "type": "source", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not add this to the existing metric? https://github.com/aws/aws-toolkit-common/blob/ccb16ca4f73b07c4e1cd79da159312c9ffe403a7/telemetry/definitions/commonDefinitions.json#L2915
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm being rushed to get the change out for this release, but I will port this over after
| // Our callback won't fire on the first view. | ||
| if (webviewView.visible) { | ||
| telemetry.auth_signInPageOpened.emit({ result: 'Succeeded', passive: true }) | ||
| telemetry.auth_signInPageOpened.emit({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we modify this metric and make it more general in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we need to better structure out telemetry since it is messy with what "opened" means. We can also collapse all webviews in to a single metric probably
jpinkney-aws
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the time constraints we're under for testing the amazon q changes, I don't see anything majorly blocking. I think we definitely need follow up PRs for things like porting back to telemetry though.
Also, is some of this code going to live on after us finding the problem or will we scrap it afterwords?
|
Some of these metrics will be removed (mainly the I will look in to a fast follow-up. |
## Problem We noticed that there was an auth dropoff between `session_start` with a brand new clientId, versus when the `auth_userState` metric indicated `isFirstUse` meaning the user is net new. We went from 12.7k for `session_start` to 9.8k for `auth_userState`, these should have been basically the same. ## Solution Add in certain metrics to help debug where the discrepancy is coming from: - When we determine the user is a first time user, we will also check if the clientId is newly generated. If this is not the case we know there is a discrepancy here - The relevant metric will be `function_call` with a `functionName: isFirstUse`, `result: Failed`, and a `reason: ClientIdAlreadyExisted` - When we determine the user is a first time user, if we detected that they had previous auth connections, this will indicate a likely cause for the discrepancy - The relevant metric will be `function_call` with `reason: UnexpectedConnections` - We will emit metrics when the Auth Login page loads since that also had a discrepancy and the telemetry did not exist - The relevant metric is `webview_load` and it will indicate when the Auth Login/Reauth page has actually loaded - Previously we were observing the telemetry for the command `aws.amazonq.focusChat`, but all this did was emit when called and didn't confirm the UI actually loaded. - We will also add the `isFirstUse` metric source value in to some other existing metrics --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: nkomonen-amazon <[email protected]>
* fix(appbuilder): pass in the negative version of --use-container when using build quickpick (aws#6603) ## Problem When users use appbuilder to build their lambda functions, they choose between using their samconfig file or manually selecting the build parameters/flags. The problem is that when the user selects build flags and intentionally doesn't select the ```--use-container``` flag, the command will still be run with --use-container if the samconfig file has ```use_container``` is set to true. ## Solution Whenever the user manually selects the build flags and doesn't select ```--use-container```, we add the negative version of ```--use-container```, which is ```--no-use-container```. This serves as an override if the samconfig file has ```--use-container``` set to true. --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. * ci(jscpd): merge target branch in jscpd to avoid false negatives. (aws#6572) ## Problem - Follow up to aws#6564 (review). ## Solution - It appears that there is an undocumented "feature" that GHA don't run when there is a merge conflict. See [here](https://github.com/orgs/community/discussions/11265) - This means we don't have to handle the failure case where a merge fails. - Add fake config identity to mitigate this error: <img width="913" alt="image" src="https://github.com/user-attachments/assets/cd426ec7-e1ca-4d13-a3b1-3985b5593c07" /> ## Notes Going to let this sit and make sure it works as changes are merged into master. --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: Justin M. Keyes <[email protected]> * ci: fix and enable post-release notification (aws#6613) - Enable for prod runs - Fix script slightly because the way codebuild runs bash and the way my local runs bash seems to not be the same. - Tested on dev release pipeline --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. * refactor: notify.txt typos (aws#6616) --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. * config(amazonq): update polling config for codefix (aws#6617) ## Problem Increase in codefix timeouts ## Solution Increase default timeout and lower the polling frequency --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. * fix(amazonq): auto-review removes existing issues (aws#6535) ## Problem Auto-reviews often produce less code issues than manual reviews, but the current behavior is to remove all the issues in the file when processing the new ones. This means that issues discovered by manual reviews but not auto-reviews will silently disappear if auto-reviews is enabled. ## Solution - Auto-reviews should not clear the previous issues, but instead merge in the new results to the existing group. - Fixed a related issue with the `ignoreIssue` command being flaky --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. * feat(amazonq): /doc: add support for infrastructure diagrams (aws#6561) Problem: - Amazon Q does not have support for infrastructure diagrams Solution: - Add support for them  --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. Co-authored-by: Viktor Shesternyak <[email protected]> * refactor(cleanup): remove dead code (aws#6619) This does nothing anymore --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. Signed-off-by: nkomonen-amazon <[email protected]> * telemetry(auth): Update metrics to better debug Auth dropoff (aws#6625) ## Problem We noticed that there was an auth dropoff between `session_start` with a brand new clientId, versus when the `auth_userState` metric indicated `isFirstUse` meaning the user is net new. We went from 12.7k for `session_start` to 9.8k for `auth_userState`, these should have been basically the same. ## Solution Add in certain metrics to help debug where the discrepancy is coming from: - When we determine the user is a first time user, we will also check if the clientId is newly generated. If this is not the case we know there is a discrepancy here - The relevant metric will be `function_call` with a `functionName: isFirstUse`, `result: Failed`, and a `reason: ClientIdAlreadyExisted` - When we determine the user is a first time user, if we detected that they had previous auth connections, this will indicate a likely cause for the discrepancy - The relevant metric will be `function_call` with `reason: UnexpectedConnections` - We will emit metrics when the Auth Login page loads since that also had a discrepancy and the telemetry did not exist - The relevant metric is `webview_load` and it will indicate when the Auth Login/Reauth page has actually loaded - Previously we were observing the telemetry for the command `aws.amazonq.focusChat`, but all this did was emit when called and didn't confirm the UI actually loaded. - We will also add the `isFirstUse` metric source value in to some other existing metrics --- - Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time. - Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines). - License: I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: nkomonen-amazon <[email protected]> --------- Signed-off-by: nkomonen-amazon <[email protected]> Co-authored-by: Frederic Mbea <[email protected]> Co-authored-by: Hweinstock <[email protected]> Co-authored-by: Justin M. Keyes <[email protected]> Co-authored-by: Maxim Hayes <[email protected]> Co-authored-by: Tai Lai <[email protected]> Co-authored-by: Viktor Shesternyak <[email protected]> Co-authored-by: Viktor Shesternyak <[email protected]> Co-authored-by: Nikolas Komonen <[email protected]>
Problem
We noticed that there was an auth dropoff between
session_startwith a brand new clientId, versus when theauth_userStatemetric indicatedisFirstUsemeaning the user is net new. We went from 12.7k forsession_startto 9.8k forauth_userState, these should have been basically the same.Solution
Add in certain metrics to help debug where the discrepancy is coming from:
function_callwith afunctionName: isFirstUse,result: Failed, and areason: ClientIdAlreadyExistedfunction_callwithreason: UnexpectedConnectionswebview_loadand it will indicate when the Auth Login/Reauth page has actually loadedaws.amazonq.focusChat, but all this did was emit when called and didn't confirm the UI actually loaded.isFirstUsemetric source value in to some other existing metricsfeature/xbranches will not be squash-merged at release time.