telemetry(auth): Update metrics to better debug Auth dropoff #6625

nkomonen-amazon · 2025-02-19T17:33:05Z

Problem

We noticed an auth dropoff between `session_start` events with a brand-new clientId and `auth_userState` events where `isFirstUse` indicated a net-new user. We saw 12.7k for `session_start` but only 9.8k for `auth_userState`, even though the two should have been roughly equal.

Solution

Add metrics to help find where the discrepancy comes from:

- When we determine that the user is a first-time user, we also check whether the clientId was newly generated. If it was not, we know there is a discrepancy.
  - The relevant metric is `function_call` with `functionName: isFirstUse`, `result: Failed`, and `reason: ClientIdAlreadyExisted`.
- When we determine that the user is a first-time user but detect previous auth connections, that is a likely cause of the discrepancy.
  - The relevant metric is `function_call` with `reason: UnexpectedConnections`.
- We emit metrics when the Auth Login page loads, since that page also showed a discrepancy and had no telemetry.
  - The relevant metric is `webview_load`; it indicates when the Auth Login/Reauth page has actually loaded.
  - Previously we relied on the telemetry for the command `aws.amazonq.focusChat`, but that only emitted when the command was called and did not confirm that the UI actually loaded.
- We also add `isFirstUse` as the `source` value on some other existing metrics.
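A minimal sketch of the two first-use checks described above, assuming a simplified event shape rather than the toolkit's generated telemetry types. Only the metric name `function_call`, `functionName: isFirstUse`, `result: Failed`, and the reasons `ClientIdAlreadyExisted` and `UnexpectedConnections` come from this PR; `FirstUseContext`, `checkFirstUseDiscrepancies`, and the input field names are hypothetical.

```typescript
// Hypothetical shape of a `function_call` event; the real generated
// telemetry types in the toolkit may differ.
interface FunctionCallEvent {
    functionName: 'isFirstUse'
    result: 'Failed'
    reason: 'ClientIdAlreadyExisted' | 'UnexpectedConnections'
}

// Hypothetical inputs: values the extension would already know once it has
// classified the user.
interface FirstUseContext {
    isFirstUse: boolean
    clientIdIsNew: boolean
    hasExistingConnections: boolean
}

// Returns the discrepancy events to emit. Both checks apply only when the
// user was classified as a first-time user.
function checkFirstUseDiscrepancies(ctx: FirstUseContext): FunctionCallEvent[] {
    const events: FunctionCallEvent[] = []
    if (!ctx.isFirstUse) {
        return events
    }
    if (!ctx.clientIdIsNew) {
        // A first-time user is expected to have a freshly generated clientId.
        events.push({ functionName: 'isFirstUse', result: 'Failed', reason: 'ClientIdAlreadyExisted' })
    }
    if (ctx.hasExistingConnections) {
        // Prior auth connections on a "new" user are a likely cause of the dropoff.
        events.push({ functionName: 'isFirstUse', result: 'Failed', reason: 'UnexpectedConnections' })
    }
    return events
}
```

Returning the events instead of emitting them directly keeps the check easy to unit test; a caller would forward each event to the telemetry client.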

Treat all work as PUBLIC. Private feature/x branches will not be squash-merged at release time.
Your code changes must meet the guidelines in CONTRIBUTING.md.
License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

Avoid any invalid logic involved with hasExistingConnections() Signed-off-by: nkomonen-amazon <[email protected]>

Problem: We were seeing some cases where the ClientId was seen as new, but isFirstUse was reporting the user as not new. Solution: Validate that if we have a new ClientId, isFirstUse MUST be true. Otherwise we emit an error, which should help us debug these cases. Signed-off-by: nkomonen-amazon <[email protected]>

…th_userState Signed-off-by: nkomonen-amazon <[email protected]>

Before, on a user's first use, it was technically possible for Q to indicate it was NOT the first use if existing connections were somehow detected. A previous commit removed that case, but we now want to emit telemetry if we detect that we WOULD have run into that case unexpectedly. Signed-off-by: nkomonen-amazon <[email protected]>

Problem: When we focused chat, the command returned Succeeded even though it had no way of knowing whether the UI was actually rendered to the user. Solution: The UI now emits a telemetry metric once per session when it has loaded, and distinguishes between the login and reauth pages. The metric is `webview_load`, and the `webviewName` field distinguishes login from reauth. NOTE: When the Q chat UI itself is ready we already emit a `webview_load` metric, so use all of these metrics together to determine what the user actually saw. Signed-off-by: nkomonen-amazon <[email protected]>
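A minimal sketch of "emit `webview_load` once per session, separately for login and reauth", modeled on the `setUiReady(state: 'login' | 'reauth')` signature and `didCall` flags visible in the review diff. The `AuthUiLoadReporter` class, the injected `emit` callback, and using the page name directly as `webviewName` are assumptions for illustration; the real `webviewName` values may differ.

```typescript
type AuthPage = 'login' | 'reauth'

// Hypothetical payload; only the `webviewName` field name comes from the PR.
interface WebviewLoadEvent {
    webviewName: string
}

class AuthUiLoadReporter {
    // One flag per page, so login and reauth are each reported at most once.
    private readonly reported: Record<AuthPage, boolean> = { login: false, reauth: false }
    private readonly emit: (event: WebviewLoadEvent) => void

    constructor(emit: (event: WebviewLoadEvent) => void) {
        this.emit = emit
    }

    // Called by the webview frontend once the page has actually rendered,
    // not when the command that opens it is invoked.
    public setUiReady(page: AuthPage): void {
        if (this.reported[page]) {
            return
        }
        this.reported[page] = true
        this.emit({ webviewName: page })
    }
}
```

Having the frontend call back after rendering is what separates this from the old `aws.amazonq.focusChat` signal, which fired on invocation regardless of what the user saw.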

Signed-off-by: nkomonen-amazon <[email protected]>

nkomonen-amazon · 2025-02-19T18:45:03Z

/runIntegrationTests

jpinkney-aws · 2025-02-19T18:49:24Z

packages/core/src/login/webview/vue/backend.ts

    }

+    private didCall: { login: boolean; reauth: boolean } = { login: false, reauth: false }
+    public setUiReady(state: 'login' | 'reauth') {


is this something we could just use the once() util for? That way you don't have to keep track of the extra state?

I wanted to emit once for login and once for reauth separately, which didn't allow me to use once(). But I think in a future PR we could combine memoize + once to address this.
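One possible shape for the "memoize + once" follow-up, sketched under assumptions: the local `once` below is a minimal stand-in written for this example, not the toolkit's own `once()` util, and `oncePerKey` is a hypothetical helper name.

```typescript
// Minimal stand-in for a once() helper: runs fn on the first call and
// returns the cached result on later calls.
function once<T>(fn: () => T): () => T {
    let called = false
    let result: T | undefined
    return () => {
        if (!called) {
            called = true
            result = fn()
        }
        return result as T
    }
}

// "memoize + once": keep one once-wrapped function per key, so each key's
// side effect runs at most once while different keys stay independent.
function oncePerKey<K, T>(fn: (key: K) => T): (key: K) => T {
    const perKey = new Map<K, () => T>()
    return (key: K) => {
        let wrapped = perKey.get(key)
        if (wrapped === undefined) {
            wrapped = once(() => fn(key))
            perKey.set(key, wrapped)
        }
        return wrapped()
    }
}
```

With this, `setUiReady` could wrap its emit call in `oncePerKey` and drop the explicit `{ login, reauth }` flags.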

jpinkney-aws · 2025-02-19T18:51:22Z

packages/core/src/test/credentials/utils.test.ts


 describe('ExtensionUse.isFirstUse()', function () {
    let instance: ExtensionUse
+    const notHasExistingConnections = () => false


should this be hasNoExistingConnections?

Makes sense, I'll follow up with the name change after. I don't want to have to run CI again unless there's something critical.

jpinkney-aws · 2025-02-19T18:52:49Z

packages/core/src/auth/utils.ts

+        }
+
+        if (isAmazonQ()) {
+            this.isFirstUseCurrentSession = true


qq: why is this always true for amazonQ?

There was previous logic where, if we detected an auth connection in Toolkit, we would not consider the user a first-time user. But that doesn't make sense for Q.

Also, if they get this far in the function, they are guaranteed to be a first-time user, since they would otherwise have returned earlier.
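A rough sketch of the control flow described in this thread, under stated assumptions: `isFirstUseSketch`, `FirstUseInputs`, and the `hasRunBefore` flag are hypothetical stand-ins for the real `ExtensionUse.isFirstUse()` state, and the Toolkit branch assumes the older "existing connection means not first use" rule still applies there.

```typescript
// Hypothetical inputs; the real ExtensionUse reads these from global state
// and the auth module.
interface FirstUseInputs {
    hasRunBefore: boolean // persisted marker from an earlier session
    isAmazonQ: boolean
    hasExistingConnections: () => boolean
}

function isFirstUseSketch(inputs: FirstUseInputs, reportUnexpectedConnections: () => void): boolean {
    if (inputs.hasRunBefore) {
        // Returning user: exit early.
        return false
    }
    if (inputs.isAmazonQ) {
        // Reaching this point already implies first use, so for Q the answer is
        // always true. Existing connections are only reported, not acted on.
        if (inputs.hasExistingConnections()) {
            reportUnexpectedConnections()
        }
        return true
    }
    // Assumed Toolkit behavior: an existing connection means not first use.
    return !inputs.hasExistingConnections()
}
```

The Q branch keeps the answer fixed while still surfacing the `UnexpectedConnections` case through telemetry, matching the intent described in the commit messages.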

justinmk3 · 2025-02-19T18:54:43Z

packages/core/src/shared/telemetry/vscodeTelemetry.json

            ]
        },
+        {
+            "name": "session_start",


Why is this needed? There is already a metric for starting the plugin. Adding similar, semantically-redundant metrics will make it harder to reason about the lifecycle and telemetry.

What is the metric?

Oh, this metric already exists. So a different question: can the existing metric be updated instead, in the common repo?

Yeah, I plan to do this. It was just due to time constraints. I'll follow up after this PR

justinmk3 · 2025-02-19T18:56:25Z

packages/core/src/shared/telemetry/vscodeTelemetry.json

+            "description": "When the Amazon Q sign in page is opened and focused.",
+            "metadata": [
+                {
+                    "type": "source",


why not add this to the existing metric? https://github.com/aws/aws-toolkit-common/blob/ccb16ca4f73b07c4e1cd79da159312c9ffe403a7/telemetry/definitions/commonDefinitions.json#L2915

I'm being rushed to get the change out for this release, but I will port this over after

jpinkney-aws · 2025-02-19T18:59:04Z

packages/core/src/login/webview/commonAuthViewProvider.ts

        // Our callback won't fire on the first view.
        if (webviewView.visible) {
-            telemetry.auth_signInPageOpened.emit({ result: 'Succeeded', passive: true })
+            telemetry.auth_signInPageOpened.emit({


Can we modify this metric and make it more general in the future?

Agreed, we need to better structure our telemetry, since what "opened" means is messy. We could probably also collapse all webviews into a single metric.

jpinkney-aws

Given the time constraints we're under for testing the amazon q changes, I don't see anything majorly blocking. I think we definitely need follow up PRs for things like porting back to telemetry though.

Also, will some of this code live on after we find the problem, or will we scrap it afterwards?

nkomonen-amazon · 2025-02-19T19:16:00Z

@jpinkney-aws

Some of these metrics will be removed (mainly the function_call ones added to help debug), but others like session_start and auth_signInPageOpened will be ported. Though I think we may need to look at adding something like source as a global field instead, since we have so many cases for it.

I will look into a fast follow-up.


nkomonen-amazon added 6 commits February 18, 2025 21:35

refactor: simplify isFirstUse for Q

a5ab6c3

Avoid any invalid logic involved with hasExistingConnections() Signed-off-by: nkomonen-amazon <[email protected]>

telemetry: add isFirstUse to session_start, auth_signInPageOpened, au…

0231f0f

…th_userState Signed-off-by: nkomonen-amazon <[email protected]>

fix unit tests

b2b639a

Signed-off-by: nkomonen-amazon <[email protected]>

nkomonen-amazon force-pushed the authDropoff branch from b37a64a to b2b639a February 19, 2025 18:30

nkomonen-amazon marked this pull request as ready for review February 19, 2025 18:43

nkomonen-amazon requested a review from a team as a code owner February 19, 2025 18:43

jpinkney-aws reviewed Feb 19, 2025


justinmk3 reviewed Feb 19, 2025


jpinkney-aws reviewed Feb 19, 2025


jpinkney-aws approved these changes Feb 19, 2025


nkomonen-amazon merged commit 35502be into aws:master Feb 19, 2025
31 of 32 checks passed

nkomonen-amazon deleted the authDropoff branch February 19, 2025 19:17

3 participants
