telemetry(webview): Emit toolkit_X module telemetry on auth webview #6791

nkomonen-amazon · 2025-03-14T17:46:05Z

Problem:

With our telemetry, we do not know when the frontend webview UI has actually loaded.

The current process looks like the following:

We create a webview and set the HTML to load, but after that we have no formal way to detect whether the webview actually loaded the HTML/JS successfully. We only know that the process started (toolkit_willOpenModule).

Solution:

Emit certain metrics during the webview loading process to get a better idea of whether the webview UI successfully completed its initial load.

toolkit_willOpenModule indicates intent to render a webview. It does not mean the user is seeing anything.
toolkit_didLoadModule indicates the final result of loading the webview:
- result: Succeeded is recorded when the frontend sends a success message to the backend. The frontend verifies that there were no errors and that a certain HTML element can be found; once the page finishes its initial load, it sends the success message.
- result: Failed is recorded when a 10-second timer expires. We assume that since there was no response from the frontend, it failed to fully execute the HTML/JS.
- State is shared between toolkit_willOpenModule and toolkit_didLoadModule so that we can connect them through telemetry. This includes the traceId and the duration, which is the time between the two metrics.
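
As a rough illustration of the flow described above, the backend side could be sketched as follows. This is not the code in this PR: `LoadTracker`, `MetricEvent`, `Emit`, and the `LoadTimeout` reason string are hypothetical names, and the injected `emit` and `now` callbacks stand in for the real telemetry client and clock.

```typescript
// Illustrative only: LoadTracker, MetricEvent and Emit are hypothetical names,
// not the toolkit's actual telemetry API.
type LoadResult = 'Succeeded' | 'Failed'

interface MetricEvent {
    name: 'toolkit_willOpenModule' | 'toolkit_didLoadModule'
    moduleName: string
    traceId: string // shared so the two metrics can be joined
    result?: LoadResult
    duration?: number // ms between willOpen and the final result
    reason?: string
}

type Emit = (event: MetricEvent) => void

class LoadTracker {
    private readonly moduleName: string
    private readonly traceId: string
    private readonly emit: Emit
    private readonly timeoutMs: number
    private readonly now: () => number
    private startedAt = 0
    private settled = false
    private timer: ReturnType<typeof setTimeout> | undefined

    constructor(
        moduleName: string,
        traceId: string,
        emit: Emit,
        timeoutMs: number = 10_000,
        now: () => number = () => Date.now()
    ) {
        this.moduleName = moduleName
        this.traceId = traceId
        this.emit = emit
        this.timeoutMs = timeoutMs
        this.now = now
    }

    /** Emit the intent metric and arm the failure timer. */
    willOpen(): void {
        this.startedAt = this.now()
        this.emit({ name: 'toolkit_willOpenModule', moduleName: this.moduleName, traceId: this.traceId })
        this.timer = setTimeout(() => this.finish('Failed', 'LoadTimeout'), this.timeoutMs)
    }

    /** Call this when the frontend posts its success message. */
    didLoad(): void {
        this.finish('Succeeded')
    }

    /** Emit toolkit_didLoadModule exactly once, whichever outcome arrives first. */
    private finish(result: LoadResult, reason?: string): void {
        if (this.settled) {
            return
        }
        this.settled = true
        if (this.timer !== undefined) {
            clearTimeout(this.timer)
        }
        this.emit({
            name: 'toolkit_didLoadModule',
            moduleName: this.moduleName,
            traceId: this.traceId,
            result,
            duration: this.now() - this.startedAt,
            reason,
        })
    }
}
```

The `settled` guard is one way to make sure a late success message cannot produce a second toolkit_didLoadModule after the timer has already reported a failure.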

This PR only applies to the Login and Reauth page for now, and future Vue webviews will need to implement some things on their end to get this functionality.

TODO

Generalize this solution in a more robust way for other webviews to easily implement this functionality

Treat all work as PUBLIC. Private feature/x branches will not be squash-merged at release time.
Your code changes must meet the guidelines in CONTRIBUTING.md.
License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

github-actions · 2025-03-14T17:46:20Z

This pull request modifies code in src/* but no tests were added/updated.
- Confirm whether tests should be added or ensure the PR description explains why tests are not required.

Hweinstock · 2025-03-14T18:05:59Z

packages/core/src/amazonq/webview/messages/messageDispatcher.ts

                 * This would be equivalent of the duration between "user clicked open q" and "ui has become available"
                 * NOTE: Amazon Q UI is only loaded ONCE. The state is saved between each hide/show of the webview.
                 */
-                telemetry.webview_load.emit({


Once this is done for all the webviews, will webview_load be deprecated in favor of the more granular metrics?

Yup, webview_x will be deprecated for something like toolkit_moduleX. So we'll also drop webview_error as well

Hweinstock · 2025-03-14T18:12:06Z

packages/core/src/amazonq/webview/messages/messageDispatcher.ts

-                reasonDesc: msg.errorMessage,
-            })
+            if (msg.event === 'toolkit_didLoadModule') {
+                telemetry.toolkit_didLoadModule.emit({


Why does the webview_error metric exist? Shouldn't it in theory mirror the failures of toolkit_didLoadModule or are there non-error reasons a webview fails to load?

If we have a separate metric for error, shouldn't we emit it in this case since we still received an error from the webview?

I'm thinking of dropping webview_error and creating something like toolkit_moduleError. This will capture any errors that happen after loading has happened, anything before would be captured in toolkit_didLoadModule.

I have it as a TODO to deprecate webview_error, this will also allow us to deal with different field names (module vs webviewName)

toolkit_moduleError. This will capture any errors that happen after loading has happened, anything before would be captured in toolkit_didLoadModule.

? errors should be part of all metrics. There should not be a separate "foo_error" metric.

Hweinstock · 2025-03-14T18:18:07Z

packages/core/src/webviews/main.ts

+            private setupTelemetry() {
+                this.instance.traceId = randomUUID()
+                // Notify intent to open a module, this does not mean it successfully opened
+                telemetry.toolkit_willOpenModule.emit({


Is there a case where we intend to open a module (emit willOpenModule) then don't also emit a didLoadModule with either fail or succeed?

For now, every webview will emit a willOpenModule and didLoadModule will need to be explicitly done by each webview.

This is due to how loading a webview works. We can only indicate our intent to open a webview but have no formal way to know when it has opened (we create a vscode webview instance and set a string of HTML, then have no insight into what happens after that).

So there will be cases where we only have willOpenModule and no trailing didLoadModule. This is essentially how we have it right now, but under different metric names.

In most cases, did/will pairs should not be needed. Only the "did" case is needed for most metrics, because the metric will wrap the impl logic and track a duration. Why do we need both here?

Problem: With our telemetry, we do not know when the frontend webview UI has actually loaded.

Solution: Emit certain metrics during the webview loading process to get a better idea of whether the webview UI successfully completed its initial load.
- toolkit_willOpenModule indicates intent to render a webview.
- toolkit_didLoadModule indicates the final result of loading the webview.
- On success, it is just a success result. We know a success when the frontend sends a success message to the backend. It knows this by ensuring there were no errors and that a certain HTML element can be found, then once the page finishes its initial load it will send a success message to the backend.
- On failure, a timer times out after 10 seconds.

Signed-off-by: nkomonen-amazon <[email protected]>

Setting .html starts loading the UI, while setup() registers the backend listeners for messages from the UI. setup() used to run after .html was set, and that worked, but adding a small sleep() before setup() caused a failure: messages went unhandled because the handlers were not set up in time. As a solution, this moves the handler setup before we set the new UI.

Signed-off-by: nkomonen-amazon <[email protected]>
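
The ordering issue in this commit message can be illustrated with a small stand-in. `FakeWebview`, `openHandlersFirst`, and `openHtmlFirst` are hypothetical names, and the fake delivers its message synchronously so the outcome is deterministic; in a real webview the message arrives asynchronously, which is why the problem only showed up once a sleep() was added.

```typescript
// Hypothetical stand-in for a webview: setting `html` starts the UI,
// which here posts a "loaded" message immediately.
type Listener = (msg: string) => void

class FakeWebview {
    private listeners: Listener[] = []

    onDidReceiveMessage(listener: Listener): void {
        this.listeners.push(listener)
    }

    // Simulates the UI posting to the backend. With no listener registered,
    // the message is silently dropped.
    postFromUi(msg: string): void {
        for (const listener of this.listeners) {
            listener(msg)
        }
    }

    set html(_value: string) {
        this.postFromUi('loaded')
    }
}

// Order after this change: register handlers, then set the HTML.
function openHandlersFirst(view: FakeWebview, received: string[]): void {
    view.onDidReceiveMessage((msg) => {
        received.push(msg)
    })
    view.html = '<html></html>'
}

// Previous order: the UI can post before any handler exists.
function openHtmlFirst(view: FakeWebview, received: string[]): void {
    view.html = '<html></html>'
    view.onDidReceiveMessage((msg) => {
        received.push(msg)
    })
}

const handledFirst: string[] = []
openHandlersFirst(new FakeWebview(), handledFirst)

const htmlFirst: string[] = []
openHtmlFirst(new FakeWebview(), htmlFirst)

console.log(handledFirst.length, htmlFirst.length)
```

In this sketch, only the handlers-first order sees the "loaded" message; the other order loses it.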

justinmk3 · 2025-03-17T17:01:36Z

packages/core/src/amazonq/webview/messages/messageDispatcher.ts

-                reasonDesc: msg.errorMessage,
-            })
+            if (msg.event === 'toolkit_didLoadModule') {
+                telemetry.toolkit_didLoadModule.emit({


toolkit_moduleError. This will capture any errors that happen after loading has happened, anything before would be captured in toolkit_didLoadModule.

? errors should be part of all metrics. There should not be a separate "foo_error" metric.

justinmk3 · 2025-03-17T17:02:16Z

packages/core/src/amazonq/webview/messages/messageDispatcher.ts

+                telemetry.webview_error.emit({
+                    webviewName: qChatModuleName,
+                    result: 'Failed',
+                    reasonDesc: msg.errorMessage,


why is there a separate webview_error metric? The error should be part of the toolkit_didLoadModule metric.

didLoadModule is mainly intended for the initial load, but if there is an error post-load then we will want a separate metric for that. Example is after clicking the "submit" button after putting in the startUrl+Region for signin

justinmk3 · 2025-03-17T17:04:31Z

packages/core/src/webviews/main.ts

+     * A webview that supports this will call {@link setDidLoad}
+     * to confirm the UI has successfully loaded.
+     */
+    public supportsLoadTelemetry: boolean = false


why do we need this flag? can we just try-and-handle-failure instead?

This flag tells us to set up a timeout that sends a failure after 10 seconds without a response (it assumes the webview did not postMessage to the backend because it failed). Each webview needs some customization to send the expected postMessage, so by default only the webviews we have done that custom work for will support it.

My TODO noted above is to update the webview framework so it forces (or at least makes it easy) to set this up
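
For context, the per-webview customization mentioned here could look roughly like the sketch below on the frontend side. The message shape, `reportLoad`, and `PageLike` are assumptions made for illustration, not the format this PR uses.

```typescript
// Hypothetical message shape; the real field names may differ.
interface LoadMessage {
    event: 'toolkit_didLoadModule'
    result: 'Succeeded'
}

// Minimal slice of `document` so the check can run outside a browser.
interface PageLike {
    querySelector(selector: string): object | null
}

/**
 * Posts a success message only if no errors were recorded and the expected
 * element exists. Otherwise it stays silent, and the backend's 10-second
 * timer is what reports the failure.
 */
function reportLoad(
    page: PageLike,
    selector: string,
    errors: string[],
    post: (msg: LoadMessage) => void
): boolean {
    if (errors.length > 0 || page.querySelector(selector) === null) {
        return false
    }
    post({ event: 'toolkit_didLoadModule', result: 'Succeeded' })
    return true
}
```

In an actual webview, `page` would be `document`, `post` would wrap the `postMessage` of the object returned by `acquireVsCodeApi()`, and the call would run once the page finishes its initial load.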

justinmk3 · 2025-03-17T17:06:16Z

packages/core/src/webviews/main.ts

+            private setupTelemetry() {
+                this.instance.traceId = randomUUID()
+                // Notify intent to open a module, this does not mean it successfully opened
+                telemetry.toolkit_willOpenModule.emit({


In most cases, did/will pairs should not be needed. Only the "did" case is needed for most metrics, because the metric will wrap the impl logic and track a duration. Why do we need both here?

nkomonen-amazon · 2025-03-17T17:28:16Z

@justinmk3

In most cases, did/will pairs should not be needed. Only the "did" case is needed for most metrics, because the metric will wrap the impl logic and track a duration. Why do we need both here?

This is due to how webviews load asynchronously: we cannot easily wrap a single function in telemetry.run(). We could maintain some state object and emit a single didLoadModule, but other IDEs found it easier to have separate start/end metrics, since it can be difficult to pass context around cleanly. I just wanted to keep it consistent between IDEs for now.

nkomonen-amazon requested review from a team as code owners March 14, 2025 17:46

nkomonen-amazon marked this pull request as draft March 14, 2025 18:05

Hweinstock reviewed Mar 14, 2025

nkomonen-amazon added 2 commits March 17, 2025 11:22

nkomonen-amazon force-pushed the moduleLoadTelemetry branch from a420d8b to da0b2f9 Compare March 17, 2025 15:27

nkomonen-amazon marked this pull request as ready for review March 17, 2025 15:54

justinmk3 approved these changes Mar 17, 2025

nkomonen-amazon merged commit e7b7307 into aws:master Mar 21, 2025
16 of 17 checks passed

nkomonen-amazon deleted the moduleLoadTelemetry branch March 21, 2025 16:38
