-
Notifications
You must be signed in to change notification settings - Fork 751
telemetry(webview): Emit toolkit_X module telemetry on auth webview #6791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
telemetry(webview): Emit toolkit_X module telemetry on auth webview #6791
Conversation
|
| * This would be equivalent of the duration between "user clicked open q" and "ui has become available" | ||
| * NOTE: Amazon Q UI is only loaded ONCE. The state is saved between each hide/show of the webview. | ||
| */ | ||
| telemetry.webview_load.emit({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once this is done for all the webviews, will webview_load be deprecated in favor of the more granular metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, webview_x will be deprecated for something like toolkit_moduleX. So we'll also drop webview_error as well
| reasonDesc: msg.errorMessage, | ||
| }) | ||
| if (msg.event === 'toolkit_didLoadModule') { | ||
| telemetry.toolkit_didLoadModule.emit({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the webview_error metric exist? Shouldn't it in theory mirror the failures of toolkit_didLoadModule or are there non-error reasons a webview fails to load?
If we have a separate metric for error, shouldn't we emit it in this case since we still received an error from the webview?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of dropping webview_error and creating something like toolkit_moduleError. This will capture any errors that happen after loading has happened, anything before would be captured in toolkit_didLoadModule.
I have it as a TODO to deprecate webview_error, this will also allow us to deal with different field names (module vs webviewName)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toolkit_moduleError. This will capture any errors that happen after loading has happened, anything before would be captured intoolkit_didLoadModule.
? errors should be part of all metrics. There should not be a separate "foo_error" metric.
| private setupTelemetry() { | ||
| this.instance.traceId = randomUUID() | ||
| // Notify intent to open a module, this does not mean it successfully opened | ||
| telemetry.toolkit_willOpenModule.emit({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a case where we intend to open a module (emit willOpenModule) then don't also emit a didLoadModule with either fail or succeed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, every webview will emit a willOpenModule and didLoadModule will need to be explicitly done by each webview.
This is due to how loading a webview works. We can only indicate our intent to open a webview but have no formal way to know when it has opened (we create a vscode webview instance and set a string of HTML, then have no insight to what happens after that)
So there will be cases where we only have willOpenModule and no trailing didLoadModule. This is essentially how we have it right now, but under different metric names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most cases, did/will pairs should not be needed. Only the "did" case is needed for most metrics, because the metric will wrap the impl logic and track a duration. Why do we need both here?
Problem:
With our telemetry, we do not know when the frontend webview UI has actually loaded
Solution:
Emit certain metrics during the webview loading process to get a better idea of if the
webview UI successfully completed its initial load.
- toolkit_willOpenModule, indicates intent to render a webview
- toolkit_didLoadModule, indicates the final result of loading the webview
- On Success it it just a success result. We know a success when the frontend send a successful
message to the backend. It knows this by ensuring there were no errors and that a certain HTML element
can be found, then once the page finishes its initial load it will send a success message to the backend.
- On Failure, what happens is a timer times out after 10 seconds.
Signed-off-by: nkomonen-amazon <[email protected]>
Setting .html starts the loading of the UI, but setup() sets up the message listeners in the backend for messages from the UI. We had setup() come after, and it worked, but if I added a small sleep() before setup() was run it would result in a failure due to messages not being handled due to handlers not being setup in time. As a solution this just moves the handler setup before we set the new UI. Signed-off-by: nkomonen-amazon <[email protected]>
a420d8b to
da0b2f9
Compare
| reasonDesc: msg.errorMessage, | ||
| }) | ||
| if (msg.event === 'toolkit_didLoadModule') { | ||
| telemetry.toolkit_didLoadModule.emit({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toolkit_moduleError. This will capture any errors that happen after loading has happened, anything before would be captured intoolkit_didLoadModule.
? errors should be part of all metrics. There should not be a separate "foo_error" metric.
| telemetry.webview_error.emit({ | ||
| webviewName: qChatModuleName, | ||
| result: 'Failed', | ||
| reasonDesc: msg.errorMessage, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there a separate webview_error metiric? The error should be part of the toolkit_didLoadModule metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didLoadModule is mainly intended for the initial load, but if there is an error post-load then we will want a separate metric for that. Example is after clicking the "submit" button after putting in the startUrl+Region for signin
| * A webview that supports this will call {@link setDidLoad} | ||
| * to confirm the UI has successfully loaded. | ||
| */ | ||
| public supportsLoadTelemetry: boolean = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this flag? can we just try-and-handle-failure instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This indicates for us to set up a timeout which sends a failure after 10 seconds of no response (makes assumption that the webview didn't postMessage to the backend due to failure). And each webview needs some customization to support the expected postMessage, so by default only webviews we do the custom work for will support it.
My TODO noted above is to update the webview framework so it forces (or at least makes it easy) to set this up
| private setupTelemetry() { | ||
| this.instance.traceId = randomUUID() | ||
| // Notify intent to open a module, this does not mean it successfully opened | ||
| telemetry.toolkit_willOpenModule.emit({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most cases, did/will pairs should not be needed. Only the "did" case is needed for most metrics, because the metric will wrap the impl logic and track a duration. Why do we need both here?
This is due to how webviews asynchronously load, and we aren't able to easily |
Problem:
With our telemetry, we do not know when the frontend webview UI has actually loaded.
The current process looks like the following:
toolkit_willOpenModule)Solution:
Emit certain metrics during the webview loading process to get a better idea of if the webview UI successfully completed its initial load.
toolkit_willOpenModule, indicates intent to render a webview. It does not mean the user is seeing anything.toolkit_didLoadModule, indicates the final result of loading the webviewresult: Succeededwhen the frontend send a successful message to the backend. It knows this by ensuring there were no errors and that a certain HTML element can be found, then once the page finishes its initial load it will send a success message to the backend.result: Failed, what happens is a timer has timed out after 10 seconds. We assume that since there was no response from the frontend, it failed to fully execute the HTML/JS.toolkit_willOpenModuleandtoolkit_didLoadModuleso that we can connect them through telemetry. This includestraceIdand thedurationwhich is the time between the 2 metrics.This PR only applies to the Login and Reauth page for now, and future Vue webviews will need to implement some things on their end to get this functionality.
TODO
feature/xbranches will not be squash-merged at release time.