Skip to content

Conversation

@nkomonen-amazon
Copy link
Contributor

@nkomonen-amazon nkomonen-amazon commented Sep 27, 2024

Problem

We want to know when the extension crashes. Having telemetry will be useful.
The problem is that when the extension crashes it is not running anymore so it cannot report that it crashed.

Solution

Create a Crash Monitoring mechanism that has all the instances of an extension (Q/Toolkit) work together to monitor eachother for crashes, reporting of a crash if it is detected.

To review start with the main file packages/core/src/shared/crashMonitoring/crashMonitoring.ts, specifically the CrashMonitoring.start() method.

There is more information in the docstring of the class.


License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions
Copy link

This pull request implements a feature or fix, so it must include a changelog entry. See CONTRIBUTING.md#changelog for instructions.

@nkomonen-amazon nkomonen-amazon changed the title feat: Crash reporting feat(telemetry): Crash reporting Sep 27, 2024
@nkomonen-amazon nkomonen-amazon force-pushed the crashReporting branch 6 times, most recently from 9b5f475 to 72bd7f5 Compare September 30, 2024 19:14
@nkomonen-amazon nkomonen-amazon changed the title feat(telemetry): Crash reporting feat(telemetry): Crash Monitoring Sep 30, 2024
},
forceIdeCrash: {
label: 'Crash: Force IDE ExtHost Crash',
detail: `Will SIGKILL ExtHost, { pid: ${process.pid}, sessionId: '${getSessionId().slice(0, 8)}-...' }, but the IDE itself will not crash.`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this different than vscode's Restart Extension Host command?
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restart Extension Host still results in a graceful shutdown of the extension host (deactivate() runs), while this one is non-graceful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need its own subdir or could it live in shared/crashmonitoring.ts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in its current state this makes sense. Will update

/** Start the Crash Monitoring process */
public async start() {
if (isWeb()) {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth a log message

Suggested change
return
debug('skipping crash monitoring in web-mode')
return

}

// This is the metric to let us know the extension crashed
telemetry.session_end.emit({
Copy link
Contributor

@justinmk3 justinmk3 Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we force the proxied sessionId into the sessionId field (and eliminate proxiedSessionId)? there's not really any use for the sessionId field from the current (non-proxied) session.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sessionId is adjacent to clientId and is injected in a different part of the metric right before sending:

We have to do some hacky stuff to get it to work but it is possible, though I don't think is worth the effort atm

Copy link
Contributor

@justinmk3 justinmk3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove getPids, binarySearch, and dependencies proper-lock , ps-list in this PR

@nkomonen-amazon
Copy link
Contributor Author

can we remove getPids, binarySearch, and dependencies proper-lock , ps-list in this PR

@justinmk3 opened PR here: #5707

@nkomonen-amazon nkomonen-amazon force-pushed the crashReporting branch 5 times, most recently from b3e148c to 38ffb89 Compare October 2, 2024 16:38
@nkomonen-amazon
Copy link
Contributor Author

nkomonen-amazon commented Oct 2, 2024

/runIntegrationTests

'globalStorage', // from vscode globalStorageUri
CrashMonitoring.rootDir,
CrashMonitoring.runningExtDir,
CrashMonitoring.shutdownExtDir,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this going to un-scrub the user's entire /tmp dir path, or only the last segment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just the dir names, not the full path. As part of the move to the main constants file I've renamed these variables to be more clear.

@nkomonen-amazon nkomonen-amazon marked this pull request as ready for review October 3, 2024 01:25
@nkomonen-amazon nkomonen-amazon requested a review from a team as a code owner October 3, 2024 01:25
@nkomonen-amazon
Copy link
Contributor Author

/runIntegrationTests

nkomonen-amazon and others added 9 commits October 3, 2024 14:54
Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>

redid the constructors + extracted state

Signed-off-by: nkomonen-amazon <[email protected]>
no longer needed that we have a single file for crash reporting

Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
@nkomonen-amazon nkomonen-amazon merged commit 65b0148 into aws:master Oct 3, 2024
8 of 10 checks passed
@nkomonen-amazon nkomonen-amazon deleted the crashReporting branch October 3, 2024 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants