-
Notifications
You must be signed in to change notification settings - Fork 746
fix(crash): Add more telemetry info to crash reporting #5850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is in preparation for a more simple implementation in the next commit Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
- Increase check interval from 10 to 30 minutes, so 15 minute heartbeats. - Emit failure if heartbeat abort itself fails. This will let us know that if heartbeats fail and we try to clean them up, that the cleanup failed. We need to know this otherwise there will be incorrectly reported crashes. - Emit a metric when we detect a computer restart and handle it - Validate that when we write a new heartbeat we can immediately read it from disk. Signed-off-by: nkomonen-amazon <[email protected]>
- Now we emit an event for every heartbeat (every 10 minutes) - Include a timestamp field in the heartbeat and session_end so we can see the diff if any Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
We do not need the PID anymore since we fixed the sessionId so that a new one is generated even after the ext host crashes and restarts. The PID previously allowed us to differentiate between ide instances with the same sessionId but different PIDs Signed-off-by: nkomonen-amazon <[email protected]>
simplify the code that deletes the heartbeat file Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
|
This pull request implements a feature or fix, so it must include a changelog entry. See CONTRIBUTING.md#changelog for instructions. |
| it('should return the first 4 and last 4 characters of a valid UUID', function () { | ||
| const fullUUID = 'aaaabbbb-cccc-dddd-eeee-ffffhhhhiiii' | ||
| const result = truncateUuid(fullUUID) | ||
| assert.strictEqual(result, 'aaaa...iiii') | ||
| }) | ||
|
|
||
| it('should handle a different valid UUID correctly', function () { | ||
| const fullUUID = '12340000-0000-0000-0000-000000005678' | ||
| const result = truncateUuid(fullUUID) | ||
| assert.strictEqual(result, '1234...5678') | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, merged with the other
| // - We do not support cloud desktop due to false crash reports due to sleep+vpn+ssh timeouts with the IDE | ||
| if (isWeb() || (await getComputeEnvType()) === 'cloudDesktop-amzn') { | ||
| return undefined | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not local only? Wouldn't any remote environment potentially have this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of noise caused specifically by cloud desktop, compared to the other ssh'd instances it is not much. This is probably since we have lots of Amazon devs using it.
The goal is to get rid of the noise while still collecting data on the others so that if we need it in the future it is there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would that apply to many/most "remote" situations?
Signed-off-by: nkomonen-amazon <[email protected]>
Handles a possible edge case when sending a heartbeat Signed-off-by: nkomonen-amazon <[email protected]>
|
/runIntegrationTests |
| ] | ||
| }, | ||
| { | ||
| "name": "ide_heartbeat", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toolkit_heartbeat may fit better. ide_ is for non-extension evens (from the ide)
Problem
We are seeing weird issues with some users when crash reporting due to crashes for a specific session id being reported multiple times, even though it should report one.
Solution
Add more telemetry data so that we can see when the heartbeat is successfully sent, and what the crash checker sees when it runs.
Additional
There are other small commits that make small changes, each has a description for what it is doing
License: I confirm that my contribution is made under the terms of the Apache 2.0 license.