Skip to content

Conversation

@nkomonen-amazon
Copy link
Contributor

Problem

We are seeing weird issues with some users when crash reporting due to crashes for a specific session id being reported multiple times, even though it should report one.

Solution

Add more telemetry data so that we can see when the heartbeat is successfully sent, and what the crash checker sees when it runs.

Additional

There are other small commits that make small changes, each has a description for what it is doing

License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

This is in preparation for a more simple implementation in the next commit

Signed-off-by: nkomonen-amazon <[email protected]>
- Increase check interval from 10 to 30 minutes, so 15 minute heartbeats.
- Emit failure if heartbeat abort itself fails. This will let us know that if heartbeats fail and we try to clean
  them up, that the cleanup failed. We need to know this otherwise there will be incorrectly reported crashes.
- Emit a metric when we detect a computer restart and handle it
- Validate that when we write a new heartbeat we can immediately read it from disk.

Signed-off-by: nkomonen-amazon <[email protected]>
- Now we emit an event for every heartbeat (every 10 minutes)
- Include a timestamp field in the heartbeat and session_end so we
  can see the diff if any

Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
We do not need the PID anymore since we fixed the sessionId so that
a new one is generated even after the ext host crashes and restarts.

The PID previously allowed us to differentiate between ide instances with
the same sessionId but different PIDs

Signed-off-by: nkomonen-amazon <[email protected]>
simplify the code that deletes the heartbeat file

Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
Signed-off-by: nkomonen-amazon <[email protected]>
@nkomonen-amazon nkomonen-amazon requested a review from a team as a code owner October 24, 2024 15:24
@github-actions
Copy link

This pull request implements a feature or fix, so it must include a changelog entry. See CONTRIBUTING.md#changelog for instructions.

Comment on lines 49 to 59
it('should return the first 4 and last 4 characters of a valid UUID', function () {
const fullUUID = 'aaaabbbb-cccc-dddd-eeee-ffffhhhhiiii'
const result = truncateUuid(fullUUID)
assert.strictEqual(result, 'aaaa...iiii')
})

it('should handle a different valid UUID correctly', function () {
const fullUUID = '12340000-0000-0000-0000-000000005678'
const result = truncateUuid(fullUUID)
assert.strictEqual(result, '1234...5678')
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, merged with the other

Comment on lines +86 to +89
// - We do not support cloud desktop due to false crash reports due to sleep+vpn+ssh timeouts with the IDE
if (isWeb() || (await getComputeEnvType()) === 'cloudDesktop-amzn') {
return undefined
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not local only? Wouldn't any remote environment potentially have this issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of noise caused specifically by cloud desktop, compared to the other ssh'd instances it is not much. This is probably since we have lots of Amazon devs using it.

The goal is to get rid of the noise while still collecting data on the others so that if we need it in the future it is there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would that apply to many/most "remote" situations?

Signed-off-by: nkomonen-amazon <[email protected]>
Handles a possible edge case when sending a heartbeat

Signed-off-by: nkomonen-amazon <[email protected]>
@nkomonen-amazon
Copy link
Contributor Author

/runIntegrationTests

@nkomonen-amazon nkomonen-amazon merged commit 1621b4d into aws:master Oct 24, 2024
24 of 30 checks passed
@nkomonen-amazon nkomonen-amazon deleted the changeTimeLag branch October 24, 2024 18:57
]
},
{
"name": "ide_heartbeat",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toolkit_heartbeat may fit better. ide_ is for non-extension evens (from the ide)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants