-
Notifications
You must be signed in to change notification settings - Fork 735
fix(crash): handle sleep/wake appropriately #5787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/runIntegrationTests |
|
This pull request modifies code in src/ but no tests were added/updated. Confirm whether tests should be added or ensure the PR description explains why tests are not required. |
|
This pull request implements a feature or fix, so it must include a changelog entry. See CONTRIBUTING.md#changelog for instructions. |
| * Solution: Keep track of the lag, and then skip the next crash check if there was a lag. This will give time for the | ||
| * next heartbeat to be sent. | ||
| */ | ||
| class TimeLag { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this class abstracting? why is this not part of the existing Heartbeat class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underlying issue is checking for a stale heartbeat. We cannot reliably compensate for lag by sending a heartbeat since the crash checking is a separate process, and there becomes a race condition between sending a fresh heartbeat vs checking for a stale one. The solution is that the crash checker compensates for "time lag" by skipping a check
This class could live inside the CrashChecker class, but I'm currently passing the TimeLag instance in as a dependency to it.
| private isCompleted: Promise<void> | undefined | ||
| /** Resolves {@link isCompleted} when the next interval of lag checking is completed */ | ||
| private setCompleted: (() => void) | undefined | ||
|
|
||
| /** The last timestamp we remember. If this is more than 1 {@link lagCheckInterval} we probably did a sleep+wake */ | ||
| private latestTimestamp: number = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these 3 properities are common to the existing Timeout class in our timoutUtils module. does using Timeout save any code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created this SIM as there is some extra work since we need an interval instead of a timeout: IDE-15104
| * Heartbeats that indicate the extension instance is still running. | ||
| * {@link CrashChecker} listens for these. | ||
| */ | ||
| class Heartbeat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hearbeat is a concept that continues to come up, e.g. we have another heartbeat for codecatalyst:
| private tryStartActivityHeartbeat = shared(async (reauth: boolean) => { |
Even if the codecatalyst heartbeat is too different to be useful here, is this new Heartbeat not amenable to generalization? It is a general concept that is needed by different features that want to emit data on a timed basis.
Do we have a backlog item to pull this out / generalize it? Are features easily able to attach data to a toolkit_heartbeat metric, or did we name the metric crashreporting_x (not intuitive)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't much reuse between Codecatalyst heartbeat and this heartbeat atm. This one heartbeats on an interval, and the other heartbeats when the user performs actions in the IDE.
We could make an interface for the common apis, but this doesn't deduplicate any existing code.
I'll made a backlog item for unifying the heartbeats: https://issues.amazon.com/issues/IDE-15108
Are features easily able to attach data to a toolkit_heartbeat metric, or did we name the metric crashreporting_x (not intuitive)?
We dont have a metric sent for heartbeats since it is a bit noisy.
Problem: When a user does a sleep then wake, our heartbeats are not up to date since they cannot send when the user's computer is asleep. Due to this there is a race condition between the heartbeats sent and when we check the heartbeats to determine if they are stale (crash). Solution: Keep a TimeLag class that helps to assert that there is no lag in our time. It works by updating a state every second. And if we determine that the next update to that state took longer than a second, we determine that there was a lag. Then we simply skip the next crash check, allowing a fresh heartbeat to be sent. Signed-off-by: nkomonen-amazon <[email protected]>
There was some excessive code due to an older design. This removes the use of nested folders in favor of using the root folder to store state. This simplifies the code. Signed-off-by: nkomonen-amazon <[email protected]>
d9eb132 to
f9a84e9
Compare
Signed-off-by: nkomonen-amazon <[email protected]>
|
Approved in case this is needed for the release. Although CC heartbeat doesn't have much overlap, and currently there is no other consumer, we know that there are multiple future use-cases for attaching data to a global heartbeat (and this would have eliminated some existing metrics), so it's a good idea to start thinking about lifting this out of crash-reporting. Especially if a metric [will be] involved. Possibly the |
Signed-off-by: nkomonen-amazon <[email protected]>
## Problem: When a user does a sleep then wake of their computer, the heartbeat is not up to date since it cannot be sent when the user's computer is asleep. Due to this there is a race condition on wake between the next fresh heartbeat being sent versus when we check the heartbeats to determine if they are stale (crash). If the crash checker runs before a new heartbeat can be sent, it will be seen as a crash. ## Solution: Use a TimeLag class that helps to determine when there is a time discrepancy. It works by updating a state every second, and if we determine that the next update to that state took longer than a second, we determine that there was a lag. Then we simply skip the next crash check, allowing a fresh heartbeat to be sent. --- <!--- REMINDER: Ensure that your PR meets the guidelines in CONTRIBUTING.md --> License: I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: nkomonen-amazon <[email protected]>
Problem:
When a user does a sleep then wake of their computer, the heartbeat is not up to date since it cannot be sent when the user's computer is asleep.
Due to this there is a race condition on wake between the next fresh heartbeat being sent versus when we check the heartbeats to determine if they are stale (crash). If the crash checker runs before a new heartbeat can be sent, it will be seen as a crash.
Solution:
Use a TimeLag class that helps to determine when there is a time discrepancy. It works by updating a state every second, and if we determine that the next update to that state took longer than a second, we determine that there was a lag. Then we simply skip the next crash check, allowing a fresh heartbeat to be sent.
License: I confirm that my contribution is made under the terms of the Apache 2.0 license.