Uncaught Exception Handling #32
Replies: 3 comments
-
|
Great initiative! There is an example in the Winston repository. It's used on('finish') event: finish-events.js |
Beta Was this translation helpful? Give feedback.
-
|
Regarding Sentry, the official implementation is a bit different from ours. |
Beta Was this translation helpful? Give feedback.
-
|
@kavanagh suggested something which it's worth capturing. When it comes to the API for our error handling, we may later want to integrate some more complex/magical things like Sentry in future. This would be difficult if we tied ourselves to the suggested API of exposing a handler and requiring apps to bind it to handlers, e.g. const uncaughtExceptionHandler = require('@dotcom-reliability-kit/uncaught-exception-handler');
process.on('uncaughtException', uncaughtErrorHandler);
process.on('unhandledRejection', uncaughtErrorHandler);It's also error-prone and a little boilerplatey. It might be more sensible if the API is something like this: const uncaughtExceptionHandler = require('@dotcom-reliability-kit/uncaught-exception-handler');
uncaughtExceptionHandler.bind({
useSentry: true, // opt into or out of Sentry
exitCode: 1, // just an example of other configurable stuff
onError: error => {
// allow the app to inject extra behaviour when an uncaught exception occurs
}
}); |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
I've been thinking a lot about uncaught exceptions and doing a few experiments to better understand how it works in Node.js. I think we're going to have to do more work than I initially thought to get uncaught exceptions logged consistently but I think it's worth doing.
What's the problem we're trying to solve?
When an exception is thrown and not caught or when a promise is rejected without being handled, a Node.js process will crash. When Node.js crashes we want to know exactly why, and we want the details of the crash to be consistent with the way we log errors.
We want to be able to search Splunk for errors which caused a crash in the same way we search for those which resulted in a
500error page.How do we solve it?
On the surface this should be relatively easy to solve (especially now we have consistent error logging). Node.js provides two events on
processnameduncaughtExceptionandunhandledRejection. These allow you to execute some code between the error occurring and the app crashing:This could be nicely extracted into a library, so that each app can register these handlers by choice and potentially extend the behaviour to perform their own tear-down:
However there are two issues with the above approach.
Issue 1: async logging (solved)
The first issue is that, in production, we currently log everything asynchronously to Splunk via our splunkHEC transport. The Node.js documentation is clear about the correct use of these handlers:
We can't guarantee that logs have sent before we exit the process in our handler. Winston does not allow you to hook in and run code once you're certain that a log has been sent and so we'd have to add in a nasty (and unreliable) hack like this:
I think the correct solution to this problem is to ditch our custom Splunk logger (as is the plan anyway) and log to
stdout, using Heroku log drains to forward our logs to Splunk.We could accept a dirty hack like I outline above if we know that it's temporary and will be removed once we switch to Heroku log drains. However we know hacks can stick around for a lot longer than we intendSince writing this we have begun to migrate to Heroku log drains, so this issue goes away in the near future.Issue 2: n-raven (part-solved)
The second issue is that, in production, we automatically register a global
uncaughtExceptionhandler via the Sentry install method here. This is included as a non-optional part of n-express, which is used by all of our apps.This means that regardless of whether we solve issue 1, it's always possible that Sentry beats us to writing logs and exits the process before our uncaught exception handler has a chance to.
My opinion is that importing n-express or n-raven should not make modifications to a global object like
processand that we should remove this behaviour. I think it's better if the application (and our template) gets to make a decision about which uncaught exception handlers we register.Getting to this point is more work. I think, as we need to switch to the new Sentry client anyway (see note here), that we should use this opportunity to deprecate or overhaul n-raven and switch apps over to the new Sentry Node.js Platform which does a lot less magic.
We'd then be able to roll Sentry logging into our own uncaught exception handler alongside our own Splunk logging format instead of magically registering it all as part of n-express. We could also add Sentry logging into our Express error logging middleware, further reducing the amount of code in n-express which probably should have never been there.
We're going to look into deprecating n-raven and providing an alternative in Reliability Kit. One possible approach is documented here, there will be updates when we've finished investigating how much our engineers use Sentry.
Feedback
I'd like to hear if you have alternative approaches to suggest, or if you know something about our uncaught exception handling that I don't.
Beta Was this translation helpful? Give feedback.
All reactions