[Feature] Improved Panic Handling by kaimast · Pull Request #2927 · ProvableHQ/snarkVM

kaimast · 2025-09-17T19:55:35Z

This PR is a follow-up on #2813.

Context: Handling Fatal Errors in snarkVM

Generally, snarkVM and snarkOS aim never to panic when it can be avoided. However, there is always the possibility of overlooked bugs, or of ending up in a state from which the code cannot safely recover.

If the code panics, there are two different scenarios:

If the panic happens inside a program execution (i.e., inside try_vm_runtime), we can safely log it and continue operating normally; snarkVM simply treats it as a failed execution.
If the panic happens outside a program execution (i.e., anywhere else in snarkVM or in snarkOS), we most likely do not want to simply continue execution, and we definitely want to notify users and developers about the error.
We have had several instances where a thread or tokio task panicked and, while information about the panic was printed to stderr, there was no mention of it in the logs.
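The first scenario can be sketched with the standard library's `std::panic::catch_unwind`: a panic raised while running a program is caught and turned into a failed-execution result, and the caller keeps running. This is a minimal sketch, not snarkVM's `try_vm_runtime`; `ExecutionOutcome` and `execute_guarded` are hypothetical names.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical result type: a program either produces a value or fails.
#[derive(Debug, PartialEq)]
pub enum ExecutionOutcome {
    Success(u64),
    Failed(String),
}

/// Runs `program`, converting a panic into `ExecutionOutcome::Failed`
/// instead of letting it unwind into the caller.
pub fn execute_guarded<F: FnOnce() -> u64>(program: F) -> ExecutionOutcome {
    // AssertUnwindSafe: only the outcome is reported, so we never observe
    // state the closure may have left half-updated.
    match panic::catch_unwind(AssertUnwindSafe(program)) {
        Ok(value) => ExecutionOutcome::Success(value),
        Err(payload) => {
            // Panic payloads are usually `&'static str` or `String`.
            let message = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic payload".to_string());
            ExecutionOutcome::Failed(message)
        }
    }
}
```

A panic outside such a guarded region (the second scenario) is not caught here and still unwinds the thread, which is why it needs separate handling.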

The former is already implemented. The latter was partially introduced by #2813, and this PR adds some important improvements and fixes to it. The behavior of try_vm_runtime remains unchanged, although its implementation changes slightly.

Improved Panic Handling Mechanisms

Panic handling as implemented in #2813 (and, to a more limited extent, before it) has a major shortcoming: it does not work well with multiple threads. The panic handler is set per process, not per thread, so while one thread is executing a snarkVM program, another thread unrelated to that execution might panic, and snarkVM may falsely log that panic as a VM error.

The proposed mechanism in this PR is to store backtraces and error messages in a thread-local variable. try_vm_runtime is changed to retrieve the error message from this thread-local variable.
Additionally, there is now a catch_unwind function that wraps the standard-library function of the same name and, on error, returns the panic's error message and backtrace. This is useful when printing error messages about panics, for example, in snarkOS.
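A minimal, standard-library-only sketch of the thread-local approach. The names `PanicRecord`, `LAST_PANIC`, `set_panic_hook`, `take_last_panic`, and `catch_unwind` mirror the description but are assumptions; the actual snarkVM types and signatures may differ. The key property is that the panic hook runs on the panicking thread, so writing into a `thread_local!` slot means a panic on one thread cannot be mistaken for a panic on another.

```rust
use std::any::Any;
use std::backtrace::Backtrace;
use std::cell::RefCell;
use std::panic::{self, AssertUnwindSafe};
use std::sync::Once;

/// Message and backtrace of a caught panic (hypothetical type).
#[derive(Debug)]
pub struct PanicRecord {
    pub message: String,
    pub backtrace: String,
}

thread_local! {
    // One slot per thread: the hook writes into the slot of the thread that
    // panicked, so concurrent panics elsewhere cannot overwrite it.
    static LAST_PANIC: RefCell<Option<PanicRecord>> = const { RefCell::new(None) };
}

static HOOK: Once = Once::new();

/// Extracts a readable message from a panic payload.
fn payload_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "<non-string panic payload>".to_string()
    }
}

/// Installs the process-wide hook; later calls are no-ops.
pub fn set_panic_hook() {
    HOOK.call_once(|| {
        panic::set_hook(Box::new(|info| {
            let record = PanicRecord {
                message: payload_message(info.payload()),
                // Captured regardless of RUST_BACKTRACE: this is the
                // backtrace-storage overhead mentioned in the PR's notes.
                backtrace: Backtrace::force_capture().to_string(),
            };
            // A production hook would also forward the record to the logger.
            LAST_PANIC.with(|slot| *slot.borrow_mut() = Some(record));
        }));
    });
}

/// Removes and returns this thread's most recent panic record, if any.
pub fn take_last_panic() -> Option<PanicRecord> {
    LAST_PANIC.with(|slot| slot.borrow_mut().take())
}

/// Like `std::panic::catch_unwind`, but on error returns the message and
/// backtrace that the hook recorded for *this* thread.
pub fn catch_unwind<R>(f: impl FnOnce() -> R) -> Result<R, PanicRecord> {
    set_panic_hook();
    // Discard a stale record left over from an earlier panic on this thread.
    let _ = take_last_panic();
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
        // Fall back to the raw payload if no record exists
        // (e.g. another hook was installed after ours).
        take_last_panic().unwrap_or_else(|| PanicRecord {
            message: payload_message(&*payload),
            backtrace: String::new(),
        })
    })
}
```

Clearing the slot before running the closure keeps an unrelated, earlier panic on the same thread from being reported as the cause of this one.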

Utilities for Task Management

tokio allows propagating panics through JoinHandles. This PR also provides wrappers around tokio's spawn and spawn_blocking functions that contain the boilerplate for propagating such panics.
This replaces logic that previously resided in snarkos-node-bft and will be useful to other crates as well.
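The propagation pattern can be sketched with `std::thread`, whose `JoinHandle::join` returns the panic payload as an `Err`, so `std::panic::resume_unwind` can re-raise it on the joining thread. tokio exposes the same information through `JoinError::is_panic` and `JoinError::into_panic` when awaiting a `JoinHandle`; the sketch stays with `std::thread` to remain dependency-free. `spawn_propagating` and `PropagatingHandle` are hypothetical names, not the PR's API.

```rust
use std::panic;
use std::thread::{self, JoinHandle};

/// Hypothetical handle whose `join` re-raises a child panic on the caller.
pub struct PropagatingHandle<T>(JoinHandle<T>);

impl<T> PropagatingHandle<T> {
    /// Returns the child's result, or resumes the child's panic here so it
    /// cannot be silently dropped.
    pub fn join(self) -> T {
        match self.0.join() {
            Ok(value) => value,
            // Re-raise with the original payload (the hook is not run again).
            Err(payload) => panic::resume_unwind(payload),
        }
    }
}

/// Spawns `f` on a new thread and wraps the handle.
pub fn spawn_propagating<F, T>(f: F) -> PropagatingHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    PropagatingHandle(thread::spawn(f))
}
```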

Usage

In snarkVM

snarkVM has one notable instance of spawning a background task: the sequential ops thread (see the changes in #2975). Commit 80c81d8 adds catch_unwind to that thread to ensure that future panics in the storage thread are logged properly.
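A minimal sketch of the same idea for a long-lived worker thread: the loop that drains a channel runs inside `catch_unwind`, so a panic is reported (here via `eprintln!`, standing in for the logger) and returned to whoever joins the thread, instead of the thread disappearing silently. `Op` and `spawn_ops_thread` are hypothetical; snarkVM's sequential ops thread has its own types.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;

/// Hypothetical storage operation processed in order by the worker.
pub enum Op {
    Write(u32),
    Corrupt, // simulates a bug that triggers a panic
}

/// Spawns a worker that applies `Op`s sequentially until the sender is dropped.
/// Joining yields `Ok(total)` on clean shutdown or `Err(message)` after a panic.
pub fn spawn_ops_thread() -> (mpsc::Sender<Op>, thread::JoinHandle<Result<u32, String>>) {
    let (tx, rx) = mpsc::channel::<Op>();
    let handle = thread::spawn(move || {
        let outcome = panic::catch_unwind(AssertUnwindSafe(move || {
            let mut total = 0u32;
            for op in rx {
                match op {
                    Op::Write(v) => total += v,
                    Op::Corrupt => panic!("storage invariant violated"),
                }
            }
            total
        }));
        outcome.map_err(|payload| {
            let message = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic payload".to_string());
            // Stand-in for an error-level log entry.
            eprintln!("sequential ops thread panicked: {message}");
            message
        })
    });
    (tx, handle)
}
```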

In snarkOS

For snarkOS, the plan is to use catch_unwind for important background tasks such as block synchronization. Again, this ensures that a panic in block synchronization does not go unnoticed, leaving block synchronization stopped for no apparent reason.

ProvableHQ/snarkOS#3874 contains the full set of proposed changes for snarkOS.

Notes

For panic handling, there is a potential overhead from storing backtraces that might never be needed. My understanding is that panics are rare in production; if that is wrong, please let me know.
Panic handling for "detached" tokio tasks (those without JoinHandles) does not work well yet. Future improvements to tokio will change this, but we also do not use detached tasks much.

vicsn · 2025-09-21T16:37:24Z

and snarkVM may falsely log it as VM error.

Is this the main and only use case for the thread-local panic logic of this PR? I'm trying to better understand whether the refactor is worth it at the moment.

I thought we printed the error in the VM panic handler. Therefore:

If it were to catch a non-VM panic, I'd assume we would clearly observe this in the logs. Clumsy, but no harm done.
If it were to catch a VM panic from the wrong thread, would there be a risk of any of the following:
a. verification of the wrong transaction being logged as wrong?
b. verification of the wrong transaction being returned as Err?
c. the above being non-deterministic, and therefore creating a risk of a fork on the network?

kaimast · 2025-12-04T21:03:36Z

@vicsn I extended the PR description

raychu86 · 2025-12-10T22:42:06Z

utilities/src/lib.rs

+///
+/// It is more efficient to set the panic hook once and directly use `errors::try_vm_runtime`.
+#[macro_export]
+macro_rules! try_vm_runtime {


Don't all the finalize instances still call the macro try_vm_runtime! instead of errors::try_vm_runtime()?

This would call set_panic_hook multiple times rather than exactly once, as the documentation claims ("Should be called exactly once.").

kaimast · 2026-01-06

I clarified the documentation: it is safe to call set_panic_hook multiple times, and calls after the first have no effect. The most recent commit also adds a boolean flag that is set once the handler is installed, so that calling set_panic_hook repeatedly has no performance impact.

It would be great if we could replace that macro eventually, but that would require users of snarkVM to call set_panic_hook themselves.
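A minimal sketch of an idempotent `set_panic_hook`. It uses `std::sync::Once` in place of the hand-rolled boolean flag described in the reply; `Once` likewise reduces every call after the first to a cheap check, and it additionally makes concurrent first calls wait until the hook is actually installed. The install counter is hypothetical and exists only to make the behavior observable.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static INSTALL: Once = Once::new();
// Demonstration only: counts how often the hook was actually installed.
static INSTALL_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Safe to call any number of times (e.g. from every `try_vm_runtime!`
/// expansion); only the first call installs the hook.
pub fn set_panic_hook() {
    INSTALL.call_once(|| {
        // Keep the previous (default) hook so panics are still printed.
        let default_hook = panic::take_hook();
        panic::set_hook(Box::new(move |info| {
            // A real hook would record the message and backtrace here.
            default_hook(info);
        }));
        INSTALL_COUNT.fetch_add(1, Ordering::Relaxed);
    });
}

/// Number of times the hook was installed (expected to stay at 1).
pub fn install_count() -> usize {
    INSTALL_COUNT.load(Ordering::Relaxed)
}
```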

raychu86 · 2026-01-06T23:34:37Z

Logic looks reasonable. It would be nice to see more unit tests to ensure that top-level (non-try_vm_runtime) panics are handled correctly.
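One possible shape for such a test, as a self-contained sketch with a simplified recording hook (message only, no backtrace). `install_recording_hook`, `last_panic`, `background_panic_report`, and `local_panic_is_recorded` are hypothetical helpers; in a real crate their bodies would sit inside `#[test]` functions. The check is that a top-level panic on a background thread surfaces through its `JoinHandle` but does not appear in the calling thread's record, while a panic caught on the calling thread does.

```rust
use std::cell::RefCell;
use std::panic;
use std::sync::Once;
use std::thread;

thread_local! {
    // Per-thread record of the last panic message (simplified stand-in).
    static LAST_PANIC: RefCell<Option<String>> = const { RefCell::new(None) };
}

static INSTALL: Once = Once::new();

/// Installs a hook that records panic messages in the panicking thread's slot.
pub fn install_recording_hook() {
    INSTALL.call_once(|| {
        panic::set_hook(Box::new(|info| {
            let msg = info
                .payload()
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .unwrap_or_else(|| "<non-&str payload>".to_string());
            LAST_PANIC.with(|slot| *slot.borrow_mut() = Some(msg));
        }));
    });
}

/// Returns this thread's last recorded panic message, if any.
pub fn last_panic() -> Option<String> {
    LAST_PANIC.with(|slot| slot.borrow().clone())
}

/// Panics on a background thread outside any catch region and reports
/// (whether the join saw the panic, what the *calling* thread recorded).
pub fn background_panic_report() -> (bool, Option<String>) {
    install_recording_hook();
    let joined = thread::spawn(|| -> u8 { panic!("background task failed") }).join();
    (joined.is_err(), last_panic())
}

/// Panics on the calling thread inside `catch_unwind` and returns its record.
pub fn local_panic_is_recorded() -> Option<String> {
    install_recording_hook();
    let _ = panic::catch_unwind(|| -> u8 { panic!("local failure") });
    last_panic()
}
```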

kaimast force-pushed the feat/track-error branch from 34a44e4 to cec41dd September 17, 2025 20:13

kaimast changed the title ~~[Feature]~~ [Feature] Improved Panic Handling Sep 17, 2025

kaimast marked this pull request as ready for review September 17, 2025 20:47

raychu86 reviewed Sep 17, 2025

utilities/src/errors.rs (resolved)

raychu86 reviewed Sep 17, 2025

utilities/src/errors.rs (outdated, resolved)

vicsn reviewed Sep 18, 2025

utilities/src/task.rs (resolved)

vicsn reviewed Sep 18, 2025

ledger/store/src/helpers/memory/internal/nested_map.rs (resolved)

vicsn reviewed Sep 18, 2025

utilities/src/errors.rs (resolved)

kaimast mentioned this pull request Sep 19, 2025

[WIP] Track errors ProvableHQ/snarkOS#3874

Draft

vicsn requested a review from ljedrz September 22, 2025 19:14

ljedrz reviewed Sep 22, 2025

Cargo.toml (outdated, resolved)

ljedrz reviewed Sep 22, 2025

ledger/store/src/helpers/rocksdb/internal/nested_map.rs (resolved)

ljedrz reviewed Sep 22, 2025

utilities/src/task.rs (resolved)

kaimast mentioned this pull request Oct 6, 2025

[Feature] LoggableError trait #2944

Merged

vicsn marked this pull request as draft October 8, 2025 17:01

kaimast force-pushed the feat/track-error branch from 1fc8711 to 5b52e96 October 31, 2025 16:19

kaimast force-pushed the feat/track-error branch 2 times, most recently from 56de046 to ac0550c November 13, 2025 23:27

kaimast force-pushed the feat/track-error branch from ac0550c to 6fc9ebb November 27, 2025 02:08

kaimast marked this pull request as ready for review November 28, 2025 16:49

mitchmindtree mentioned this pull request Dec 5, 2025

[Bug] Panic on when using assert_eq on arrays ProvableHQ/leo#28992

Open

vicsn requested review from ljedrz and raychu86 December 5, 2025 09:42

raychu86 reviewed Dec 10, 2025

kaimast force-pushed the feat/track-error branch from 80c81d8 to 71e328b December 11, 2025 18:39

kaimast added 3 commits January 6, 2026 11:30

feat(utilities): add utilities for propagating errors across async tasks

55a5889

feat(utils/error): make catching panics work correctly in a multi-threaded setting

938877d

feat(synth/vm): catch panics in sequential ops thread

f104f5a

kaimast force-pushed the feat/track-error branch from 71e328b to f104f5a January 6, 2026 19:30

pref(utils): only install panic handler once

b90ef1e

raychu86 approved these changes Jan 6, 2026

kaimast wants to merge 4 commits into staging from feat/track-error
