Skip to content

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Oct 1, 2025

This PR implements ideas from the recent RFC to serve as the basis
for Wasm (guest) debugging: it adds a stackslot to each function
translated from Wasm, stores to replicate Wasm VM state in the
stackslot as the program runs, and metadata to describe the format of
that state and allow reading it out at runtime.

As an initial user of this state, this PR adds a basic "stack view"
API that, from host code that has been called from Wasm, can examine
Wasm frames currently on the stack and read out all of their locals
and stack slots.

Note in particular that this PR does not include breakpoints,
watchpoints, stepped execution, or any sort of user interface for any
of this; it is only a foundation.

This PR renames the existing debug option to native_debug, to
distinguish it from the new approach.

(Stacked on #11768, and depends on #11784.)

@cfallin cfallin requested review from a team as code owners October 1, 2025 03:37
@cfallin cfallin requested review from fitzgen and removed request for a team October 1, 2025 03:37
@cfallin cfallin marked this pull request as draft October 1, 2025 03:37
@cfallin
Copy link
Member Author

cfallin commented Oct 1, 2025

(Marking this as a draft until I resolve the two issues above; posting to show how the Cranelift side is used in practice, in case that's useful.)

@cfallin cfallin force-pushed the wasmtime-debug-instrumentation branch 12 times, most recently from 67c060e to b5c5804 Compare October 1, 2025 06:01
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Oct 1, 2025
@cfallin cfallin force-pushed the wasmtime-debug-instrumentation branch from b5c5804 to 2c75c5d Compare October 2, 2025 22:39
@github-actions github-actions bot added the cranelift:meta Everything related to the meta-language. label Oct 3, 2025
@cfallin cfallin force-pushed the wasmtime-debug-instrumentation branch from 2b2419a to 7426eda Compare October 3, 2025 04:37
@cfallin cfallin marked this pull request as ready for review October 3, 2025 05:03
@cfallin
Copy link
Member Author

cfallin commented Oct 3, 2025

Alright, I've rebased this on the latest #11768, and updated it to take an iterator-based approach instead. The borrowing is a little tricky but does work out: we implement a custom iterator up the stack frames (StackView), which owns the store while alive, and a mutable borrow of that iterator must be passed into methods on the objects it returns (FrameView) to actually read values. All of that gives a safe, lazy (O(1)) way of walking the stack.

In #11783 (underlying iterator that this refactor used) I chose not to replicate all of the complex logic to walk all activations and continuation stack-chains; for now, this API walks only the last activation that entered the host. That seems sufficient for a basic debugger application to me, but I can update it as desired.

This is now stacked on #11768 and #11783, and depends on the just-filed #11784 to be resolved somehow.

cfallin added 12 commits October 3, 2025 13:11
This PR adds *debug tags*, a kind of metadata that can attach to CLIF
instructions and be lowered to VCode instructions and as metadata on
the produced compiled code. It also adds opaque descriptor blobs
carried with stackslots. Together, these two features allow decorating
IR with first-class debug instrumentation that is properly preserved
by the compiler, including across optimizations and
inlining. (Wasmtime's use of these features will come in followup
PRs.)

The key idea of a "debug tag" is to allow the Cranelift embedder to
express whatever information it needs to, in a format that is opaque
to Cranelift itself, except for the parts that need translation during
lowering. In particular, the `DebugTag::StackSlot` variant gets
translated to a physical offset into the stackframe in the compiled
metadata output. So, for example, the embedder can emit a tag
referring to a stackslot, and another describing an offset in that
stackslot.

The debug tags exist as a *sequence* on any given instruction; the
meaning of the sequence is known only to the embedder, *except* that
during inlining, the tags for the inlining call instruction are
prepended to the tags of inlined instructions. In this way, a
canonical use-case of tags as describing original source-language
frames can preserve the source-language view even when multiple
functions are inlined into one.

The descriptor on a stackslot may look a little odd at first, but its
purpose is to allow serializing some description of
stackslot-contained runtime user-program data, in a way that is firmly
attached to the stackslot. In particular, in the face of inlining,
this descriptor is copied into the inlining (parent) function from the
inlined function when the stackslot entity is copied; no other
metadata outside Cranelift needs to track the identity of stackslots
and know about that motion. This fits nicely with the ability of tags
to refer to stackslots; together, the embedder can annotate
instructions as having certain state in stackslots, and describe the
format of that state per stackslot.

This infrastructure is tested with some compile-tests now;
testing of the interpretation of the metadata output will come with
end-to-end debug instrumentation tests in a followup PR.
…ne runtime state.

This PR implements ideas from the [recent RFC] to serve as the basis
for Wasm (guest) debugging: it adds a stackslot to each function
translated from Wasm, stores to replicate Wasm VM state in the
stackslot as the program runs, and metadata to describe the format of
that state and allow reading it out at runtime.

As an initial user of this state, this PR adds a basic "stack view"
API that, from host code that has been called from Wasm, can examine
Wasm frames currently on the stack and read out all of their locals
and stack slots.

Note in particular that this PR does not include breakpoints,
watchpoints, stepped execution, or any sort of user interface for any
of this; it is only a foundation.

This PR still has a few unsatisfying bits that I intend to address:

- The "stack view" performs some O(n) work when the view is initially
  taken, computing some internal data per frame. This is forced by the
  current design of `Backtrace`, which takes a closure and walks that
  closure over stack frames eagerly (rather than work as an
  iterator). It's got some impressive iterator-chain stuff going on
  internally, so refactoring it to the latter approach might not
  be *too* bad, but I haven't tackled it yet.

  A O(1) stack view, that is, one that does work only for frames as
  the host API is used to walk up the stack, is desirable because some
  use-cases may want to quickly examine e.g. only the deepest
  frame (say, running with a breakpoint condition that needs to read a
  particular local's value after each step).

- It includes a new `Config::compiler_force_inlining()` option that is
  used only for testing that we get the correct frames after
  inlining. I couldn't get the existing flags to work on a Wasmtime
  config level and suspect there may be an existing bug there; I will
  try to split out a fix for it.

This PR renames the existing `debug` option to `native_debug`, to
distinguish it from the new approach.

[recent RFC]: bytecodealliance/rfcs#44
@cfallin cfallin force-pushed the wasmtime-debug-instrumentation branch from a81db1a to bdc71e0 Compare October 3, 2025 20:12
@cfallin
Copy link
Member Author

cfallin commented Oct 3, 2025

Rebased on #11783 now that it's landed.

Also a few more refined thoughts on

I chose not to replicate all of the complex logic to walk all activations and continuation stack-chains; for now, this API walks only the last activation that entered the host. That seems sufficient for a basic debugger application to me, but I can update it as desired.

I am realizing in building the next step that we probably do want a "debug session" to apply only to one Wasm activation -- if for no other reason than that the API is much more semantically clear when exits back to the host are atomic events rather than somehow "translucent" (host code invisible but nested Wasm invocations visible). And there are weird corner cases where, for example, the recursively invoked host code itself tries to set up a (nested) debug context that we'd want to rule out anyway. So: the limit to introspect only the most recent activation is a feature, not a bug!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant