This is a tracking issue. In #11964, memory watchpoints are the last remaining "basic debugging primitive". I'd like to push this out of the MVP feature-set so that we can get to a working end-to-end debugging demo sooner rather than later. This issue tracks the post-MVP work, which is still planned.
I suspect the most straightforward approach will be:
- Augment `VMMemoryDefinition` to have a nullable pointer to a watchpoint shadow memory.
- When debugging is enabled in an `Engine` config, allocate and update (grow and shrink) a memory region the same size as the memory itself for the shadow. To facilitate this it may be worthwhile to disallow custom memory creators and disallow use of the pooling allocator (this should be fine for a basic debugging configuration on a developer's machine); those could be supported again later if needed.
- This shadow is all-zeroes by default. Setting a watchpoint sets the byte(s) at the relevant addresses to nonzero values. These bytes are effectively 1-bit flags per address. (Why full bytes and not a bit-vector compressed by 8x? I suspect the shifting and masking logic would have more cost in the common case, and in any case would blow up code size.) See the first sketch after this list.
- Add a separate `Config` method (beyond `guest_debug`) that enables watchpoints, because they will have additional overhead.
- At the Wasm-to-CLIF translator level, when watchpoints are enabled (see the second sketch after this list):
  - For every load, emit a load of the same size to the same offset in the shadow memory. Do this load second, so it doesn't need to carry trap metadata (the actual load will trap). On the result of the shadow load, do a `brif` around a hostcall (with the `PreserveAll` ABI) to a `watchpoint_load` hostcall with the memory ID, address, size, and data loaded.
  - For every store, emit a load of the same size to the same offset in the original memory to force a trap before updating data (see also #8221, "Cranelift: implement 'precise store traps' in presence of store-tearing hardware", where I prototyped this for another purpose and measured it to have a 2% performance impact). Then do the shadow load and `brif`, then the store. In the watchpoint case, make a `watchpoint_store` hostcall with the memory ID, address, size, old data, and new data.
- Emit new `DebugEvent`s for watchpoint hits.
- Add an API to `Memory` to enable/disable watchpoints when a shadow is present.
- Perhaps address host memory accesses: add an API to trigger watchpoints when accessing a range on a `Memory`. This would lead to serious complexity if added to the basic synchronous slice-returning accessors, because the event callback is async; but perhaps we could have a separate async "trigger watchpoints on read/write" method, and at least use that at the appropriate places in the component-model lowering.
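To make the shadow representation concrete, here is a minimal sketch in Rust. It is illustrative only: the real `VMMemoryDefinition` has other fields, and the `watchpoint_shadow` field and the `set_watchpoint`/`clear_watchpoint` methods are hypothetical names, not existing Wasmtime API.

```rust
use std::ptr::NonNull;

// Simplified stand-in for the runtime's memory definition; fields beyond the
// base/length pair are omitted, and `watchpoint_shadow` is hypothetical.
#[repr(C)]
pub struct VMMemoryDefinition {
    /// Base of the linear memory.
    pub base: *mut u8,
    /// Current size of the linear memory, in bytes.
    pub current_length: usize,
    /// Nullable pointer to the watchpoint shadow: a region the same size as
    /// the memory itself, grown and shrunk alongside it. `None` (a null
    /// pointer, via the niche optimization) when watchpoints are disabled.
    pub watchpoint_shadow: Option<NonNull<u8>>,
}

impl VMMemoryDefinition {
    /// Mark `len` bytes at `addr` as watched by setting their shadow bytes
    /// nonzero. Each shadow byte is effectively a 1-bit flag per address.
    pub fn set_watchpoint(&mut self, addr: usize, len: usize) {
        if let Some(shadow) = self.watchpoint_shadow {
            assert!(addr.checked_add(len).unwrap() <= self.current_length);
            // The shadow is all-zeroes by default; nonzero means "watched".
            unsafe { std::ptr::write_bytes(shadow.as_ptr().add(addr), 1, len) }
        }
    }

    /// Clear the watchpoint flags for `len` bytes at `addr`.
    pub fn clear_watchpoint(&mut self, addr: usize, len: usize) {
        if let Some(shadow) = self.watchpoint_shadow {
            assert!(addr.checked_add(len).unwrap() <= self.current_length);
            unsafe { std::ptr::write_bytes(shadow.as_ptr().add(addr), 0, len) }
        }
    }
}
```

The public `Memory` enable/disable API would then just forward to flipping shadow bytes like this.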
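And here is a sketch of the semantics the translator-emitted code would have for a 32-bit access, written as ordinary Rust for readability. Everything here is hypothetical naming: in reality this logic is emitted inline as CLIF, the hostcalls are runtime libcalls using the `PreserveAll` ABI, and the `if` corresponds to the `brif` around the slow path.

```rust
/// Hypothetical hostcall stubs; in the real design these would be runtime
/// libcalls that raise a `DebugEvent` for the watchpoint hit.
fn watchpoint_load(memory: u32, addr: usize, size: u8, data: u64) {
    eprintln!("load watchpoint: mem={memory} addr={addr:#x} size={size} data={data:#x}");
}
fn watchpoint_store(memory: u32, addr: usize, size: u8, old: u64, new: u64) {
    eprintln!("store watchpoint: mem={memory} addr={addr:#x} size={size} old={old:#x} new={new:#x}");
}

unsafe fn instrumented_load32(base: *const u8, shadow: *const u8, addr: usize) -> u32 {
    // Real load first: in actual codegen this instruction carries the trap
    // metadata, so an out-of-bounds access traps before the shadow is touched.
    let value = (base.add(addr) as *const u32).read_unaligned();
    // Shadow load second: same size and offset, different base, so it reuses
    // the same addressing mode and needs no trap metadata of its own.
    let flags = (shadow.add(addr) as *const u32).read_unaligned();
    // The `brif`: skip the hostcall unless some accessed byte is watched.
    if flags != 0 {
        watchpoint_load(/* memory ID, placeholder */ 0, addr, 4, value as u64);
    }
    value
}

unsafe fn instrumented_store32(base: *mut u8, shadow: *const u8, addr: usize, new: u32) {
    // Load from the original memory first to force a precise trap before any
    // data is updated (the "precise store traps" technique from #8221); this
    // also yields the old data for reporting.
    let old = (base.add(addr) as *const u32).read_unaligned();
    // Then the shadow load and `brif`, then the store itself.
    let flags = (shadow.add(addr) as *const u32).read_unaligned();
    if flags != 0 {
        watchpoint_store(/* memory ID, placeholder */ 0, addr, 4, old as u64, new as u64);
    }
    (base.add(addr) as *mut u32).write_unaligned(new);
}
```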
There are some tradeoffs inherent in this design, but I suspect it strikes a good balance between (i) code size, (ii) complexity, and (iii) performance; we can experiment with any other ideas that seem promising as well. For completeness, here are a few approaches rejected (for now) and my reasoning:
- More complex "sparse" data structures for watchpoints: likely to have too much code-size and runtime-performance impact. E.g., we don't want to do a multi-level trie lookup for every load/store; that would approach the performance of "softmmu"-style page-table emulation (which also does a trie lookup), which is something like a 10x slowdown. A bitmap (one bit per byte of original memory) would cost less, but still likely too much. Consider also the register pressure from computing a shifted address, versus using the same addressing mode (with a different base) for the real and shadow loads; see the sketch after this list.
- Hostcall for every load/store: likely to have far too much overhead to be practical.
- Leveraging traps and virtual-memory protection: goes against what we discovered in attempts to use traps for breakpoints -- there are very serious complexity and implementation concerns here, which led to the libcall/instrumentation-based approach of #11964 ("Debug: plan for simple libcall/instrumentation-based MVP") instead. It also adds complexity when granting access via the host API.
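To illustrate the addressing-mode point for the rejected bitmap variant, here is a hypothetical comparison of the per-access check (names are illustrative). The byte-per-byte shadow reuses the access address unchanged, while a bit-per-byte bitmap needs shift-and-mask arithmetic, and extra live registers, on every load and store.

```rust
/// Byte-per-byte shadow: the shadow load uses the same address as the real
/// access (just a different base), so the same addressing mode works and no
/// extra arithmetic is needed on the hot path.
unsafe fn watched_byte_flags(shadow: *const u8, addr: usize) -> bool {
    *shadow.add(addr) != 0
}

/// Bit-per-byte bitmap: 8x smaller, but every access must shift the address
/// to find the byte, then shift and mask again to extract the flag -- more
/// instructions and more register pressure on every load and store.
unsafe fn watched_bitmap(bitmap: *const u8, addr: usize) -> bool {
    let byte = *bitmap.add(addr >> 3);
    (byte >> (addr & 7)) & 1 != 0
}
```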