Skip to content

Shopify omnibus: upstream main + JIT + FP unwinding + skip_native_resume#39

Draft
dalehamel wants to merge 22 commits intomainfrom
omnibus-rebase-plus-jit
Draft

Shopify omnibus: upstream main + JIT + FP unwinding + skip_native_resume#39
dalehamel wants to merge 22 commits intomainfrom
omnibus-rebase-plus-jit

Conversation

@dalehamel
Copy link
Copy Markdown
Member

Shopify internal deployment branch rebased on upstream main (6ab4187).

Includes:

Replaces shopify-omnibus-rebase-mar3. Static EC and DTV patches are now upstream so no longer needed here.

Note: BPF blobs need rebuilding with clang-17.

@dalehamel dalehamel force-pushed the omnibus-rebase-plus-jit branch 8 times, most recently from b828c8e to 06de4d7 Compare April 8, 2026 16:00
@dalehamel dalehamel changed the base branch from main to yjit-wip April 8, 2026 16:13
@dalehamel dalehamel changed the base branch from yjit-wip to main April 8, 2026 16:13
dalehamel added 17 commits April 8, 2026 12:20
… cleanup

- Extract JIT region detection into standalone findJITRegion() function
  that handles both prctl-labeled mappings (spanning full reserved area
  including holes) and heuristic fallback (first anonymous executable mapping)
- Add comprehensive unit tests for JIT region detection: no mappings,
  file-backed only, labeled single/multiple, heuristic fallback, precedence
- Fix Detach() to clean up PidInterpreterMapping prefixes (previously only
  deleted proc data, leaking eBPF map entries)
- Remove dead code: m.Vaddr < jitMapping.Vaddr branch that could never be
  true since /proc/pid/maps is sorted by address
- Fix JIT region check to use >= for lower bound (half-open interval [start, end))
  matching V8 and HotSpot conventions
- Reuse map lookup value to avoid double hashing of RawMapping struct key
- Wrap CalculatePrefixList error for better debugging
- Log stale prefix deletion errors instead of silently discarding
- Move prctl(PR_SET_VMA) named anonymous mapping support from process/process.go
  (was in PR #36 but is needed here for findJITRegion labeled detection)
- Remove stale 'assume all anonymous' comment
- Rebuild BPF blobs for >= fix
- Add 'Register LPM prefixes' comment to SynchronizeMappings loop
- Extract in_jit bool variable from inline if condition for clarity
- Improve JIT detection comments (read_ruby_frame and walk_ruby_stack)
- Fix whitespace: add blank line after FRAMES_PER_WALK_RUBY_STACK define,
  normalize MAX_EP_CHECKS spacing
- Rebuild BPF blobs
Ruby 3.4.7 with --yjit on arm64. Verifies that:
- JIT code region is detected from anonymous executable mappings
- A JIT frame is pushed for the YJIT-compiled code
- Ruby VM frames (is_prime, sum_of_primes, loop) are properly unwound
  after the JIT frame

On main, this coredump fails with native_no_pid_page_mapping because
the profiler doesn't know how to handle the anonymous JIT mapping.

Moduledata tar for CI upload: ~/coredump-uploads/ruby-yjit-moduledata.tar
Ruby 3.4.7 with --yjit on x86_64. Same verification as arm64 test:
- JIT code region detected from anonymous executable mappings
- JIT frame pushed for YJIT-compiled code
- Full Ruby VM stack unwound (is_prime, sum_of_primes, loop)

Moduledata tar: ~/coredump-uploads/ruby-yjit-amd64-moduledata.tar
On systems with CONFIG_ANON_VMA_NAME=y, Ruby labels its JIT memory
region via prctl(PR_SET_VMA), giving it a path like
'[anon:Ruby:rb_yjit_reserve_addr_space]'. Without this fix, IsFileBacked()
returns true for these mappings (non-empty Path), causing IsAnonymous()
to return false. This prevents the LPM prefix registration loop in
SynchronizeMappings from registering JIT executable pages for eBPF
dispatch.

Add IsPrctlNamed() helper and exclude prctl-named mappings from
IsFileBacked(), so they are correctly classified as anonymous.
Replace the jit_detected flag approach with V8-style frame pointer
unwinding through Ruby JIT frames. When YJIT emits frame pointers
(always on arm64, with --yjit-perf on x86_64), the Ruby eBPF unwinder
walks the native FP chain through JIT frames, pushes each as a
RUBY_FRAME_TYPE_JIT frame, then resolves the post-JIT mapping so native
unwinding can continue below the Ruby VM stack.

When frame pointers are not available, the original behavior is
preserved: a single dummy JIT frame is pushed, cfuncs are pushed inline,
and native unwinding is stopped at the end of the Ruby stack.

Also fixes parseMappings discarding prctl-labeled [anon:...] mappings,
which prevented the YJIT JIT region from being visible to interpreter
handlers.
…tion

Restructure the eBPF JIT frame pointer walk: instead of a separate loop
before the main CFP walk, advance the native FP chain by one frame per
main loop iteration. This keeps the native unwind state in lockstep with
the Ruby VM stack, supporting both YJIT (1 JIT frame, exits after first
iteration) and ZJIT (1 JIT frame per iseq, 1:1 with CFPs).

Fix JIT region detection in SynchronizeMappings to scan all mappings
(including non-executable ---p reservations) for the prctl-labeled JIT
region, then only register LPM prefixes for executable pages. This
ensures jit_start/jit_end cover the full reserved address range even
when the r-xp committed pages don't carry the label.

Also fix IsAnonymous() to recognize [anon:...] labeled mappings, and
remove debug log spam from parseMappings.
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed
  during rebase - PR #26 now reuses the shared function instead of inline code
- Remove redundant inline first-pass JIT detection (now handled by findJITRegion)
- Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Fix doc comment placement: move hasJitFramePointers comment to directly
  above the function definition instead of above findJITRegion
- Fix containerized process support: use /proc/PID/root/tmp/ to access
  container filesystem for perf map detection, and glob perf-*.map to
  handle namespace PID mismatch (Ruby uses ns-local PID in filename)
- Remove os import (replaced by path/filepath for Glob)
- Rebuild BPF blobs
When the Ruby unwinder encounters a cfunc (C-implemented Ruby method),
it normally transitions back to the native unwinder to unwind through
the C code, then re-enters the Ruby unwinder when it detects the Ruby
interpreter loop. Each such round-trip costs 2-3 tail calls out of a
budget of 29, so a Ruby stack with 10+ cfuncs can easily truncate.

This adds a skip_native_resume flag to RubyProcInfo that, when set,
pushes cfunc frames inline without transitioning to the native unwinder.
This trades native frames within cfuncs (e.g., the C call chain inside
Array#sort) for complete Ruby-level traces without truncation.

The flag is controlled by OTEL_EBPF_RUBY_SKIP_NATIVE_RESUME (default:
false). Set to 'true' to enable.
- Correct inline comment that incorrectly stated 'Default to skipping'
  when the actual default is false (requires explicit opt-in)
- Add note that skipNativeResume() is evaluated per-process at attach time
@dalehamel dalehamel force-pushed the omnibus-rebase-plus-jit branch from 06de4d7 to b439ffa Compare April 8, 2026 16:22
With frame pointer unwinding enabled (always on arm64), the JIT region
is walked via the FP chain, producing a richer stack with native frames
between Ruby VM frames resolved. The amd64 test is unchanged since
hasJitFramePointers returns false without --yjit-perf.
Only push the JIT frame when trace->num_frames == 0 (leaf position),
restoring the original behavior for both FP and non-FP paths.
walk_ruby_stack is re-entered via tail calls to process more frames.
On re-entry, in_jit was recomputed from record->state.pc which hasn't
changed (non-FP path), causing the JIT frame to be pushed again on
every tail call. Guard with !jit_detected so the JIT frame is only
pushed once on the first entry.
@dalehamel dalehamel force-pushed the omnibus-rebase-plus-jit branch from 4a90a15 to 477d66d Compare April 9, 2026 18:17
When skip_native_resume is true but jit_detected is false (PC not in JIT
region at sample time), the end-of-stack check fell through to
PROG_UNWIND_NATIVE. The native unwinder could then encounter the JIT
anonymous mapping and re-enter the Ruby unwinder, pushing a JIT frame
in the wrong stack position.

Check skip_native_resume alongside jit_detected at end-of-stack to
prevent resuming native unwinding when the flag is set.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant