Shopify omnibus: upstream main + JIT + FP unwinding + skip_native_resume#39
Draft
Shopify omnibus: upstream main + JIT + FP unwinding + skip_native_resume#39
Conversation
b828c8e to
06de4d7
Compare
… cleanup - Extract JIT region detection into standalone findJITRegion() function that handles both prctl-labeled mappings (spanning full reserved area including holes) and heuristic fallback (first anonymous executable mapping) - Add comprehensive unit tests for JIT region detection: no mappings, file-backed only, labeled single/multiple, heuristic fallback, precedence - Fix Detach() to clean up PidInterpreterMapping prefixes (previously only deleted proc data, leaking eBPF map entries) - Remove dead code: m.Vaddr < jitMapping.Vaddr branch that could never be true since /proc/pid/maps is sorted by address
- Fix JIT region check to use >= for lower bound (half-open interval [start, end)) matching V8 and HotSpot conventions - Reuse map lookup value to avoid double hashing of RawMapping struct key - Wrap CalculatePrefixList error for better debugging - Log stale prefix deletion errors instead of silently discarding - Move prctl(PR_SET_VMA) named anonymous mapping support from process/process.go (was in PR #36 but is needed here for findJITRegion labeled detection) - Remove stale 'assume all anonymous' comment - Rebuild BPF blobs for >= fix
- Add 'Register LPM prefixes' comment to SynchronizeMappings loop - Extract in_jit bool variable from inline if condition for clarity - Improve JIT detection comments (read_ruby_frame and walk_ruby_stack) - Fix whitespace: add blank line after FRAMES_PER_WALK_RUBY_STACK define, normalize MAX_EP_CHECKS spacing - Rebuild BPF blobs
Ruby 3.4.7 with --yjit on arm64. Verifies that: - JIT code region is detected from anonymous executable mappings - A JIT frame is pushed for the YJIT-compiled code - Ruby VM frames (is_prime, sum_of_primes, loop) are properly unwound after the JIT frame On main, this coredump fails with native_no_pid_page_mapping because the profiler doesn't know how to handle the anonymous JIT mapping. Moduledata tar for CI upload: ~/coredump-uploads/ruby-yjit-moduledata.tar
Ruby 3.4.7 with --yjit on x86_64. Same verification as arm64 test: - JIT code region detected from anonymous executable mappings - JIT frame pushed for YJIT-compiled code - Full Ruby VM stack unwound (is_prime, sum_of_primes, loop) Moduledata tar: ~/coredump-uploads/ruby-yjit-amd64-moduledata.tar
On systems with CONFIG_ANON_VMA_NAME=y, Ruby labels its JIT memory region via prctl(PR_SET_VMA), giving it a path like '[anon:Ruby:rb_yjit_reserve_addr_space]'. Without this fix, IsFileBacked() returns true for these mappings (non-empty Path), causing IsAnonymous() to return false. This prevents the LPM prefix registration loop in SynchronizeMappings from registering JIT executable pages for eBPF dispatch. Add IsPrctlNamed() helper and exclude prctl-named mappings from IsFileBacked(), so they are correctly classified as anonymous.
Replace the jit_detected flag approach with V8-style frame pointer unwinding through Ruby JIT frames. When YJIT emits frame pointers (always on arm64, with --yjit-perf on x86_64), the Ruby eBPF unwinder walks the native FP chain through JIT frames, pushes each as a RUBY_FRAME_TYPE_JIT frame, then resolves the post-JIT mapping so native unwinding can continue below the Ruby VM stack. When frame pointers are not available, the original behavior is preserved: a single dummy JIT frame is pushed, cfuncs are pushed inline, and native unwinding is stopped at the end of the Ruby stack. Also fixes parseMappings discarding prctl-labeled [anon:...] mappings, which prevented the YJIT JIT region from being visible to interpreter handlers.
…tion Restructure the eBPF JIT frame pointer walk: instead of a separate loop before the main CFP walk, advance the native FP chain by one frame per main loop iteration. This keeps the native unwind state in lockstep with the Ruby VM stack, supporting both YJIT (1 JIT frame, exits after first iteration) and ZJIT (1 JIT frame per iseq, 1:1 with CFPs). Fix JIT region detection in SynchronizeMappings to scan all mappings (including non-executable ---p reservations) for the prctl-labeled JIT region, then only register LPM prefixes for executable pages. This ensures jit_start/jit_end cover the full reserved address range even when the r-xp committed pages don't carry the label. Also fix IsAnonymous() to recognize [anon:...] labeled mappings, and remove debug log spam from parseMappings.
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Fix doc comment placement: move hasJitFramePointers comment to directly above the function definition instead of above findJITRegion - Fix containerized process support: use /proc/PID/root/tmp/ to access container filesystem for perf map detection, and glob perf-*.map to handle namespace PID mismatch (Ruby uses ns-local PID in filename) - Remove os import (replaced by path/filepath for Glob) - Rebuild BPF blobs
When the Ruby unwinder encounters a cfunc (C-implemented Ruby method), it normally transitions back to the native unwinder to unwind through the C code, then re-enters the Ruby unwinder when it detects the Ruby interpreter loop. Each such round-trip costs 2-3 tail calls out of a budget of 29, so a Ruby stack with 10+ cfuncs can easily truncate. This adds a skip_native_resume flag to RubyProcInfo that, when set, pushes cfunc frames inline without transitioning to the native unwinder. This trades native frames within cfuncs (e.g., the C call chain inside Array#sort) for complete Ruby-level traces without truncation. The flag is controlled by OTEL_EBPF_RUBY_SKIP_NATIVE_RESUME (default: false). Set to 'true' to enable.
- Correct inline comment that incorrectly stated 'Default to skipping' when the actual default is false (requires explicit opt-in) - Add note that skipNativeResume() is evaluated per-process at attach time
06de4d7 to
b439ffa
Compare
With frame pointer unwinding enabled (always on arm64), the JIT region is walked via the FP chain, producing a richer stack with native frames between Ruby VM frames resolved. The amd64 test is unchanged since hasJitFramePointers returns false without --yjit-perf.
Only push the JIT frame when trace->num_frames == 0 (leaf position), restoring the original behavior for both FP and non-FP paths.
This reverts commit 4332554.
walk_ruby_stack is re-entered via tail calls to process more frames. On re-entry, in_jit was recomputed from record->state.pc which hasn't changed (non-FP path), causing the JIT frame to be pushed again on every tail call. Guard with !jit_detected so the JIT frame is only pushed once on the first entry.
4a90a15 to
477d66d
Compare
When skip_native_resume is true but jit_detected is false (PC not in JIT region at sample time), the end-of-stack check fell through to PROG_UNWIND_NATIVE. The native unwinder could then encounter the JIT anonymous mapping and re-enter the Ruby unwinder, pushing a JIT frame in the wrong stack position. Check skip_native_resume alongside jit_detected at end-of-stack to prevent resuming native unwinding when the flag is set.
c8bb4af to
ffdbe91
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Shopify internal deployment branch rebased on upstream main (6ab4187).
Includes:
Replaces
shopify-omnibus-rebase-mar3. Static EC and DTV patches are now upstream so no longer needed here.Note: BPF blobs need rebuilding with clang-17.