ruby: Handle Ruby JIT PC with JIT frame type#1102
ruby: Handle Ruby JIT PC with JIT frame type#1102dalehamel wants to merge 12 commits intoopen-telemetry:mainfrom
Conversation
|
Opening this one as a draft, it should probably have coredump tests as well. While it is independent from #1101 , it should probably land after it as it is lower a bit priority. If the approach looks good, I'll add some coredump tests. |
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
|
I managed to add support for better jit unwinding support when frame pointers are present, I am thinking i will probably submit both together - this is more stub stuff |
7120077 to
e4d6d9a
Compare
korniltsev-grafanista
left a comment
There was a problem hiding this comment.
I've just tested the change locally and it worked well on 3.4.9 and 3.3.10
I left some comments, but the general idea works and unlocks the profiling with jit enabled
hey @korniltsev-grafanista i have an updated version of this branch Shopify#26 but i haven't updated this yet as i'm waiting for two other PRs to land first ( #1226 and #1227 ) Thanks for your comments, when the other PRs land trying to upstream the JIT support is my next priority. For now i can say that this branch is at least tested in production and works quite well for our cases so far but certainly can use some review and hardening. |
|
I took a look at Shopify#26 and it looks good to me. I would still prefer to only handle the whole jit area once and forget about it, than updating maps everytime something is recompiled, but it should work anyways. |
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
94930a6 to
5831ed4
Compare
|
@florianl I added new coredump tests for ruby with yjit enabled, can you please add the modules to the store |
|
@korniltsev-grafanista please take another look, i've added unit tests for the memory region detection and addressed your comments. I've also added coredump tests. I'm planning on submitting a second PR once this lands which will add proper frame pointer detection to keep the native state in sync, which would be needed for unwinding zjit correctly. But for now that's lower priority, this allows the ruby unwinder to work with either jit enabled at all. |
5831ed4 to
e151da4
Compare
| return 0, 0, false | ||
| } | ||
|
|
||
| func (r *rubyInstance) SynchronizeMappings(ebpf interpreter.EbpfHandler, |
There was a problem hiding this comment.
This was cargo-culted from the node interpreter
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
When trying to upload coredump tests for open-telemetry#1102 I did run into a bug. When the S3 API returned exactly batchSize items, the code would continue the loop without checking if there were more pages. If the ContinuationToken was nil (indicating end of results), passing it to the next ListObjectsV2 call would restart the listing from the beginning, creating an infinite loop. Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
| "Range#each+0 in <cfunc>:0", | ||
| "block in <main>+0 in /Users/dalehamel/src/github.com/Shopify/otel-ebpf-pr1102/tools/coredump/testsources/ruby/loop.rb:33", | ||
| "Kernel#loop+0 in <internal:kernel>:168", | ||
| "<main>+0 in /Users/dalehamel/src/github.com/Shopify/otel-ebpf-pr1102/tools/coredump/testsources/ruby/loop.rb:32" |
There was a problem hiding this comment.
Note that because we lack the ability to continue unwinding when there are no frame pointers, the 'base frame' appears as the base ruby frame, not actual C / native frame that is the entrypoint to the ruby interpreter process. This is something that my follow-up PR will address. but for now it is expected behaviour that once jit is detected, we don't resume native unwinding.
And even with the follow-up, it isn't guaranteed we can - ruby needs to be running with a jit-perf mode that enables the frame pointers. So this is generally an expected and (I think) acceptable tradeoff for being able to profile with jit enabled.
interpreter/ruby/ruby.go
Outdated
| maxSize atomic.Uint32 | ||
|
|
||
| // mappings is indexed by the Mapping to its generation | ||
| mappings map[process.RawMapping]*uint32 |
There was a problem hiding this comment.
I'm a bit worried about unlimited growth of this map. Is there some guidance on the number of entries we expect here?
… cleanup - Extract JIT region detection into standalone findJITRegion() function that handles both prctl-labeled mappings (spanning full reserved area including holes) and heuristic fallback (first anonymous executable mapping) - Add comprehensive unit tests for JIT region detection: no mappings, file-backed only, labeled single/multiple, heuristic fallback, precedence - Fix Detach() to clean up PidInterpreterMapping prefixes (previously only deleted proc data, leaking eBPF map entries) - Remove dead code: m.Vaddr < jitMapping.Vaddr branch that could never be true since /proc/pid/maps is sorted by address
- Fix JIT region check to use >= for lower bound (half-open interval [start, end)) matching V8 and HotSpot conventions - Reuse map lookup value to avoid double hashing of RawMapping struct key - Wrap CalculatePrefixList error for better debugging - Log stale prefix deletion errors instead of silently discarding - Move prctl(PR_SET_VMA) named anonymous mapping support from process/process.go (was in PR #36 but is needed here for findJITRegion labeled detection) - Remove stale 'assume all anonymous' comment - Rebuild BPF blobs for >= fix
- Add 'Register LPM prefixes' comment to SynchronizeMappings loop - Extract in_jit bool variable from inline if condition for clarity - Improve JIT detection comments (read_ruby_frame and walk_ruby_stack) - Fix whitespace: add blank line after FRAMES_PER_WALK_RUBY_STACK define, normalize MAX_EP_CHECKS spacing - Rebuild BPF blobs
Ruby 3.4.7 with --yjit on arm64. Verifies that: - JIT code region is detected from anonymous executable mappings - A JIT frame is pushed for the YJIT-compiled code - Ruby VM frames (is_prime, sum_of_primes, loop) are properly unwound after the JIT frame On main, this coredump fails with native_no_pid_page_mapping because the profiler doesn't know how to handle the anonymous JIT mapping. Moduledata tar for CI upload: ~/coredump-uploads/ruby-yjit-moduledata.tar
Ruby 3.4.7 with --yjit on x86_64. Same verification as arm64 test: - JIT code region detected from anonymous executable mappings - JIT frame pushed for YJIT-compiled code - Full Ruby VM stack unwound (is_prime, sum_of_primes, loop) Moduledata tar: ~/coredump-uploads/ruby-yjit-amd64-moduledata.tar
On systems with CONFIG_ANON_VMA_NAME=y, Ruby labels its JIT memory region via prctl(PR_SET_VMA), giving it a path like '[anon:Ruby:rb_yjit_reserve_addr_space]'. Without this fix, IsFileBacked() returns true for these mappings (non-empty Path), causing IsAnonymous() to return false. This prevents the LPM prefix registration loop in SynchronizeMappings from registering JIT executable pages for eBPF dispatch. Add IsPrctlNamed() helper and exclude prctl-named mappings from IsFileBacked(), so they are correctly classified as anonymous.
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
Adapt SynchronizeMappings to work with uint32 values instead of *uint32 pointers: since generations are no longer shared via pointer, both mappings and their prefixes must be updated independently each cycle. Use an isNew flag to only call UpdatePidInterpreterMapping for newly discovered mappings.
0c685ac to
edbf402
Compare
|
thanks for the review and feedback @florianl I applied your suggestions in one commit (per the new guidance to ensure you get credit), and fixed them up in a second to fix the build and address the remaining comments. |
Address florianl's concern about unlimited growth: entries are pruned each SynchronizeMappings call via generation tracking, and the map size is bounded by the number of executable anonymous mappings per process.
|
|
||
| for _, prefix := range prefixes { | ||
| if isNew { | ||
| if err := ebpf.UpdatePidInterpreterMapping(pid, prefix, |
There was a problem hiding this comment.
My only tiny suggestion remains the same is to consider taking advantage of the fact that we know the full size of the yjit area (48/128/x mib) and we can just find the first mapping and assume the whole area belongs to the ruby interpreter without figuring out which subset of it has already been occupied/ garbage collected by the jit. This would simplify go code, there will be less map updates/deletitions every time something is recompiled, maps will be smaller.
| // Heuristic fallback: first anonymous executable mapping by address. | ||
| // Mappings from /proc/pid/maps are sorted by address, so the first | ||
| // match is the lowest address. | ||
| if !heuristicFound && m.IsExecutable() && m.IsAnonymous() { |
There was a problem hiding this comment.
Why does the label detection extends start/end jit area, but the heuristic sets it to the first found?
There was a problem hiding this comment.
There's also a bit of disconnection with the interpreter mappings: we UpdatePidInterpreterMapping for all mappings, but only set jit start/end for the first one. Is this intentional?
| wantFound: true, | ||
| }, | ||
| { | ||
| name: "heuristic fallback - first anonymous executable mapping", |
There was a problem hiding this comment.
Taking only the first mapping may be not correct.
I did not test, but looks like it would not catch at least this case.
7f17d99b9000-7f17d9b18000 r-xp 00000000 00:00 0
7f17d9b18000-7f17d9c31000 r-xp 00000000 00:00 0
7f17d9c31000-7f17e19b9000 ---p 00000000 00:00 0
Both of this mappings belong to the jit area. And there may be many more cases. There may be more mappings later. Furthermore once full jit area is occupied some of the pages may be grabage collected, so there may be holes in it.
There was a problem hiding this comment.
I don't understand why we reject the second mapping in the heuristic fallback - first anonymous executable mapping testcase.
execAnon(0x7f0000100000, 0x4000),
execAnon(0x7f0000200000, 0x8000),
What
Detect JIT frames produced by ruby's yjit / zjit to enable the ruby interpreter to still work if jit is in use.
Why
Fixes #937
yjit is a substantial improvement to performance, but currently breaks the ruby interpreter as it will never run, since the PC for the interpreter is never called - the whole point of JIT is to avoid the interpreter after all.
How
We use the
SynchronizeMappingshook to detect the ruby JIT address range, and if the PC is in this range we will push a dummy frame to indicate that the leaf is some JIT code.In the future, we can support the linux jit interface to symbolize these, but for now we just push a dummy frame.
Since Ruby's jit works by just running native code replacing the ISEQ of the leaf CME, we can just switch to unwinding the ruby stack once we detect a JIT frame.
However, since we cannot be guaranteed base pointers are available (and even if they are, we don't seem to be able to further unwind correctly with the native unwinder), we can't switch back to native unwinding once we have detected a JIT frame on the stack. This means that if JIT is enabled, we don't get the "interleaved" native and ruby stacks anymore, but we do still get the native frames on the edge of the stack which are probably the most interesting:
In a follow-up, PR, i'll add more thorough support for continuing to unwind when we have frame pointers enabled, and better support for zjit which is a method-based jit.