-
Notifications
You must be signed in to change notification settings - Fork 85
[RFC] Optimizing extraction of all unwind rows for an FDE #308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…e_info_for_all_regs3 API function This greatly optimizes iteration through all FDE unwind rows. The new helper function invokes a callback for each row in the FDE, instead of having to repeatedly call _dwarf_exec_frame_instr, which performs a lot of redundant work per row and has quadratic behavior. The new helper is mostly a copy of _dwarf_exec_frame_instr with some minor modifications. The new dwarf_iterate_fde_info_for_all_regs3() function uses the new helper function internally
|
I have not looked at this yet, still working on harmless errors/dwarfdump. The problem with public structs in the API is that when DWARF changes something A function interface, on the other hand, even with N return values via pointers, Your code in this pull request looks pretty sensible. I need to look again, but it looks promising. |
|
Thanks! No rush -- happy to discuss when you have the room for it. |
Well, that assumes yet more change won't be required in the future. Anyway, the overall idea looks good. I'm about to write my own dwarf_iterate_fde_info_for_all_regs3() to verify various details. I will also write a new dwarfexample/frame2.c using the new approach. Maybe with You are not the first to request data on all the fde pc value rows, so this will help There are only three calls to _dwarf_exec_frame_instr (before changes) It would be nice to see your iterate versions as a cross-check on my efforts. |
|
Changing draft to pull request was a mistake on my part. So back to draft now. |
|
I have to refactor the 1600 lines of dwarf_exec_frame_instr() iterate.c But first: refactor. |
|
I moved _dwarf_exec_frame_instr() into a new source file: dwarf_cfa_read.c It's no smaller, but it's easier to work with (it already seems so). I think I see where and how to place the callbacks. More later. |
|
Hey Dave, Sorry for the late response, I've been away for a bit.
Totally understand the reasoning, but just to respond to the expensive part: in my profiles, about 25% of the remaining time in the new
Which
That seems like a sensible approach indeed. I considered doing that, but I didn't feel confident enough in my understanding of
Great! Let me know if I can provide any input; will respond more promptly now that I'm back in town :) |
|
I've been proceeding with your idea in a branch. It compiles but nothing more to report. It also occurred to me that having the Dwarf_Debug instance involved |
Hi @davea42,
For our use case, we're interested in extracting all unwind rows for a given binary. To do so, we're currently using the
dwarf_get_fde_info_for_all_regs3_bfunction by repeatedly calling it in a loop until all rows have been consumed. This looks something like this:For a binary we're testing against, this takes ~1 second to extract all unwind rows for all FDEs in the binary. The binary has:
We analyzed this, and the root cause is that this is essentially a somewhat hidden quadratic loop:
dwarf_get_fde_info_for_all_regs3_binternally uses_dwarf_exec_frame_instr(via_dwarf_get_fde_info_for_a_pc_row) to execute all instructions for the given FDE until it reaches thesearch_pc_valpassed in, but it starts from the first instruction for each call.This means that if you're iterating through the rows for an FDE like this, the loop essentially looks like this in pseudo code:
This PR implements a new function
dwarf_iterate_fd_info_for_all_regs3that fixes this. The new function uses a new (internal) helper function_dwarf_iterate_frame_instrthat executes all instructions for the FDE, invoking a callback for each row. This turns the quadratic loop into a linear loop over the instructions, which results in a significant speedup: iteration for this binary goes from 1007ms to 83ms, which is ~12x faster.Open questions
I'm sending this PR as a RFC/draft, because there are some questions around the change as implemented that I'm not sure about:
First of all, the new
_dwarf_iterate_frame_instrhelper is a copy/paste of the existing_dwarf_exec_frame_instrwith some minor modifications to invoke the callback instead. This results in significant code duplication that is probably not desired. It is technically possible to implement_dwarf_exec_frame_instrin terms of the new_dwarf_iterate_frame_instrto prevent this code duplication.I haven't included that as part of the PR, but that could look something like this:
But this is not a risk-free change.
Secondly, the new
dwarf_iterate_fde_info_for_all_regs3has to make use of an internal callback function_dwarf_iterate_fde_info_for_all_regs3_callbackthat's passed to the new_dwarf_iterate_frame_instrhelper. The reason for this is that_dwarf_iterate_frame_instr(and_dwarf_exec_frame_instr) work with an internal structDwarf_Framethat's not currently exposed inlibdwarf.h. This means we need to copy fromDwarf_FrametoDwarf_Regtable3, which is a bit wasteful.It would be nice if
Dwarf_Frame(and related structs) could be exposed in the API to avoid this copy step. As I understand it, the difference betweenDwarf_FrameandDwarf_Regtable3was introduced to avoid breaking the API for existing functions likedwarf_get_fde_info_for_all_regs3_b, but since there is a new function being introduced here, there is no risk of that.I realize this is quite a lot taken together, but it would be great to get your thoughts/feedback on all of this to see if we can get this PR in a shape where it could be upstreamed (or if you have any ideas around how a similar optimization could be implemented in a way that fits a bit better in libdwarf's current architecture).