Commit a7876d0

feat: Enable sibling callee discovery and nested struct synthesis
This commit significantly expands the scope of structure reconstruction by improving how related functions are discovered and how complex data layouts are represented. The primary change addresses a limitation in `CrossFunctionAnalyzer` where analysis starting from a callee would find the caller but fail to discover *other* callees of that same caller ("siblings").

Key changes by component:

docs/CROSS_FUNCTION_SIBLING_DISCOVERY.md (added):
- Added documentation explaining the sibling discovery logic and its impact on struct reconstruction coverage.

include/structor/cross_function_analyzer.hpp, src/cross_function_analyzer.cpp:
- Updated `trace_backward` to trigger `trace_forward` from discovered callers, ensuring sibling callees are analyzed.
- Added `CalleeCallInfo` and `CallerCallInfo` structs to carry rich metadata (delta, by-ref status, function pointer types) instead of simple tuples.
- Implemented `resolve_indirect_callees` to trace flow through function pointers when `include_indirect_calls` is enabled.
- Added `by_ref` tracking to correctly handle pointer indirection levels when variables are passed by reference.

include/structor/layout_synthesizer.hpp:
- Added `detect_subobjects` to identify clusters of fields that should be extracted into nested structures.
- Added `apply_bitfield_recovery` to convert bitwise access patterns into formal bitfield members.
- Added `emit_substructs` configuration option.

include/structor/structure_persistence.hpp:
- Added `find_reuse_candidate` to search the IDB for existing structures that match the synthesized layout (based on field offsets and semantics) to prevent duplicate type creation.
- Implemented `create_struct_with_substructs` to handle the recursive creation of nested types.
- Added logic to persist bitfield members to IDA.

include/structor/access_collector.hpp:
- Updated `AccessPatternVisitor` to detect bitwise AND/SHIFT operations for bitfield inference.
- Added `stride_hint` extraction from array indexing to aid the Z3 solver in detecting array patterns.

include/structor/z3/array_constraints.cpp:
- Updated array detection to utilize `stride_hint` from the access collector, improving accuracy for arrays of structures.

Impact:
- Struct reconstruction is now much more complete; analyzing a leaf function will now pull in data from all functions that share the same structure instance via a common caller.
- Generated structures are more semantic: bitfields are correctly typed, and distinct field groups are extracted into nested sub-structures.
- Reduced IDB clutter: The plugin now attempts to reuse existing compatible structs rather than always creating new ones.
1 parent f9c91f1 commit a7876d0

17 files changed, +1695 -131 lines changed
docs/CROSS_FUNCTION_SIBLING_DISCOVERY.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
# Cross-Function Sibling Callee Discovery

## Overview

This document describes the fix for cross-function sibling callee discovery in struct reconstruction.

## The Issue

Prior to the fix, `CrossFunctionAnalyzer::trace_backward` did NOT call `trace_forward` from discovered callers to find sibling callees. This meant that when starting analysis from one callee, other callees of the same caller were NOT discovered.

### Example Scenario

Consider a call graph:

```
main()
├── traverse_list(n1) <- Start analysis here
├── sum_list(n1) <- Sibling callee (same struct)
└── insert_after(n1, n2) <- Sibling callee (same struct)
```

**Before fix:**
- Start from `traverse_list`
- Trace backward → find `main`
- Collect `main`'s pattern
- **STOP** - siblings `sum_list` and `insert_after` NOT discovered!

**After fix:**
- Start from `traverse_list`
- Trace backward → find `main`
- Collect `main`'s pattern
- **Trace forward from `main`** → discover `sum_list` and `insert_after`
- Collect patterns from ALL siblings

## The Fix

Added to `src/cross_function_analyzer.cpp` in `trace_backward()`:

```cpp
// Recurse backward
trace_backward(caller_ea, caller_var_idx, cumulative_delta, current_depth + 1, synth_opts);

// IMPORTANT: Also trace forward from the caller to discover sibling callees.
// This ensures that if main() calls both traverse_list() and sum_list()
// with the same struct, we collect access patterns from all siblings.
if (config_.follow_forward) {
    trace_forward(caller_ea, caller_var_idx, cumulative_delta, current_depth + 1, synth_opts);
}
```

This mirrors the behavior already present in `TypePropagator::propagate_backward` (which correctly handled sibling discovery during type propagation, but not during constraint collection).

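The interaction between the two traversal directions can be sketched on a toy call graph. This is a minimal model under stated assumptions, not the plugin's implementation: `ToyAnalyzer`, `CallGraph`, and `analyze_from` are hypothetical names, functions are identified by name rather than address, and variable indices, deltas, and depth limits are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: caller name -> callees that receive the same struct pointer.
// Hypothetical model; the real analyzer works on decompiled functions.
using CallGraph = std::map<std::string, std::vector<std::string>>;

struct ToyAnalyzer {
    const CallGraph& graph;
    bool follow_forward;                   // stands in for config_.follow_forward
    std::set<std::string> collected;       // functions whose patterns were gathered
    std::set<std::string> seen_forward;    // guards against forward cycles
    std::set<std::string> seen_backward;   // guards against backward cycles

    // Collect func, then every callee reachable from it.
    void trace_forward(const std::string& func) {
        if (!seen_forward.insert(func).second) return;
        collected.insert(func);
        auto it = graph.find(func);
        if (it == graph.end()) return;
        for (const std::string& callee : it->second) trace_forward(callee);
    }

    // Collect func, then walk up to each caller. With follow_forward set,
    // also fan out from the caller so sibling callees are picked up.
    void trace_backward(const std::string& func) {
        if (!seen_backward.insert(func).second) return;
        collected.insert(func);
        for (const auto& entry : graph) {
            const std::vector<std::string>& callees = entry.second;
            if (std::find(callees.begin(), callees.end(), func) == callees.end()) continue;
            trace_backward(entry.first);
            if (follow_forward) trace_forward(entry.first);
        }
    }
};

// Run the toy analysis from one function and return everything it reached.
std::set<std::string> analyze_from(const CallGraph& graph,
                                   const std::string& start,
                                   bool follow_forward) {
    assert(!start.empty() && "start function name must be set");
    ToyAnalyzer analyzer{graph, follow_forward, {}, {}, {}};
    analyzer.trace_backward(start);
    return analyzer.collected;
}
```

With the call graph from the example scenario, starting at `traverse_list` with `follow_forward` disabled reaches only `traverse_list` and `main`; enabling it also reaches `sum_list` and `insert_after`. The separate forward and backward visited sets are a design choice of this sketch: a caller already seen while walking backward must still be expanded forward once.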
## Verification

### Unit Test: `test_linked_list_sibling_discovery`

Added test in `test/test_cross_function.cpp` that simulates the linked list scenario:

```
main() at 0x4000 calls:
- traverse_list() at 0x1000: accesses offset 0x00 (next), 0x10 (data)
- sum_list() at 0x2000: accesses offset 0x00 (next), 0x10 (data)
- insert_after() at 0x3000: accesses offset 0x00 (next), 0x08 (prev)

Expected struct (when starting from traverse_list):
- offset 0x00: pointer (next) - from all three functions
- offset 0x08: pointer (prev) - ONLY from insert_after ← Critical!
- offset 0x10: int (data) - from traverse_list and sum_list
```

The test verifies that offset 0x08 is discovered when starting from `traverse_list`, proving sibling discovery works.

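The property the test checks can be illustrated by merging per-function access patterns into one offset-keyed layout. The sketch below is not the code in `test/test_cross_function.cpp`; `ToyAccess`, `ToyField`, `merge_layout`, and `linked_list_patterns` are hypothetical names, and only the offsets and field kinds are taken from the scenario above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the analyzer's access records; not the real types.
struct ToyAccess {
    std::uint32_t offset;  // byte offset into the struct
    std::string kind;      // "pointer" or "int" in this toy model
};

struct ToyField {
    std::string kind;
    std::set<std::string> seen_in;  // functions that touched this offset
};

using ToyPatterns = std::map<std::string, std::vector<ToyAccess>>;

// Merge the accesses of the given functions into one offset-keyed layout.
std::map<std::uint32_t, ToyField> merge_layout(
        const ToyPatterns& patterns, const std::set<std::string>& functions) {
    std::map<std::uint32_t, ToyField> layout;
    for (const std::string& func : functions) {
        auto it = patterns.find(func);
        if (it == patterns.end()) continue;  // function with no recorded accesses
        for (const ToyAccess& access : it->second) {
            ToyField& field = layout[access.offset];
            // Toy invariant: every function agrees on a field's kind.
            assert(field.kind.empty() || field.kind == access.kind);
            field.kind = access.kind;
            field.seen_in.insert(func);
        }
    }
    return layout;
}

// Access patterns from the unit test scenario described above.
ToyPatterns linked_list_patterns() {
    return {
        {"traverse_list", {{0x00, "pointer"}, {0x10, "int"}}},
        {"sum_list",      {{0x00, "pointer"}, {0x10, "int"}}},
        {"insert_after",  {{0x00, "pointer"}, {0x08, "pointer"}}},
    };
}
```

Merging over all four functions yields three fields, with offset 0x08 attributed only to `insert_after`. Merging over just `traverse_list` and `main`, which is what the pre-fix traversal reached, leaves offset 0x08 out entirely.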
### Test Results

```
=== Cross-Function Analysis Unit Tests ===

[PASS] linked_list_sibling_discovery (0ms)
...
Passed: 15, Failed: 0
```

## Impact on Struct Reconstruction

With this fix, struct reconstruction now properly considers ALL xref callees:

| Function analyzed from | Functions included in analysis |
|------------------------|-------------------------------|
| traverse_list | traverse_list + main + sum_list + insert_after |
| sum_list | sum_list + main + traverse_list + insert_after |
| insert_after | insert_after + main + traverse_list + sum_list |

This results in a more complete struct with fields from ALL related functions.

## Note on the test_linked_list Binary

In the actual `test_linked_list` binary, `insert_after` is NOT called from `main()`, so it won't be discovered as a sibling. This is correct behavior - you can only discover siblings that are actually in the call graph.

The synthesized struct `synth_struct_10000052C_0` correctly contains:
- `field_0` at offset 0 (next pointer) - from traverse_list/sum_list
- `func_10` at offset 0x10 (data) - from traverse_list/sum_list

If `insert_after` were called from `main()`, the struct would also include:
- `field_8` at offset 8 (prev pointer) - from insert_after

docs/Z3_SYNTHESIS_PLAN.md

Lines changed: 4 additions & 4 deletions
@@ -701,9 +701,8 @@ private:
     );
 
     /// Find all callees where var is passed as an argument
-    /// Returns: vector of (callee_ea, param_idx, delta) tuples
-    /// delta is the constant offset added to var before passing
-    [[nodiscard]] qvector<std::tuple<ea_t, int, sval_t>> find_callees_with_arg(
+    /// Returns call infos with callee, delta, and by-ref metadata
+    [[nodiscard]] qvector<CalleeCallInfo> find_callees_with_arg(
         cfunc_t* cfunc,
         int var_idx
     );
@@ -716,7 +715,8 @@ private:
     );
 
     /// Find all callers that pass a value to this function's parameter
-    [[nodiscard]] qvector<std::pair<ea_t, int>> find_callers_with_param(
+    /// Returns call infos with caller var, delta, and by-ref metadata
+    [[nodiscard]] qvector<CallerCallInfo> find_callers_with_param(
         ea_t func_ea,
         int param_idx
     );