decomp: Add CFG, Dominance, and Loop Analysis for AST structuring#139
decomp: Add CFG, Dominance, and Loop Analysis for AST structuring#139
Conversation
6fe46e3 to
24fba92
Compare
3fb4be9 to
be02750
Compare
2e84a8b to
c07e2f3
Compare
There was a problem hiding this comment.
Pull request overview
Introduces a new structured AST normalization/structuring path for patchir-decomp, adding CFG extraction + Ghidra-style CollapseStructure + re-emission back into Clang AST, along with various type/operation handling tweaks and updated lit JSON tests for switch/loop structuring.
Changes:
- Add new AST structuring infrastructure:
SNodeIR, CFG builder + dominator/loop utilities,CollapseStructure, and a Clang AST emitter with pretty-print cleanup. - Update decompiler operation/type handling (e.g., switch cases metadata, string-literal LOAD handling, STORE deref simplification, PIECE/SUBPIECE adjustments).
- Add CLI flags and extend/adjust lit tests for switch/loop structuring scenarios.
Reviewed changes
Copilot reviewed 39 out of 41 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/patchir-decomp/main.cpp | Adds CLI flags for structuring/goto-elimination and tightens codegen execution on diagnostics. |
| tools/CMakeLists.txt | Fixes indentation/formatting for tool subdirectory inclusion. |
| test/patchir-decomp/switch_successor_blocks.json | Updates switch test input/expectations for explicit switch case metadata and parameter usage. |
| test/patchir-decomp/switch_missing_target_blocks.json | Adds/updates a test for missing switch target blocks and uses the new structuring flag in RUN lines. |
| test/patchir-decomp/switch_cases_switch_input_discriminant.json | Updates discriminant handling expectations to use parameter-based selector and explicit cases. |
| test/patchir-decomp/ghidra_struct_while.json | New test covering while structuring via Ghidra-style pipeline. |
| test/patchir-decomp/ghidra_struct_switch.json | New test covering switch structuring via Ghidra-style pipeline (print-tu only). |
| test/patchir-decomp/ghidra_struct_sequential.json | New test for sequential (non-loop/goto) structuring behavior. |
| test/patchir-decomp/ghidra_struct_nested_loop.json | New test for nested loop structuring behavior. |
| test/patchir-decomp/ghidra_struct_irreducible.json | New test for irreducible flow fallback behavior. |
| test/patchir-decomp/ghidra_struct_if_else.json | New test for if/else structuring behavior. |
| test/patchir-decomp/ghidra_struct_dowhile.json | New test for do/while structuring behavior. |
| test/lit.cfg.py | Minor formatting change. |
| lib/patchestry/Passes/Utils.cpp | Renames local constants to k... style for bit-width calculations. |
| lib/patchestry/Ghidra/JsonDeserialize.cpp | Refactors bitfield/string deserialization and include ordering; adds warning on missing bitfield base type. |
| lib/patchestry/Dialect/Pcode/Deserialize.cpp | Renames deserializer to Deserializer for consistency. |
| lib/patchestry/AST/TypeBuilder.cpp | Updates to new TypeBuilder constant/member naming and error-message text. |
| lib/patchestry/AST/SNodeDebug.cpp | New DOT emitter for visualizing SNode trees. |
| lib/patchestry/AST/SNode.cpp | New SNode dump implementation utilities. |
| lib/patchestry/AST/OperationStmt.cpp | Updates switch lowering, LOAD/STORE simplifications, call signature reconstruction, PIECE/SUBPIECE tweaks. |
| lib/patchestry/AST/OperationBuilder.cpp | Adjusts operation-stmt caching/materialization logic and TypeBuilder API rename usage. |
| lib/patchestry/AST/LoopInfo.cpp | New natural loop detection utilities. |
| lib/patchestry/AST/FunctionBuilder.cpp | Updates for TypeBuilder API rename and documents calling convention limitations. |
| lib/patchestry/AST/DomTree.cpp | New dominator/postdominator tree implementation. |
| lib/patchestry/AST/ClangEmitter.cpp | New conversion from SNode back to Clang AST + cleanup passes (label/goto/decl fixups). |
| lib/patchestry/AST/CfgBuilder.cpp | New CFG builder from Clang function bodies and switch metadata population from Ghidra JSON. |
| lib/patchestry/AST/CMakeLists.txt | Adds new AST structuring sources to the build. |
| lib/patchestry/AST/ASTConsumer.cpp | Integrates CFG+CollapseStructure+emitter pipeline into the consumer flow. |
| include/patchestry/Util/Options.hpp | Adds goto-elimination-related fields to global options. |
| include/patchestry/Ghidra/PcodeTypes.hpp | Encapsulates BitField base type member and clarifies StringType comment. |
| include/patchestry/Ghidra/Pcode.hpp | Renames mnemonic mapper constants to k... style. |
| include/patchestry/Dialect/Pcode/Deserialize.hpp | Renames deserializer to Deserializer in the public header. |
| include/patchestry/AST/TypeBuilder.hpp | Renames constants/methods to k... / Get... / AstContext. |
| include/patchestry/AST/SNodeDebug.hpp | Declares DOT emitter API for SNode. |
| include/patchestry/AST/SNode.hpp | New SNode IR definition and arena factory. |
| include/patchestry/AST/LoopInfo.hpp | Declares loop detection data structures/APIs. |
| include/patchestry/AST/DomTree.hpp | Declares dominator tree API. |
| include/patchestry/AST/CollapseStructure.hpp | Declares Ghidra-style structuring algorithm + internal graph types. |
| include/patchestry/AST/ClangEmitter.hpp | Declares SNode→Clang emission and pretty-print cleanup APIs. |
| include/patchestry/AST/CfgBuilder.hpp | Declares CFG structures/build + switch metadata population API. |
Comments suppressed due to low confidence (1)
lib/patchestry/AST/OperationStmt.cpp:1658
- In
create_piece,low_widthcan remain 0 if the serialized type lookup fails forop.inputs[1].type_key, which makes the shift a no-op and produces an incorrect PIECE result (high and low parts overlap). Add a fallback to derive the bit-width frominput1_expr->getType()(or another reliable source) when the type_key lookup fails.
// Determine low-part bit width from input1's type. The Varnode::size
// field is not populated for operation inputs, so look up the type.
unsigned low_width = 0;
{
auto low_type_it =
type_builder().GetSerializedTypes().find(op.inputs[1].type_key);
if (low_type_it != type_builder().GetSerializedTypes().end()) {
low_width = static_cast< unsigned >(ctx.getTypeSize(low_type_it->second));
}
}
auto merge_to_next = !op.output.has_value();
// If Operation has type, convert expression to operation type and perform bit-shift and
// or operation.
if (op.type) {
auto op_type = type_it->second;
auto *cast_expr_input0 = make_cast(ctx, input0_expr, op_type, location);
if (!cast_expr_input0) {
LOG(ERROR) << "Failed to create cast expression for PIECE input0. key: "
<< op.key;
return {};
}
input0_expr = cast_expr_input0;
auto *cast_expr_input1 = make_cast(ctx, input1_expr, op_type, location);
if (!cast_expr_input1) {
LOG(ERROR) << "Failed to create cast expression for PIECE input1. key: "
<< op.key;
return {};
}
input1_expr = cast_expr_input1;
}
// When low_width is 0, the shift is a no-op — skip it to avoid "<< 0".
clang::Expr *high_expr = input0_expr;
if (low_width > 0) {
auto *shift_value = clang::IntegerLiteral::Create(
ctx, llvm::APInt(ctx.getIntWidth(ctx.IntTy), low_width), ctx.IntTy, location
);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
9a0859f to
644966d
Compare
decomp: update the ci pipeline to verify the function calls decomp: add unit tests for decompilation decomp: update option flags to disable structuring passes decomp: update switch handling and remove fallback code entirely and add new tests
3bff79c to
b8be098
Compare
kyle-elliott-tob
left a comment
There was a problem hiding this comment.
There's quite a tests without a FileCheck and some general cleanup on naming too, lots of mixed cases/naming conventions
| // References invalidated — use g.node() from here | ||
|
|
||
| // Redirect cond's edge from clause to copy | ||
| g.node(bl.id).succs[i] = copy_id; |
There was a problem hiding this comment.
As the comment above suggests, references are invalidated but we still reference bl.id in two places after, which could be a use-after-free. Would recommend storing ID in a local
| } | ||
|
|
||
| std::list<TraceDAG::BlockTrace *>::iterator | ||
| TraceDAG::openBranch(CGraph &g, BlockTrace *parent) { |
There was a problem hiding this comment.
pushBranches calls openBranch(g, bt) where bt = *current_activeriter. Inside openBranch, removeActive(parent) erases the list node that current_activeiter points to, causing the function to return an invalidated iterator which pushBranches then dereferences.
I think the fix is to grab the next iter from parent->activeiter before removeActive and return that, but this needs to be verified as I don't even know if I traced this logic correctly.
| } | ||
|
|
||
| while (activecount > 0) { | ||
| int missedcount = 0; |
There was a problem hiding this comment.
missedcount seems to always be zero, which makes this possibly infinitely loop. Guard condition for missedcount >= activecount will never be triggered.
| std::vector< size_t > exits; // blocks outside the loop that are successors of body blocks | ||
| NaturalLoop *parent = nullptr; // enclosing loop (nullptr if outermost) | ||
| std::vector< NaturalLoop * > children; // nested loops | ||
| }; |
There was a problem hiding this comment.
Child pointers to other NaturalLoops can be invalidated by calls in detectLoops leading to dangling pointer references
| return x == a; | ||
| } | ||
|
|
||
| std::vector< size_t > DomTree::DominanceFrontier(size_t block, |
There was a problem hiding this comment.
I could be totally wrong about this, my my understanding Is that Dominance Frontier analysis examines all edges. What's the use of the of the fg.blocks[b].succs.size() < 2 guard here?
There was a problem hiding this comment.
This is a guard to ignore any unconditional edges from the dominance frontier. Probably, we don't need this optimization here.
| p.erase(std::remove(p.begin(), p.end(), nid), p.end()); | ||
| } | ||
| } | ||
| for (size_t p : nodes[nid].preds) { |
There was a problem hiding this comment.
If two collapsed nodes share a predecessor, doesn't this get duped?
| } | ||
| } | ||
|
|
||
| // Fallback goto after the switch. |
There was a problem hiding this comment.
If a switch block has case gotos but no post-switch fallback, fallthrough_succ stays at 0 which might produce a bogus edge. Idk if there's a way to have a sentinel value or an optional maybe.
lib/patchestry/AST/CfgBuilder.cpp
Outdated
| // Add the sub-statement (what the label wraps) | ||
| clang::Stmt *sub = label_stmt->getSubStmt(); | ||
| if (sub && !llvm::isa< clang::NullStmt >(sub)) { | ||
| // If sub is another label, recurse; otherwise add it |
There was a problem hiding this comment.
Instead of handling only two levels, maybe we can use a while loop to peel nested LabelStmt nodes.
lib/patchestry/AST/LoopInfo.cpp
Outdated
| worklist.push(be.tail); | ||
| } | ||
|
|
||
| // Build predecessor map |
There was a problem hiding this comment.
Built ever iteration, can this be hoisted outside of the loop? Not sure as I can't trace all the logic here
| } | ||
| } | ||
|
|
||
| void LoopBody::labelContainments( |
There was a problem hiding this comment.
Comments should probably be situated here
c32af5c to
5d2900e
Compare
@kyle-elliott-tob, Thanks for the review. |
This PR introduces new AST infrastructure for CFG construction, dominance analysis, loop detection, and structured code emission. It adds components such as CfgBuilder, DomTree, LoopInfo, and a Ghidra-inspired CollapseStructure algorithm to transform unstructured CFGs into structured SNode trees. It also introduces ClangEmitter for AST emission and SNodeDebug for Graphviz visualization. Minor naming and style updates improve consistency across related headers.