diff --git a/c2rust-transpile/src/cfg/loops.rs b/c2rust-transpile/src/cfg/loops.rs
index 05fb151f48..462c873a59 100644
--- a/c2rust-transpile/src/cfg/loops.rs
+++ b/c2rust-transpile/src/cfg/loops.rs
@@ -170,28 +170,30 @@ impl LoopInfo {
         self.loops.extend(other.loops);
     }
 
-    /// Find the smallest possible loop that contains all of the items
+    /// Finds the smallest possible loop that contains all of the entries.
     pub fn tightest_common_loop>(&self, mut entries: E) -> Option {
+        // Start with the loop containing the first entry.
         let first = entries.next()?;
-
         let mut loop_id = *self.node_loops.get(&first)?;
 
+        // Widen the loop until it contains all of our entries, or it can no longer
+        // be widened.
         for entry in entries {
-            // Widen the loop until it contains the `entry`, or it can no longer be widened.
             loop {
-                match self.loops.get(&loop_id) {
-                    Some((ref in_loop, parent_id)) => {
-                        if in_loop.contains(&entry) {
-                            break;
-                        }
-                        loop_id = if let Some(i) = parent_id {
-                            *i
-                        } else {
-                            return None;
-                        };
-                    }
-
-                    None => return None,
+                // NOTE: We currently bail out in the case where there is no loop data
+                // corresponding to the current loop ID, but it's unclear if that's the right
+                // thing to do. In theory we should always have loop data corresponding to a
+                // loop ID, but the `nested_goto.c` test definitely ends up with a loop ID that
+                // doesn't have corresponding info. We should investigate why that happens and
+                // determine if it's valid or not.
+                let (in_loop, parent_id) = self.loops.get(&loop_id)?;
+
+                // If our current loop contains the entry, move on to the next entry. Otherwise
+                // move on to the next wider loop if there is one.
+                if in_loop.contains(&entry) {
+                    break;
+                } else {
+                    loop_id = parent_id.clone()?;
                 }
             }
         }
diff --git a/c2rust-transpile/src/cfg/mod.rs b/c2rust-transpile/src/cfg/mod.rs
index 270c640267..6b1146cfa7 100644
--- a/c2rust-transpile/src/cfg/mod.rs
+++ b/c2rust-transpile/src/cfg/mod.rs
@@ -14,6 +14,7 @@
 //!   - simplify that sequence of `Structure`s into another such sequence
 //!   - convert the `Vec>` back into a `Vec`
 //!
+//! See the [`relooper`] module for more details about the Relooper algorithm.
 
 use crate::c_ast::iterators::{DFExpr, SomeId};
 use crate::c_ast::CLabelId;
diff --git a/c2rust-transpile/src/cfg/relooper.rs b/c2rust-transpile/src/cfg/relooper.rs
index fcbe00a23e..bb6dc3bddd 100644
--- a/c2rust-transpile/src/cfg/relooper.rs
+++ b/c2rust-transpile/src/cfg/relooper.rs
@@ -1,5 +1,104 @@
-//! This modules handles converting a a control-flow graph `Cfg` into `Vec`, optionally
-//! simplifying the latter.
+//! This module contains the relooper algorithm for creating structured control
+//! flow from a CFG.
+//!
+//! Relooper is an algorithm for converting an arbitrary, unstructured
+//! control-flow graph (CFG) into the structured control-flow constructs
+//! available in Rust. The original relooper algorithm was described in the
+//! [Emscripten paper][emscripten], which describes converting a CFG into
+//! JavaScript's structured control-flow constructs. The implementation of
+//! relooper used here is based on the original algorithm, with the addition of
+//! some heuristics and Rust-specific control-flow constructs.
+//!
+//! [emscripten]: https://dl.acm.org/doi/10.1145/2048147.2048224
+//!
+//! # Terminology
+//!
+//! The terms "label", "block", and "node" are sometimes used interchangeably in
+//! this file. A label is the unique identifier for a block, and a block is a
+//! node in the control-flow graph. In some cases we're working directly with
+//! the basic blocks, but in many places when working with the CFG we're dealing
+//! only with labels, and the blocks themselves are secondary.
+//!
+//! # Relooper Algorithm
+//!
+//! The relooper algorithm works by recursively breaking down sections of the
+//! CFG into structured control-flow constructs, partitioning blocks into either
+//! `Simple` structures, `Loop` structures, or `Multiple` structures. At each
+//! step, we start with a set of basic blocks and information about which of
+//! those blocks act as entry points to this portion of the CFG. We then look
+//! for ways to break down the CFG into smaller sections that can be represented
+//! using structured control-flow constructs.
+//!
+//! If we have a single entry point, and there are no back edges to that entry,
+//! then we can generate a simple structure. `Simple` structures contain a
+//! single basic block, and represent a straight-line sequence of instructions.
+//! All remaining blocks are then recursively relooped, with the immediate
+//! successors of the entry becoming the new entries for the rest of that
+//! portion of the CFG.
+//!
+//! If we have entries with back edges to them, then we can generate a `Loop`
+//! structure. Any nodes that can reach the entry become part of the loop body,
+//! with any remaining nodes becoming the follow blocks for the loop. The loop's
+//! contents are then relooped into the loop's body, and the follow blocks get
+//! relooped to be the logic that follows the loop.
+//!
+//! If we have more than one entry, then we can generate a `Multiple` structure.
+//! These are effectively `match` statements, with each entry becoming an arm of
+//! the `match`. Blocks that are only reachable by one of the entries (including
+//! the entry itself) become the body blocks for that arm of the multiple, and
+//! any blocks reachable from more than one entry become the follow blocks. We
+//! then recursively reloop each of the branches of the multiple, and then
+//! reloop the follow blocks.
+//!
+//! Note that there are a lot of subtleties to how we choose to partition blocks
+//! into these structures. The logic in the relooper implementation contains
+//! thorough comments describing what we're doing at each step and why we are
+//! making the choices that we do. This module documentation covers some of
+//! them, but you'll need to read through the full algorithm to get all of the
+//! nuances.
+//!
+//! # Heuristics
+//!
+//! When reconstructing structured control flow from a CFG, there are often
+//! multiple valid ways to structure the graph. In order to produce Rust code
+//! that is as similar to the original C as possible, we have a couple of
+//! heuristics that use information from the original C code to guide the
+//! restructuring process.
+//!
+//! Before creating a loop, we first try to match a `Multiple` from the original
+//! C. During transpilation we preserve information about where there are
+//! branches in the C code along with which CFG nodes are part of those
+//! branches, which we can then look up based on the current set of entries. If
+//! we find that there is a `Multiple` in the original C that matches our
+//! current entries, and the structure of the CFG allows it, we can reproduce
+//! the control-flow from the original C. Doing this before creating a loop
+//! helps in cases where we have branches with multiple disjoint loops, since
+//! the loop analysis does not recognize disjoint loops and will always produce
+//! a single loop with a `Multiple` inside of it handling the bodies of what
+//! should be separate loops.
+//!
+//! When creating loops, we also make use of a similar heuristic that tries to
+//! recreate the loops that we see in the original C. When partitioning blocks
+//! into the loop's body, we first attempt to match an existing loop from the
+//! original C. Failing that, we fall back on a heuristic that tries to keep as
+//! many blocks as possible in the loop's body, even if they don't strictly
+//! belong there according to the original C structure.
+//!
+//! # Simplification
+//!
+//! After the relooper algorithm runs, we have an optional simplification pass
+//! that attempts to reduce the complexity of the generated control flow
+//! structures. This pass can help to eliminate unnecessary nesting and make the
+//! final output more readable. It applies two main simplifications:
+//!
+//! - Merge cases in [`Switch`] terminators if they target the same label. That
+//!   means instead of having `1 => goto A, 2 => goto A, 3 => goto B`, we
+//!   instead get `1 | 2 => goto A, 3 => goto B`.
+//! - Inline `Multiple` structures into preceding `Simple` structures. When a
+//!   `Simple` structure with a `Switch` terminator is immediately followed by a
+//!   `Multiple`, the branches from the `Multiple` are inlined directly into the
+//!   `Switch` cases as `Nested` structures, eliminating the intermediate
+//!   `Multiple` and reducing nesting depth.
 
 use super::*;
 
@@ -126,19 +225,22 @@ impl RelooperState {
     }
 }
 
+/// A set of basic blocks, keyed by their label.
+type StructuredBlocks = IndexMap, StmtOrDecl>>;
+
 impl RelooperState {
     /// Recursive helper for `reloop`.
     ///
     /// TODO: perhaps manually perform TCO?
     fn relooper(
         &mut self,
-        entries: IndexSet