Skip to content

Commit 701af54

Browse files
authored
Cranelift: egraphs: fix a few sources of exponential rewrite blowup. (#10579)
This arrived as a fuzzbug [1] with some very interesting optimization behavior. The test case has sequences of `(select _ x x)` operators -- that is, conditional selects with both inputs the same -- that are chained together sequentially. A few aspects of the egraph framework and our optimization rules conspired to create exponential blowup: - We have a rewrite rule for `(select _ x x) -> x`, but we do not subsume; this means that we create an eclass for both. This in itself is not a problem; however... - We have some *other* rules that look through the inputs to the select to detect other cases (e.g.: select between constants 1 and 0, or 0 and 0, or ...), so we traverse both inputs; - And we also do nested rewrites, so when the rewrite rule for `(select _ x x) -> x` fires, and the `x` is itself another select in a long chain of selects, we traverse all possible paths (through first or second args) to the roots. In effect we get an eclass that has the ultimate root and then 2^n combinations of `select` nodes on top of that. This got worse with the recent change to canonicalize less (for simpler/cheaper compilation), hence the fuzzbug timeouts. This PR includes a few fixes, all complementary to each other: - The `(select _ x x) -> x` rule now subsumes; this is another case where we have a strictly better rewrite and so we should short-circuit the eclass blowup. - The rewrite runner sorts and dedups returned value numbers; in debugging the above I noticed we were getting two rules producing the same rewritten value and we were adding the same value twice with two union nodes. - The rewriter keeps a total eclass size per root and limits the total eclass size to a fixed limit (currently 5). We thus now have limits in three different axes: depth of eager rewrites (5); number of returned matches (also 5); and total size of eclass (5). The first two don't necessarily imply the third because we otherwise can keep unioning on top of an eclass and (as seen above) see exponential blowup. [1]: https://oss-fuzz.com/testcase-detail/4806924172591104
1 parent 5279f5c commit 701af54

File tree

13 files changed

+671
-61
lines changed

13 files changed

+671
-61
lines changed

cranelift/codegen/src/egraph.rs

Lines changed: 32 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -72,9 +72,12 @@ pub struct EgraphPass<'a> {
7272
pub(crate) stats: Stats,
7373
}
7474

75-
// The maximum number of rewrites we will take from a single call into ISLE.
75+
/// The maximum number of rewrites we will take from a single call into ISLE.
7676
const MATCHES_LIMIT: usize = 5;
7777

78+
/// The maximum number of enodes in any given eclass.
79+
const ECLASS_ENODE_LIMIT: usize = 5;
80+
7881
/// Context passed through node insertion and optimization.
7982
pub(crate) struct OptimizeCtx<'opt, 'analysis>
8083
where
@@ -84,6 +87,7 @@ where
8487
pub(crate) func: &'opt mut Function,
8588
pub(crate) value_to_opt_value: &'opt mut SecondaryMap<Value, Value>,
8689
available_block: &'opt mut SecondaryMap<Value, Block>,
90+
eclass_size: &'opt mut SecondaryMap<Value, u8>,
8791
pub(crate) gvn_map: &'opt mut ScopedHashMap<(Type, InstructionData), Option<Value>>,
8892
pub(crate) gvn_map_blocks: &'opt Vec<Block>,
8993
pub(crate) remat_values: &'opt mut FxHashSet<Value>,
@@ -344,6 +348,11 @@ where
344348
optimized_values.truncate(MATCHES_LIMIT);
345349
}
346350

351+
// Sort and deduplicate optimized values, in case multiple
352+
// rules produced the same simplification.
353+
optimized_values.sort_unstable();
354+
optimized_values.dedup();
355+
347356
trace!(" -> returned from ISLE: {orig_value} -> {optimized_values:?}");
348357

349358
// Construct a union-node tree representing the new eclass
@@ -360,6 +369,7 @@ where
360369
subsuming_value
361370
} else {
362371
let mut union_value = orig_value;
372+
let mut eclass_size = ctx.eclass_size[orig_value] + 1;
363373
for optimized_value in optimized_values.drain(..) {
364374
trace!(
365375
"Returned from ISLE for {}, got {:?}",
@@ -371,8 +381,16 @@ where
371381
ctx.stats.pure_inst_rewrite_to_self += 1;
372382
continue;
373383
}
384+
let rhs_eclass_size = ctx.eclass_size[optimized_value] + 1;
385+
if usize::from(eclass_size) + usize::from(rhs_eclass_size) > ECLASS_ENODE_LIMIT {
386+
trace!(" -> reached eclass size limit");
387+
ctx.stats.eclass_size_limit += 1;
388+
break;
389+
}
374390
let old_union_value = union_value;
375391
union_value = ctx.func.dfg.union(old_union_value, optimized_value);
392+
eclass_size += rhs_eclass_size;
393+
ctx.eclass_size[union_value] = eclass_size - 1;
376394
ctx.stats.union += 1;
377395
trace!(" -> union: now {}", union_value);
378396
ctx.func.dfg.merge_facts(old_union_value, optimized_value);
@@ -807,6 +825,17 @@ impl<'a> EgraphPass<'a> {
807825
let mut available_block: SecondaryMap<Value, Block> =
808826
SecondaryMap::with_default(Block::reserved_value());
809827

828+
// To avoid blowing up eclasses too much, we track the size of
829+
// each eclass reachable by a tree of union nodes from a given
830+
// value ID, and we avoid union'ing additional values into an
831+
// eclass when it reaches `ECLASS_ENODE_LIMIT`.
832+
//
833+
// For efficiency, this encodes size minus one: so a value of
834+
// zero (which is cheap to bulk-initialize) means a singleton
835+
// eclass of size one. This also allows us to avoid explicitly
836+
// writing the size for any values that are not union nodes.
837+
let mut eclass_size: SecondaryMap<Value, u8> = SecondaryMap::with_default(0);
838+
810839
// This is an initial guess at the size we'll need, but we add
811840
// more values as we build simplified alternative expressions so
812841
// this is likely to realloc again later.
@@ -870,6 +899,7 @@ impl<'a> EgraphPass<'a> {
870899
gvn_map: &mut gvn_map,
871900
gvn_map_blocks: &mut gvn_map_blocks,
872901
available_block: &mut available_block,
902+
eclass_size: &mut eclass_size,
873903
rewrite_depth: 0,
874904
subsume_values: FxHashSet::default(),
875905
remat_values: &mut self.remat_values,
@@ -1089,4 +1119,5 @@ pub(crate) struct Stats {
10891119
pub(crate) elaborate_func_pre_insts: u64,
10901120
pub(crate) elaborate_func_post_insts: u64,
10911121
pub(crate) elaborate_best_cost_fixpoint_iters: u64,
1122+
pub(crate) eclass_size_limit: u64,
10921123
}

cranelift/codegen/src/opts/selects.isle

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
;; `select`/`bitselect`-related rewrites
22

33
;; remove select when both choices are the same
4-
(rule (simplify (select ty _ x x)) x)
5-
(rule (simplify (bitselect ty _ x x)) x)
4+
(rule (simplify (select ty _ x x)) (subsume x))
5+
(rule (simplify (bitselect ty _ x x)) (subsume x))
66

77
;; Push zeroes to the right -- this makes the select `truthy`, as used elsewhere
88
;; if icmp { 0 } else { nonzero } => if !icmp { nonzero } else { 0 }

cranelift/filetests/filetests/egraph/cprop.clif

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -278,10 +278,10 @@ block0(v0: i32, v1: i32):
278278
return v7
279279
}
280280

281-
; check: v42 = iconst.i32 333
282-
; check: v51 = iadd v0, v42
283-
; check: v52 = icmp eq v1, v51
284-
; nextln: return v52
281+
; check: v40 = iconst.i32 333
282+
; check: v49 = iadd v0, v40
283+
; check: v50 = icmp eq v1, v49
284+
; nextln: return v50
285285

286286
function %ireduce_iconst() -> i8 {
287287
block0:

0 commit comments

Comments
 (0)