Commit de0e0be

Legalize b{and,or,xor}_not into component instructions (#5709)
* Remove trailing whitespace in `lower.isle` files

* Legalize the `band_not` instruction into simpler form

  This commit legalizes the `band_not` instruction into `band`-of-`bnot`, or two instructions. This is intended to assist with egraph-based optimizations where the `band_not` instruction doesn't have to be specifically included in other bit-operation-patterns.

  Lowerings of the `band_not` instruction have been moved to a specialization of the `band` instruction.

* Legalize `bor_not` into components

  Same as prior commit, but for the `bor_not` instruction.

* Legalize bxor_not into bxor-of-bnot

  Same as prior commits. I think this also ended up fixing a bug in the s390x backend where `bxor_not x y` was actually translated as `bnot (bxor x y)` by accident given the test update changes.

* Simplify not-fused operands for riscv64

  Looks like some delegated-to rules have special-cases for "if this feature is enabled use the fused instruction" so move the clause for testing the feature up to the lowering phase to help trigger other rules if the feature isn't enabled. This should make the riscv64 backend more consistent with how other backends are implemented.

* Remove B{and,or,xor}Not from cost of egraph metrics

  These shouldn't ever reach egraphs now that they're legalized away.

* Add an egraph optimization for `x^-1 => ~x`

  This adds a simplification node to translate xor-against-minus-1 to a `bnot` instruction. This helps trigger various other optimizations in the egraph implementation and also various backend lowering rules for instructions. This is chiefly useful as wasm doesn't have a `bnot` equivalent, so it's encoded as `x^-1`.

* Add a wasm test for end-to-end bitwise lowerings

  Test that end-to-end various optimizations are being applied for input wasm modules.

* Specifically don't self-update rustup on CI

  I forget why this was here originally, but this is failing on Windows CI. In general there's no need to update rustup, so leave it as-is.

* Cleanup some aarch64 lowering rules

  Previously a 32/64 split was necessary due to the `ALUOp` being different but that's been refactored away now so there's no longer any need for duplicate rules.

* Narrow an x64 lowering rule

  This previously made more sense when it was `band_not` and rarely used, but be more specific in the type-filter on this rule that it's only applicable to SIMD types with lanes.

* Simplify xor-against-minus-1 rule

  No need to have the commutative version since constants are already shuffled right for egraphs.

* Optimize band-of-bnot when bnot is on the left

  Use some more rules in the egraph algebraic optimizations to canonicalize band/bor/bxor with a `bnot` operand to put the operand on the right. That way the lowerings in the backends only have to list the rule once, with the operand on the right, to optimize both styles of input.

* Add commutative lowering rules

* Update cranelift/codegen/src/isa/x64/lower.isle

  Co-authored-by: Jamey Sharp <[email protected]>

---------

Co-authored-by: Jamey Sharp <[email protected]>
1 parent 99c3936 commit de0e0be
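
The legalization and the `x^-1 => ~x` rewrite described in the commit message can be sketched on a toy expression IR. This is a minimal illustration, not Cranelift's implementation: `Op`, `Expr`, `legalize`, `simplify`, and `eval` are hypothetical names, values are assumed to be 64-bit integers, and `BinNot` stands in for the fused `band_not`/`bor_not`/`bxor_not` instructions.

```rust
// Toy IR for illustration only; all names here are hypothetical and are not
// Cranelift's real data structures.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    And,
    Or,
    Xor,
}

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Var(usize),
    Const(i64),
    Not(Box<Expr>),
    Bin(Op, Box<Expr>, Box<Expr>),
    // Fused `op(x, !y)`, standing in for `band_not`, `bor_not`, `bxor_not`.
    BinNot(Op, Box<Expr>, Box<Expr>),
}

// Rewrite every fused node into the plain operation applied to a `Not`.
fn legalize(e: Expr) -> Expr {
    match e {
        Expr::BinNot(op, x, y) => {
            let not_y = Expr::Not(Box::new(legalize(*y)));
            Expr::Bin(op, Box::new(legalize(*x)), Box::new(not_y))
        }
        Expr::Bin(op, x, y) => Expr::Bin(op, Box::new(legalize(*x)), Box::new(legalize(*y))),
        Expr::Not(x) => Expr::Not(Box::new(legalize(*x))),
        leaf => leaf,
    }
}

// Mid-end style rewrite `x ^ -1 => !x`. Assumes the input was legalized and
// that constants already sit on the right-hand side.
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Bin(op, x, y) => match (op, simplify(*x), simplify(*y)) {
            (Op::Xor, x, Expr::Const(-1)) => Expr::Not(Box::new(x)),
            (op, x, y) => Expr::Bin(op, Box::new(x), Box::new(y)),
        },
        Expr::Not(x) => Expr::Not(Box::new(simplify(*x))),
        other => other,
    }
}

// Reference evaluator used to check that a rewrite preserves the value.
fn eval(e: &Expr, env: &[i64]) -> i64 {
    let apply = |op: Op, a: i64, b: i64| match op {
        Op::And => a & b,
        Op::Or => a | b,
        Op::Xor => a ^ b,
    };
    match e {
        Expr::Var(i) => env[*i],
        Expr::Const(c) => *c,
        Expr::Not(x) => !eval(x, env),
        Expr::Bin(op, x, y) => apply(*op, eval(x, env), eval(y, env)),
        Expr::BinNot(op, x, y) => apply(*op, eval(x, env), !eval(y, env)),
    }
}
```

Calling `legalize` on a `BinNot(Op::And, x, y)` node yields `Bin(Op::And, x, Not(y))`, and `eval` returns the same value for both forms. Once only the plain form exists, pattern-based rewrites need to handle one shape per operation.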

17 files changed: +506 -277 lines changed

.github/actions/install-rust/action.yml

Lines changed: 0 additions & 5 deletions
@@ -18,11 +18,6 @@ runs:
     - name: Install Rust
       shell: bash
       run: |
-
-        if [[ "${{ runner.os }}" = "Windows" ]]; then
-          rustup self update
-        fi
-
         rustup set profile minimal
         rustup update "${{ inputs.toolchain }}" --no-self-update
         rustup default "${{ inputs.toolchain }}"

cranelift/codegen/src/egraph/cost.rs

Lines changed: 0 additions & 3 deletions
@@ -85,11 +85,8 @@ pub(crate) fn pure_op_cost(op: Opcode) -> Cost {
         Opcode::Iadd
         | Opcode::Isub
         | Opcode::Band
-        | Opcode::BandNot
         | Opcode::Bor
-        | Opcode::BorNot
         | Opcode::Bxor
-        | Opcode::BxorNot
         | Opcode::Bnot
         | Opcode::Ishl
         | Opcode::Ushr

cranelift/codegen/src/isa/aarch64/lower.isle

Lines changed: 44 additions & 45 deletions
@@ -580,7 +580,7 @@
       (sub ty (zero_reg) x))

 ;; `i128`
-(rule 2 (lower (has_type $I128 (ineg x)))
+(rule 2 (lower (has_type $I128 (ineg x)))
       (sub_i128 (value_regs_zero) x))

 ;; vectors.
@@ -1054,75 +1054,74 @@

 ;;;; Rules for `band` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

-(rule -1 (lower (has_type (fits_in_32 ty) (band x y)))
+(rule -1 (lower (has_type (fits_in_64 ty) (band x y)))
       (alu_rs_imm_logic_commutative (ALUOp.And) ty x y))

-(rule (lower (has_type $I64 (band x y)))
-      (alu_rs_imm_logic_commutative (ALUOp.And) $I64 x y))
-
 (rule (lower (has_type $I128 (band x y))) (i128_alu_bitop (ALUOp.And) $I64 x y))

 (rule -2 (lower (has_type (ty_vec128 ty) (band x y)))
       (and_vec x y (vector_size ty)))

+;; Specialized lowerings for `(band x (bnot y))` which is additionally produced
+;; by Cranelift's `band_not` instruction that is legalized into the simpler
+;; forms early on.
+
+(rule 1 (lower (has_type (fits_in_64 ty) (band x (bnot y))))
+      (alu_rs_imm_logic (ALUOp.AndNot) ty x y))
+(rule 2 (lower (has_type (fits_in_64 ty) (band (bnot y) x)))
+      (alu_rs_imm_logic (ALUOp.AndNot) ty x y))
+
+(rule 3 (lower (has_type $I128 (band x (bnot y)))) (i128_alu_bitop (ALUOp.AndNot) $I64 x y))
+(rule 4 (lower (has_type $I128 (band (bnot y) x))) (i128_alu_bitop (ALUOp.AndNot) $I64 x y))
+
+(rule 5 (lower (has_type (ty_vec128 ty) (band x (bnot y))))
+      (bic_vec x y (vector_size ty)))
+(rule 6 (lower (has_type (ty_vec128 ty) (band (bnot y) x)))
+      (bic_vec x y (vector_size ty)))
+
 ;;;; Rules for `bor` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

-(rule -1 (lower (has_type (fits_in_32 ty) (bor x y)))
+(rule -1 (lower (has_type (fits_in_64 ty) (bor x y)))
       (alu_rs_imm_logic_commutative (ALUOp.Orr) ty x y))

-(rule (lower (has_type $I64 (bor x y)))
-      (alu_rs_imm_logic_commutative (ALUOp.Orr) $I64 x y))
-
 (rule (lower (has_type $I128 (bor x y))) (i128_alu_bitop (ALUOp.Orr) $I64 x y))

 (rule -2 (lower (has_type (ty_vec128 ty) (bor x y)))
       (orr_vec x y (vector_size ty)))

+;; Specialized lowerings for `(bor x (bnot y))` which is additionally produced
+;; by Cranelift's `bor_not` instruction that is legalized into the simpler
+;; forms early on.
+
+(rule 1 (lower (has_type (fits_in_64 ty) (bor x (bnot y))))
+      (alu_rs_imm_logic (ALUOp.OrrNot) ty x y))
+(rule 2 (lower (has_type (fits_in_64 ty) (bor (bnot y) x)))
+      (alu_rs_imm_logic (ALUOp.OrrNot) ty x y))
+
+(rule 3 (lower (has_type $I128 (bor x (bnot y)))) (i128_alu_bitop (ALUOp.OrrNot) $I64 x y))
+(rule 4 (lower (has_type $I128 (bor (bnot y) x))) (i128_alu_bitop (ALUOp.OrrNot) $I64 x y))
+
 ;;;; Rules for `bxor` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

-(rule -1 (lower (has_type (fits_in_32 ty) (bxor x y)))
+(rule -1 (lower (has_type (fits_in_64 ty) (bxor x y)))
       (alu_rs_imm_logic_commutative (ALUOp.Eor) ty x y))

-(rule (lower (has_type $I64 (bxor x y)))
-      (alu_rs_imm_logic_commutative (ALUOp.Eor) $I64 x y))
-
 (rule (lower (has_type $I128 (bxor x y))) (i128_alu_bitop (ALUOp.Eor) $I64 x y))

 (rule -2 (lower (has_type (ty_vec128 ty) (bxor x y)))
       (eor_vec x y (vector_size ty)))

-;;;; Rules for `band_not` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
-(rule -1 (lower (has_type (fits_in_32 ty) (band_not x y)))
-      (alu_rs_imm_logic (ALUOp.AndNot) ty x y))
-
-(rule (lower (has_type $I64 (band_not x y)))
-      (alu_rs_imm_logic (ALUOp.AndNot) $I64 x y))
+;; Specialized lowerings for `(bxor x (bnot y))` which is additionally produced
+;; by Cranelift's `bxor_not` instruction that is legalized into the simpler
+;; forms early on.

-(rule (lower (has_type $I128 (band_not x y))) (i128_alu_bitop (ALUOp.AndNot) $I64 x y))
+(rule 1 (lower (has_type (fits_in_64 ty) (bxor x (bnot y))))
+      (alu_rs_imm_logic (ALUOp.EorNot) ty x y))
+(rule 2 (lower (has_type (fits_in_64 ty) (bxor (bnot y) x)))
+      (alu_rs_imm_logic (ALUOp.EorNot) ty x y))

-(rule -2 (lower (has_type (ty_vec128 ty) (band_not x y)))
-      (bic_vec x y (vector_size ty)))
-
-;;;; Rules for `bor_not` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
-(rule -1 (lower (has_type (fits_in_32 ty) (bor_not x y)))
-      (alu_rs_imm_logic (ALUOp.OrrNot) ty x y))
-
-(rule (lower (has_type $I64 (bor_not x y)))
-      (alu_rs_imm_logic (ALUOp.OrrNot) $I64 x y))
-
-(rule (lower (has_type $I128 (bor_not x y))) (i128_alu_bitop (ALUOp.OrrNot) $I64 x y))
-
-;;;; Rules for `bxor_not` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
-(rule -1 (lower (has_type (fits_in_32 ty) (bxor_not x y)))
-      (alu_rs_imm_logic (ALUOp.EorNot) $I32 x y))
-
-(rule (lower (has_type $I64 (bxor_not x y)))
-      (alu_rs_imm_logic (ALUOp.EorNot) $I64 x y))
-
-(rule (lower (has_type $I128 (bxor_not x y))) (i128_alu_bitop (ALUOp.EorNot) $I64 x y))
+(rule 3 (lower (has_type $I128 (bxor x (bnot y)))) (i128_alu_bitop (ALUOp.EorNot) $I64 x y))
+(rule 4 (lower (has_type $I128 (bxor (bnot y) x))) (i128_alu_bitop (ALUOp.EorNot) $I64 x y))

 ;;;; Rules for `ishl` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

@@ -2407,7 +2406,7 @@
 ;; sign extended. We then check if the output sign bit has flipped.
 (rule 0 (lower (has_type (fits_in_16 ty) (iadd_cout a b)))
       (let ((extend ExtendOp (lower_extend_op ty $true))
-
+
             ;; Instead of emitting two `sxt{b,h}` we do one as an instruction and
             ;; the other as an extend operation in the `add` instruction.
             ;;
@@ -2417,7 +2416,7 @@
             ;; cset out_carry, ne
             (a_sext Reg (put_in_reg_sext32 a))
             (out Reg (add_extend_op ty a_sext b extend))
-            (out_carry Reg (with_flags_reg
+            (out_carry Reg (with_flags_reg
                              (cmp_extend (OperandSize.Size32) out out extend)
                              (cset (Cond.Ne)))))
         (output_pair
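
The paired `band` rules above, one for `(band x (bnot y))` and one for `(band (bnot y) x)`, can be pictured as one instruction selector that accepts the `bnot` on either side. The following is a rough sketch under assumptions: `Expr`, `Inst`, and `lower_band` are hypothetical names, `Inst::AndNot(x, y)` is taken to mean `x & !y`, and the arm order is not meant to mirror ISLE rule priorities.

```rust
// Toy operand tree; hypothetical, not Cranelift's IR.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Var(&'static str),
    Not(Box<Expr>),
}

// Selected instruction. `AndNot(x, y)` is assumed to compute `x & !y`.
#[derive(Debug, PartialEq)]
enum Inst<'a> {
    And(&'a Expr, &'a Expr),
    AndNot(&'a Expr, &'a Expr),
}

// Select an instruction for `band x y`, using the fused form when either
// operand is a `Not`. Both operand orders produce the same fused instruction.
fn lower_band<'a>(x: &'a Expr, y: &'a Expr) -> Inst<'a> {
    match (x, y) {
        // `band x (bnot y)`
        (x, Expr::Not(y)) => Inst::AndNot(x, &**y),
        // `band (bnot y) x`: swap so the negated operand comes second
        (Expr::Not(y), x) => Inst::AndNot(x, &**y),
        // No `bnot` operand: plain `and`
        (x, y) => Inst::And(x, y),
    }
}
```

With `a = Expr::Var("a")` and `nb = Expr::Not(Box::new(Expr::Var("b")))`, both `lower_band(&a, &nb)` and `lower_band(&nb, &a)` select `AndNot(a, b)`.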

cranelift/codegen/src/isa/riscv64/inst.isle

Lines changed: 2 additions & 28 deletions
@@ -1950,32 +1950,14 @@

 ;;;
 (decl gen_andn (Reg Reg) Reg)
-(rule 1
-  (gen_andn rs1 rs2)
-  (if-let $true (has_b))
+(rule 1 (gen_andn rs1 rs2)
   (alu_rrr (AluOPRRR.Andn) rs1 rs2))

-(rule
-  (gen_andn rs1 rs2)
-  (if-let $false (has_b))
-  (let
-    ((tmp Reg (gen_bit_not rs2)))
-    (alu_and rs1 tmp)))
-
 ;;;
 (decl gen_orn (Reg Reg) Reg)
-(rule 1
-  (gen_orn rs1 rs2 )
-  (if-let $true (has_b))
+(rule 1 (gen_orn rs1 rs2)
   (alu_rrr (AluOPRRR.Orn) rs1 rs2))

-(rule
-  (gen_orn rs1 rs2)
-  (if-let $false (has_b))
-  (let
-    ((tmp Reg (gen_bit_not rs2)))
-    (alu_rrr (AluOPRRR.Or) rs1 tmp)))
-
 (decl gen_rev8 (Reg) Reg)
 (rule 1
   (gen_rev8 rs)
@@ -2014,14 +1996,6 @@
     (_ Unit (emit (MInst.Brev8 rs ty step tmp tmp2 rd))))
   (writable_reg_to_reg rd)))

-;;; x ^ ~y
-(decl gen_xor_not (Reg Reg) Reg)
-(rule
-  (gen_xor_not x y)
-  (let
-    ((tmp Reg (gen_bit_not y)))
-    (alu_rrr (AluOPRRR.Xor) x tmp)))
-
 ;; Negates x
 ;; Equivalent to 0 - x
 (decl neg (Type ValueRegs) ValueRegs)
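
The riscv64 change moves the `has_b` feature test out of the `gen_andn`/`gen_orn` helpers and into the lowering phase. A rough sketch of that shape follows, assuming `has_b` means the fused `andn` instruction is available. `Expr`, `Lower`, and the emitted mnemonic strings are hypothetical and only loosely modeled on RISC-V assembly.

```rust
// Toy operand tree; hypothetical, not Cranelift's IR.
#[derive(Debug)]
enum Expr {
    Var(&'static str),
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// Minimal lowering context: a feature flag, emitted text, and a temp counter.
struct Lower {
    has_b: bool,
    insts: Vec<String>,
    next_tmp: usize,
}

impl Lower {
    // Allocate a fresh temporary register name.
    fn tmp(&mut self) -> String {
        let t = format!("t{}", self.next_tmp);
        self.next_tmp += 1;
        t
    }

    // Lower `e` and return the name of the register holding its value.
    fn lower(&mut self, e: &Expr) -> String {
        match e {
            Expr::Var(name) => name.to_string(),
            Expr::Not(x) => {
                let a = self.lower(x);
                let d = self.tmp();
                self.insts.push(format!("not {d}, {a}"));
                d
            }
            Expr::And(x, y) => {
                // The feature test lives in the lowering rule itself: the
                // fused form is only selected when `has_b` is set.
                if let (true, Expr::Not(inner)) = (self.has_b, &**y) {
                    let (a, b) = (self.lower(x), self.lower(inner));
                    let d = self.tmp();
                    self.insts.push(format!("andn {d}, {a}, {b}"));
                    return d;
                }
                // Otherwise the generic rules for `and` and `not` apply.
                let (a, b) = (self.lower(x), self.lower(y));
                let d = self.tmp();
                self.insts.push(format!("and {d}, {a}, {b}"));
                d
            }
        }
    }
}
```

Because the fused arm is skipped when `has_b` is false, the operands fall through to the ordinary `Not` and `And` arms, so the helper that emits `andn` no longer needs a fallback of its own.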
