
[AArch64][GlobalISel] Overall GISel operation status #115133

@davemgreen

This is a copy of an internal page that @chuongg3 and I kept while going through each of the operations for AArch64 GISel, making sure they don't fall back. It is not complete yet (and the internal version had a few more details), but it is better to have this upstream. Some of it might now be out of date.
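As a rough illustration of what "making sure they don't fall back" can look like in practice, here is a minimal Python sketch that runs `llc` with GlobalISel enabled and collects the functions that fell back to SelectionDAG. It assumes an `llc` binary on `PATH`, and it assumes the fallback diagnostic has the form "Instruction selection used fallback path for <function>" when `-global-isel-abort=2` is passed; the helper names (`llc_command`, `fallback_functions`, `check_file`) are hypothetical and not part of LLVM.

```python
import os
import re
import subprocess

# Assumed wording of the diagnostic printed with -global-isel-abort=2.
FALLBACK_RE = re.compile(r"Instruction selection used fallback path for (\S+)")


def llc_command(ir_path, llc="llc"):
    """Build an llc invocation that enables GlobalISel and reports fallbacks."""
    return [
        llc,
        "-mtriple=aarch64",
        "-global-isel",
        "-global-isel-abort=2",  # do not abort; emit a diagnostic on fallback
        "-o", os.devnull,
        ir_path,
    ]


def fallback_functions(stderr_text):
    """Return the names of functions reported as falling back, in order."""
    return FALLBACK_RE.findall(stderr_text)


def check_file(ir_path, llc="llc"):
    """Run llc on one IR file and return the functions that fell back."""
    result = subprocess.run(
        llc_command(ir_path, llc), capture_output=True, text=True
    )
    return fallback_functions(result.stderr)
```

Calling `check_file("add.ll")` on a file with one function per type would then give a quick list of the types that still need work; an empty list suggests every function in the file was selected by GlobalISel.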

A few high level comments

  • This does not include SVE; we should probably do the same for it elsewhere.
  • BF16 still needs to be added, but requires a new way to specify the types / operations (the patterns were disabled in #124113, "[GISel] Explicitly disable BF16 tablegen patterns").
  • BigEndian isn't handled yet.
  • Currently some operations widen, some promote. We should stick to one (probably widen).
  • Blank spaces usually mean not checked / not supported. We will get to the point where random-testing will start to be more useful.

Edit: There is now https://davemgreen.github.io/gisel.html, which shows what works and whether it is smaller or bigger than SDAG. It is still WIP and doesn't show all operations yet.

Legend:

  • Scalar normal = i8/i16/i32/i64
  • Vector legal = v8i8/v4i16/v2i32 + v16i8/v8i16/v4i32/v2i64
  • Vector larger/smaller = i8/i16/i32/i64 types with non-legal sizes
  • i128 = scalar/vector
  • i1 = scalar/vector
  • Scalar ext = non-power2 sizes, including larger sizes
  • Vector odd widths = i8/i16/i32/i64 with non-power-2 widths.
  • Vector odd eltsize = non-power2 elt sizes (or i128, etc).
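To make the legend concrete, the following Python sketch maps an integer type name such as `v4i32` or `i65` to one of the categories above. It is only one reading of the legend: the order in which the rules are checked, the treatment of every non-i8/i16/i32/i64 scalar as "Scalar ext", and the function name `classify` are assumptions made for illustration.

```python
import re

# Taken directly from the "Vector legal" legend entry.
LEGAL_VECTORS = {"v8i8", "v4i16", "v2i32", "v16i8", "v8i16", "v4i32", "v2i64"}
NORMAL_ELT_BITS = {8, 16, 32, 64}


def is_pow2(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def classify(type_name):
    """Classify an integer type name like 'i32' or 'v3i16' per the legend."""
    m = re.fullmatch(r"(?:v(\d+))?i(\d+)", type_name)
    if m is None:
        raise ValueError(f"not an integer type name: {type_name!r}")
    lanes = int(m.group(1)) if m.group(1) else None
    bits = int(m.group(2))

    # i1 and i128 have their own columns, for both scalars and vectors.
    if bits == 1:
        return "i1"
    if bits == 128:
        return "i128"
    if lanes is None:
        # Assumption: any other scalar width counts as "Scalar ext".
        return "Scalar normal" if bits in NORMAL_ELT_BITS else "Scalar ext"
    if bits not in NORMAL_ELT_BITS:
        return "Vector odd eltsize"
    if type_name in LEGAL_VECTORS:
        return "Vector legal"
    if not is_pow2(lanes):
        return "Vector odd widths"
    return "Vector larger/smaller"
```

For example, `classify("v2i8")` lands in "Vector larger/smaller", which matches the `(v2i8)` annotations used in that column of the tables.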
Operation Scalar normal Vector legal ptr i128/i1 Vector larger / smaller Scalar ext Vector odd widths Vector odd eltsizes Additional Notes
load y y y/y #116006
store y y y/y
bitcast? ptrtoint? inttoptr? y y
getelementptr y
phi y y y/y y
select
memcpy? memmove? memset? bzero?
Int Operation Scalar normal Vector normal i128 s/v i1 s/v Vector larger / smaller Scalar non-power-2 Vector odd widths Vector odd eltsizes Additional Notes
add y y y/y y y x x https://godbolt.org/z/6c1rfWTK8
sub y y y/y y y x x
mul y y y/y inefficient y Scalar i128 #115512. https://godbolt.org/z/8Wd8zhezc
sdiv, udiv y y y/y y Scalar i1 could be simpler. https://godbolt.org/z/45qMq6cvh.
srem, urem y y y/y y Scalar i1:
zext, sext, anyext y y ZEXT: Global ISel could be improved to match SDAG by using BIC
trunc y y y x Non-pow2 larger than 8
and y y y/y y https://godbolt.org/z/6Y98TnYv8
or y y y/y y
xor y y y/y y
- not y y y y https://godbolt.org/z/rh4ob1be7
shl y y y y (v2i8) x Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify.
ashr y y y y(v2i8) x Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify.
lshr y y y y(v2i8) x Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify.
icmp y y y (i128 could be better) x y(v2i8) i128 could do a lot better.
select y y y y (v2i8) Scalar: Unnecessary AND to clear upper lanes of the condition register
abs y y y x y https://godbolt.org/z/Tobs7YeoT
smin/smax/umin/umax y y y y x > i128 i1/i128 could do better. https://godbolt.org/z/j7nx789oz.
uaddsat/usubsat/saddsat/ssubsat y y y/y y/x y i128 could do better. i1 vectors fall back. https://godbolt.org/z/4MT14bfsv
bitreverse y y y y https://godbolt.org/z/3sd988Mhd
bswap y x y x y
ctlz y y y y x > i128 #131514
cttz y y y x x > i128 #131513
ctpop y y y x x #131513
fshr/fshl y y y x x NonPow2 > 128 Scalar Normal:
- rotr/rotl? y y y y y
uaddo, usubo, uadde, usube?
umulo, smulo?
umulh, smulh
ushlsat, sshlsat
smulfix, umulfix
smulfixsat, umulfixsat
sdivfix, udivfix
sdivfixsat, udivfixsat
FP Operation Scalar normal Vector legal f128 s/v Vector smaller / larger bf16 s/v Vector widths Additional Notes
fadd y y y/y y https://godbolt.org/z/bYWfo9v16
fsub y y y/y y
fmul y y y/y y
fma y y y/y y https://godbolt.org/z/1osE3Whaq
fmuladd y y y/y y
fdiv y y y/y y
frem y y y/y y
fneg y y y/y y https://godbolt.org/z/rz96eh3PW
fpext y y y/y y https://godbolt.org/z/358EG4j7r
fptrunc y y y/y y https://godbolt.org/z/7a7hq6j68
fptosi, fptoui y y y/y y
fptosisat, fptouisat
sitofp, uitofp y y y/y y https://godbolt.org/z/j7Prz7qj6
fabs y y y/y y https://godbolt.org/z/o95h4a9es
fsqrt y y y/y y
ceil, floor, trunc, rint, nearbyint y y y/y y https://godbolt.org/z/zjMqq5oeo
lrint, llrint, lround, llround
fminnum, fmaxnum y y y/y y
fminimum, fmaximum y y y
fminimumnum, fmaximumnum
fcopysign y y y/y y https://godbolt.org/z/aq5bbc4jG
fpow y y y/y y https://godbolt.org/z/WEeWYj1e4
fpowi y y y/y y
sin, cos, etc y y y/y y
fexp, fexp2, flog, flog2, flog10 y y y/y y
fldexp, frexp
fcanonicalize
is_fpclass
Vector Operation Scalar normal Vector legal Vector smaller / larger ptr Scalar ext Vector odd widths Vector odd eltsizes Additional Notes
insert - - y y -
extract - - y y -
shuffle* - - -
- dup - - y -
- ext - - y y -
- zip1/zip2/uzp1/uzp2/trn1/trn2 - - y -
- tbl - - y y - Could do with tbl2/tbl4 combines
- reverse - - y y - Needs full reverses from #119083
- perfect shuffles - - #106446 -
reduce.add - - y - -
reduce.mul - - -
reduce.smin/smax/umin/umax - - -
reduce.and/or/xor - - -
reduce.fadd - - y y - -
reduce.fadd strict - - y y - - These just scalarize which isn't always the most efficient.
reduce.fmul - - y y - -
reduce.fmul strict - - y y - -
reduce.fmin/fmax/fminimum/fmaximum - - y - x

What follows is a (very!) incomplete list of assorted optimizations and other missing features we will need for GlobalISel. If anyone is interested in any of them, there is lots to do. Jump in and get involved.
