Skip to content

Commit fcc036f

Browse files
mundaymgopherbot
authored andcommitted
cmd/compile: optimise float <-> int register moves on riscv64
Use the FMV* instructions to move values between the floating point and integer register files. Note: I'm unsure why there is a slowdown in the Float32bits benchmark, I've checked and an FMVXS instruction is being used as expected. There are multiple loads and other instructions in the main loop. goos: linux goarch: riscv64 pkg: math cpu: Spacemit(R) X60 │ fmv-before.txt │ fmv-after.txt │ │ sec/op │ sec/op vs base │ Acos 122.7n ± 0% 122.7n ± 0% ~ (p=1.000 n=10) Acosh 197.2n ± 0% 191.5n ± 0% -2.89% (p=0.000 n=10) Asin 122.7n ± 0% 122.7n ± 0% ~ (p=0.474 n=10) Asinh 231.0n ± 0% 224.1n ± 0% -2.99% (p=0.000 n=10) Atan 91.39n ± 0% 91.41n ± 0% ~ (p=0.465 n=10) Atanh 210.3n ± 0% 203.4n ± 0% -3.26% (p=0.000 n=10) Atan2 149.6n ± 0% 149.6n ± 0% ~ (p=0.721 n=10) Cbrt 176.5n ± 0% 165.9n ± 0% -6.01% (p=0.000 n=10) Ceil 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Copysign 3.756n ± 0% 3.756n ± 0% ~ (p=0.149 n=10) Cos 95.15n ± 0% 95.15n ± 0% ~ (p=0.374 n=10) Cosh 228.6n ± 0% 224.7n ± 0% -1.71% (p=0.000 n=10) Erf 115.2n ± 0% 115.2n ± 0% ~ (p=0.474 n=10) Erfc 116.4n ± 0% 116.4n ± 0% ~ (p=0.628 n=10) Erfinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10) Erfcinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10) Exp 194.1n ± 0% 190.3n ± 0% -1.93% (p=0.000 n=10) ExpGo 204.7n ± 0% 200.3n ± 0% -2.15% (p=0.000 n=10) Expm1 137.7n ± 0% 135.2n ± 0% -1.82% (p=0.000 n=10) Exp2 173.4n ± 0% 169.0n ± 0% -2.54% (p=0.000 n=10) Exp2Go 182.8n ± 0% 178.4n ± 0% -2.41% (p=0.000 n=10) Abs 3.756n ± 0% 3.756n ± 0% ~ (p=0.157 n=10) Dim 12.52n ± 0% 12.52n ± 0% ~ (p=0.737 n=10) Floor 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Max 21.29n ± 0% 20.03n ± 0% -5.92% (p=0.000 n=10) Min 21.28n ± 0% 20.04n ± 0% -5.85% (p=0.000 n=10) Mod 344.9n ± 0% 319.2n ± 0% -7.45% (p=0.000 n=10) Frexp 55.71n ± 0% 48.85n ± 0% -12.30% (p=0.000 n=10) Gamma 165.9n ± 0% 167.8n ± 0% +1.15% (p=0.000 n=10) Hypot 73.24n ± 0% 70.74n ± 0% -3.41% (p=0.000 n=10) HypotGo 84.50n ± 0% 82.63n ± 0% -2.21% (p=0.000 n=10) Ilogb 49.45n ± 0% 45.70n ± 0% -7.59% (p=0.000 n=10) J0 556.5n ± 0% 544.0n ± 0% -2.25% (p=0.000 n=10) J1 555.3n ± 0% 542.8n ± 0% -2.24% (p=0.000 n=10) Jn 1.181µ ± 0% 1.156µ ± 0% -2.12% (p=0.000 n=10) Ldexp 59.47n ± 0% 53.84n ± 0% -9.47% (p=0.000 n=10) Lgamma 167.2n ± 0% 154.6n ± 0% -7.51% (p=0.000 n=10) Log 160.9n ± 0% 154.6n ± 0% -3.92% (p=0.000 n=10) Logb 49.45n ± 0% 45.70n ± 0% -7.58% (p=0.000 n=10) Log1p 147.1n ± 0% 137.1n ± 0% -6.80% (p=0.000 n=10) Log10 162.1n ± 1% 154.6n ± 0% -4.63% (p=0.000 n=10) Log2 66.99n ± 0% 60.72n ± 0% -9.36% (p=0.000 n=10) Modf 29.42n ± 0% 26.29n ± 0% -10.64% (p=0.000 n=10) Nextafter32 41.95n ± 0% 37.88n ± 0% -9.70% (p=0.000 n=10) Nextafter64 38.82n ± 0% 33.49n ± 0% -13.73% (p=0.000 n=10) PowInt 252.3n ± 0% 237.3n ± 0% -5.95% (p=0.000 n=10) PowFrac 615.5n ± 0% 589.7n ± 0% -4.19% (p=0.000 n=10) Pow10Pos 10.64n ± 0% 10.64n ± 0% ~ (p=1.000 n=10) Pow10Neg 24.42n ± 0% 15.02n ± 0% -38.49% (p=0.000 n=10) Round 21.91n ± 0% 18.16n ± 0% -17.12% (p=0.000 n=10) RoundToEven 24.42n ± 0% 21.29n ± 0% -12.84% (p=0.000 n=10) Remainder 308.0n ± 0% 291.2n ± 0% -5.44% (p=0.000 n=10) Signbit 10.02n ± 0% 10.02n ± 0% ~ (p=1.000 n=10) Sin 102.7n ± 0% 102.7n ± 0% ~ (p=0.211 n=10) Sincos 124.0n ± 1% 123.3n ± 0% -0.56% (p=0.002 n=10) Sinh 239.1n ± 0% 234.7n ± 0% -1.84% (p=0.000 n=10) SqrtIndirect 2.504n ± 0% 2.504n ± 0% ~ (p=0.303 n=10) SqrtLatency 15.03n ± 0% 15.02n ± 0% ~ (p=0.598 n=10) SqrtIndirectLatency 15.02n ± 0% 15.02n ± 0% ~ (p=0.907 n=10) SqrtGoLatency 165.3n ± 0% 157.2n ± 0% -4.90% (p=0.000 n=10) SqrtPrime 3.801µ ± 0% 3.802µ ± 0% ~ (p=1.000 n=10) Tan 125.2n ± 0% 125.2n ± 0% ~ (p=0.458 n=10) Tanh 244.2n ± 0% 239.9n ± 0% -1.76% (p=0.000 n=10) Trunc 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Y0 550.2n ± 0% 538.1n ± 0% -2.21% (p=0.000 n=10) Y1 552.8n ± 0% 540.6n ± 0% -2.21% (p=0.000 n=10) Yn 1.168µ ± 0% 1.143µ ± 0% -2.14% (p=0.000 n=10) Float64bits 8.139n ± 0% 4.385n ± 0% -46.13% (p=0.000 n=10) Float64frombits 7.512n ± 0% 3.759n ± 0% -49.96% (p=0.000 n=10) Float32bits 8.138n ± 0% 9.393n ± 0% +15.42% (p=0.000 n=10) Float32frombits 7.513n ± 0% 3.757n ± 0% -49.98% (p=0.000 n=10) FMA 3.756n ± 0% 3.756n ± 0% ~ (p=0.246 n=10) geomean 77.43n 72.42n -6.47% Change-Id: I8dac69b1d17cb3d2af78d1c844d2b5d80000d667 Reviewed-on: https://go-review.googlesource.com/c/go/+/599235 Reviewed-by: Keith Randall <[email protected]> Auto-Submit: Michael Munday <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Keith Randall <[email protected]>
1 parent 7a1679d commit fcc036f

File tree

6 files changed

+372
-3
lines changed

6 files changed

+372
-3
lines changed

src/cmd/compile/internal/riscv64/ssa.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -417,7 +417,7 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
417417
p.To.Type = obj.TYPE_REG
418418
p.To.Reg = r
419419
case ssa.OpRISCV64FSQRTS, ssa.OpRISCV64FNEGS, ssa.OpRISCV64FABSD, ssa.OpRISCV64FSQRTD, ssa.OpRISCV64FNEGD,
420-
ssa.OpRISCV64FMVSX, ssa.OpRISCV64FMVDX,
420+
ssa.OpRISCV64FMVSX, ssa.OpRISCV64FMVXS, ssa.OpRISCV64FMVDX, ssa.OpRISCV64FMVXD,
421421
ssa.OpRISCV64FCVTSW, ssa.OpRISCV64FCVTSL, ssa.OpRISCV64FCVTWS, ssa.OpRISCV64FCVTLS,
422422
ssa.OpRISCV64FCVTDW, ssa.OpRISCV64FCVTDL, ssa.OpRISCV64FCVTWD, ssa.OpRISCV64FCVTLD, ssa.OpRISCV64FCVTDS, ssa.OpRISCV64FCVTSD,
423423
ssa.OpRISCV64NOT, ssa.OpRISCV64NEG, ssa.OpRISCV64NEGW, ssa.OpRISCV64CLZ, ssa.OpRISCV64CLZW, ssa.OpRISCV64CTZ, ssa.OpRISCV64CTZW,

src/cmd/compile/internal/ssa/_gen/RISCV64.rules

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -299,6 +299,11 @@
299299
(base.Op != OpSB || !config.ctxt.Flag_dynlink) =>
300300
(MOV(B|BU|H|HU|W|WU|D)load [off1+off2] {mergeSym(sym1,sym2)} base mem)
301301

302+
(FMOV(W|D)load [off1] {sym1} (MOVaddr [off2] {sym2} base) mem) &&
303+
is32Bit(int64(off1)+int64(off2)) && canMergeSym(sym1, sym2) &&
304+
(base.Op != OpSB || !config.ctxt.Flag_dynlink) =>
305+
(FMOV(W|D)load [off1+off2] {mergeSym(sym1,sym2)} base mem)
306+
302307
(MOV(B|H|W|D)store [off1] {sym1} (MOVaddr [off2] {sym2} base) val mem) &&
303308
is32Bit(int64(off1)+int64(off2)) && canMergeSym(sym1, sym2) &&
304309
(base.Op != OpSB || !config.ctxt.Flag_dynlink) =>
@@ -309,15 +314,26 @@
309314
(base.Op != OpSB || !config.ctxt.Flag_dynlink) =>
310315
(MOV(B|H|W|D)storezero [off1+off2] {mergeSym(sym1,sym2)} base mem)
311316

317+
(FMOV(W|D)store [off1] {sym1} (MOVaddr [off2] {sym2} base) val mem) &&
318+
is32Bit(int64(off1)+int64(off2)) && canMergeSym(sym1, sym2) &&
319+
(base.Op != OpSB || !config.ctxt.Flag_dynlink) =>
320+
(FMOV(W|D)store [off1+off2] {mergeSym(sym1,sym2)} base val mem)
321+
312322
(MOV(B|BU|H|HU|W|WU|D)load [off1] {sym} (ADDI [off2] base) mem) && is32Bit(int64(off1)+off2) =>
313323
(MOV(B|BU|H|HU|W|WU|D)load [off1+int32(off2)] {sym} base mem)
314324

325+
(FMOV(W|D)load [off1] {sym} (ADDI [off2] base) mem) && is32Bit(int64(off1)+off2) =>
326+
(FMOV(W|D)load [off1+int32(off2)] {sym} base mem)
327+
315328
(MOV(B|H|W|D)store [off1] {sym} (ADDI [off2] base) val mem) && is32Bit(int64(off1)+off2) =>
316329
(MOV(B|H|W|D)store [off1+int32(off2)] {sym} base val mem)
317330

318331
(MOV(B|H|W|D)storezero [off1] {sym} (ADDI [off2] base) mem) && is32Bit(int64(off1)+off2) =>
319332
(MOV(B|H|W|D)storezero [off1+int32(off2)] {sym} base mem)
320333

334+
(FMOV(W|D)store [off1] {sym} (ADDI [off2] base) val mem) && is32Bit(int64(off1)+off2) =>
335+
(FMOV(W|D)store [off1+int32(off2)] {sym} base val mem)
336+
321337
// Similarly, fold ADDI into MOVaddr to avoid confusing live variable analysis
322338
// with OffPtr -> ADDI.
323339
(ADDI [c] (MOVaddr [d] {s} x)) && is32Bit(c+int64(d)) => (MOVaddr [int32(c)+d] {s} x)
@@ -701,6 +717,13 @@
701717
(MOVHUreg <t> x:(MOVHload [off] {sym} ptr mem)) && x.Uses == 1 && clobber(x) => @x.Block (MOVHUload <t> [off] {sym} ptr mem)
702718
(MOVWUreg <t> x:(MOVWload [off] {sym} ptr mem)) && x.Uses == 1 && clobber(x) => @x.Block (MOVWUload <t> [off] {sym} ptr mem)
703719

720+
// Replace load from same location as preceding store with copy.
721+
(MOVDload [off] {sym} ptr1 (FMOVDstore [off] {sym} ptr2 x _)) && isSamePtr(ptr1, ptr2) => (FMVXD x)
722+
(FMOVDload [off] {sym} ptr1 (MOVDstore [off] {sym} ptr2 x _)) && isSamePtr(ptr1, ptr2) => (FMVDX x)
723+
(MOVWload [off] {sym} ptr1 (FMOVWstore [off] {sym} ptr2 x _)) && isSamePtr(ptr1, ptr2) => (FMVXS x)
724+
(MOVWUload [off] {sym} ptr1 (FMOVWstore [off] {sym} ptr2 x _)) && isSamePtr(ptr1, ptr2) => (MOVWUreg (FMVXS x))
725+
(FMOVWload [off] {sym} ptr1 (MOVWstore [off] {sym} ptr2 x _)) && isSamePtr(ptr1, ptr2) => (FMVSX x)
726+
704727
// If a register move has only 1 use, just use the same register without emitting instruction
705728
// MOVnop does not emit an instruction, only for ensuring the type.
706729
(MOVDreg x) && x.Uses == 1 => (MOVDnop x)

src/cmd/compile/internal/ssa/_gen/RISCV64Ops.go

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -453,7 +453,8 @@ func init() {
453453
{name: "FNMSUBS", argLength: 3, reg: fp31, asm: "FNMSUBS", commutative: true, typ: "Float32"}, // -(arg0 * arg1) - arg2
454454
{name: "FSQRTS", argLength: 1, reg: fp11, asm: "FSQRTS", typ: "Float32"}, // sqrt(arg0)
455455
{name: "FNEGS", argLength: 1, reg: fp11, asm: "FNEGS", typ: "Float32"}, // -arg0
456-
{name: "FMVSX", argLength: 1, reg: gpfp, asm: "FMVSX", typ: "Float32"}, // reinterpret arg0 as float
456+
{name: "FMVSX", argLength: 1, reg: gpfp, asm: "FMVSX", typ: "Float32"}, // reinterpret arg0 as float32
457+
{name: "FMVXS", argLength: 1, reg: fpgp, asm: "FMVXS", typ: "Int32"}, // reinterpret arg0 as int32, sign extended to 64 bits
457458
{name: "FCVTSW", argLength: 1, reg: gpfp, asm: "FCVTSW", typ: "Float32"}, // float32(low 32 bits of arg0)
458459
{name: "FCVTSL", argLength: 1, reg: gpfp, asm: "FCVTSL", typ: "Float32"}, // float32(arg0)
459460
{name: "FCVTWS", argLength: 1, reg: fpgp, asm: "FCVTWS", typ: "Int32"}, // int32(arg0)
@@ -480,7 +481,8 @@ func init() {
480481
{name: "FNEGD", argLength: 1, reg: fp11, asm: "FNEGD", typ: "Float64"}, // -arg0
481482
{name: "FABSD", argLength: 1, reg: fp11, asm: "FABSD", typ: "Float64"}, // abs(arg0)
482483
{name: "FSGNJD", argLength: 2, reg: fp21, asm: "FSGNJD", typ: "Float64"}, // copy sign of arg1 to arg0
483-
{name: "FMVDX", argLength: 1, reg: gpfp, asm: "FMVDX", typ: "Float64"}, // reinterpret arg0 as float
484+
{name: "FMVDX", argLength: 1, reg: gpfp, asm: "FMVDX", typ: "Float64"}, // reinterpret arg0 as float64
485+
{name: "FMVXD", argLength: 1, reg: fpgp, asm: "FMVXD", typ: "Int64"}, // reinterpret arg0 as int64
484486
{name: "FCVTDW", argLength: 1, reg: gpfp, asm: "FCVTDW", typ: "Float64"}, // float64(low 32 bits of arg0)
485487
{name: "FCVTDL", argLength: 1, reg: gpfp, asm: "FCVTDL", typ: "Float64"}, // float64(arg0)
486488
{name: "FCVTWD", argLength: 1, reg: fpgp, asm: "FCVTWD", typ: "Int32"}, // int32(arg0)

src/cmd/compile/internal/ssa/opGen.go

Lines changed: 28 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)