Commit 95bd878

[llvm-mca][AArch64] Merge Neoverse NEON tests (NFC) (#170881)
Follow-on from #170324 to also refactor the NEON tests to reuse the input assembly across all Neoverse cores.

The approach is as follows:

- Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so first combine those.
- Inputs for V2/V3/V3AE NEON tests are also already identical, but differ from the N cores, so combine those separately.
- Most significantly, the input for V1 differs from all other cores, primarily because of 24f0901 (#128892).
- Split out features that are not supported across all cores.
  - Split out FEAT_I8MM, FEAT_FHM, FEAT_FCMA. N1 doesn't have these features but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.
  - Split out FEAT_BF16. V1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N1/N2/N3 since they were missing tests.
  - Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.
- Bring the Neoverse V2/V3/V3AE and N1/N2/N3 NEON tests in line.

  Comparing N[1-3] against V[2-3], the only change the N cores have that V[2-3] don't is:

  ```
  < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
  ---
  > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5
  ```

  So we take it for all cores. The rest of the diff is instructions in V[2-3] that aren't in the N cores, so we also take them.

  All Neoverse cores can optionally support the Cryptographic Extension. The related features (AES, ...) are enabled by default for V1/N1 but not for the other cores, so they need to be explicitly enabled via -mattr.
- Finally, bring Neoverse V1 in line with V2/V3/V3AE/N1/N2/N3:
  - loads/stores are blended
  - duplicates that differ only in spacing, like `shll v0.2d, v0.2s, #32`, are removed
  - the rest of the diff is instructions in V1 that are not tested in the other cores, so we add them for the other cores
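The feature split described in the commit message can be pictured as a small lookup table. The following is only a sketch in Python, not part of the commit: the availability data is taken from the message as written (not checked against the LLVM scheduling models), the group name `common` and the helper names `input_groups` and `mca_args` are hypothetical, and `+aes` stands in for whichever crypto features the tests actually enable.

```python
# Feature availability as stated in the commit message (an assumption here,
# not verified against LLVM). Keys are llvm-mca -mcpu names.
CORES = [
    "neoverse-n1", "neoverse-n2", "neoverse-n3",
    "neoverse-v1", "neoverse-v2", "neoverse-v3", "neoverse-v3ae",
]

# For each split-out feature group: the cores the message says lack it.
LACKING = {
    "i8mm": {"neoverse-n1"},
    "fhm": {"neoverse-n1"},
    "fcma": {"neoverse-n1"},
    "bf16": {"neoverse-v1"},
    "frintts": {"neoverse-v1", "neoverse-n1"},
}

# Cores where the message says the crypto features are on by default.
CRYPTO_DEFAULT = {"neoverse-v1", "neoverse-n1"}


def input_groups(core):
    """Return the shared input groups a core's test could reuse."""
    groups = ["common"]  # hypothetical name for the shared NEON input
    groups += sorted(f for f, missing in LACKING.items() if core not in missing)
    return groups


def mca_args(core):
    """Build an illustrative llvm-mca argument list for one core."""
    args = ["llvm-mca", "-mtriple=aarch64", f"-mcpu={core}", "-instruction-tables"]
    if core not in CRYPTO_DEFAULT:
        args.append("-mattr=+aes")  # one example crypto feature, enabled explicitly
    return args


if __name__ == "__main__":
    for core in CORES:
        print(core, input_groups(core), " ".join(mca_args(core)))
```

Keeping the table keyed by "cores that lack the feature" mirrors how the message phrases each split, so a new core picks up every shared input by default unless it is listed as an exception.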
1 parent 870aa89 commit 95bd878

42 files changed, +8764 -8945 lines changed
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+bfcvt h0, s0
+bfcvtn v0.4h, v0.4s
+bfcvtn2 v0.8h, v0.4s
+bfdot v0.2s, v24.4h, v14.2h[2]
+bfdot v0.2s, v0.4h, v0.4h
+bfdot v0.4s, v0.8h, v0.8h
+bfmlalb v0.4s, v0.8h, v0.8h
+bfmlalb v0.4s, v0.8h, v0.h[3]
+bfmlalt v0.4s, v0.8h, v0.8h
+bfmlalt v0.4s, v0.8h, v0.h[3]
+bfmmla v0.4s, v0.8h, v0.8h
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+fcadd v0.2s, v0.2s, v0.2s, 90
+fcadd v0.4s, v0.4s, v0.4s, 270
+fcmla v0.2s, v0.2s, v0.2s, #90
+fcmla v0.4s, v0.4s, v0.s[1], #0
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+fmlal v0.2s, v0.2h, v0.h[1]
+fmlal v0.4s, v0.4h, v0.h[3]
+fmlal v0.2s, v0.2h, v0.2h
+fmlal v0.4s, v0.4h, v0.4h
+fmlal2 v0.2s, v0.2h, v0.h[1]
+fmlal2 v0.4s, v0.4h, v0.h[3]
+fmlal2 v0.2s, v0.2h, v0.2h
+fmlal2 v0.4s, v0.4h, v0.4h
+fmlsl v0.2s, v0.2h, v0.h[1]
+fmlsl v0.4s, v0.4h, v0.h[3]
+fmlsl v0.2s, v0.2h, v0.2h
+fmlsl v0.4s, v0.4h, v0.4h
+fmlsl2 v0.2s, v0.2h, v0.h[1]
+fmlsl2 v0.4s, v0.4h, v0.h[3]
+fmlsl2 v0.2s, v0.2h, v0.2h
+fmlsl2 v0.4s, v0.4h, v0.4h
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+frint32x v0.2d, v0.2d
+frint32x v0.2s, v0.2s
+frint32x v0.4s, v0.4s
+frint32z v0.2d, v0.2d
+frint32z v0.2s, v0.2s
+frint32z v0.4s, v0.4s
+frint64x v0.2d, v0.2d
+frint64x v0.2s, v0.2s
+frint64x v0.4s, v0.4s
+frint64z v0.2d, v0.2d
+frint64z v0.2s, v0.2s
+frint64z v0.4s, v0.4s
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+smmla v0.4s, v0.16b, v0.16b
+sudot v0.2s, v0.8b, v0.4b[2]
+sudot v0.4s, v0.16b, v0.4b[2]
+ummla v0.4s, v0.16b, v0.16b
+usdot v0.2s, v0.8b, v0.4b[2]
+usdot v0.2s, v0.8b, v0.8b
+usdot v0.4s, v0.16b, v0.16b
+usdot v0.4s, v0.16b, v0.4b[2]
+usmmla v0.4s, v0.16b, v0.16b