
[AArch64] Correct scheduling information for flag manipulation instructions in Neoverse-V2 #122124

@Asher8118


Some instructions have scheduling information that does not match the Neoverse-V2 Software Optimization Guide (SWOG, https://developer.arm.com/documentation/109898/latest/):

Instruction Group               AArch64 Instructions         Exec Latency  Exec Throughput  Utilised Pipelines
Flag manipulation instructions  SETF8, SETF16, RMIF, CFINV   1             1                F

For example:

rmif
cfinv
setf8 w1
setf16 w1

Running llvm-mca -mtriple=aarch64 -mcpu=neoverse-v2 -instruction-tables on the above instructions gives the following output:

Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.17                  U     rmif  #0, #0, #0
 1      1     0.06                  U     cfinv
 1      1     0.17                  U     setf8 w1
 1      1     0.17                  U     setf16        w1


Resources:
[0.0] - V2UnitB
[0.1] - V2UnitB
[1.0] - V2UnitD
[1.1] - V2UnitD
[2]   - V2UnitL2
[3.0] - V2UnitL01
[3.1] - V2UnitL01
[4]   - V2UnitM0
[5]   - V2UnitM1
[6]   - V2UnitS0
[7]   - V2UnitS1
[8]   - V2UnitS2
[9]   - V2UnitS3
[10]  - V2UnitV0
[11]  - V2UnitV1
[12]  - V2UnitV2
[13]  - V2UnitV3


Resource pressure per iteration:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   
 -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -      -      -      -     

Resource pressure by instruction:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   Instructions:
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     rmif     #0, #0, #0
 -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     cfinv
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     setf8    w1
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     setf16   w1
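The RThroughput values above can be reproduced by hand. For a single micro-op that occupies one pipeline for one cycle, llvm-mca's reciprocal throughput is one over the number of pipelines it may issue to; for an instruction with no pipeline resources, it falls back to micro-ops divided by the model's issue width. The value of 16 used for the issue width is an assumption: it is consistent with the printed 0.06 but does not appear in the output itself. The same arithmetic explains the per-iteration pressure: three instructions each putting 1/6 on M0, M1 and S0-S3 give 3/6 = 0.50 per pipeline.

```math
\begin{aligned}
\text{rmif, setf8, setf16 (current model):} &\quad \frac{1}{|\{M0, M1, S0, S1, S2, S3\}|} = \frac{1}{6} \approx 0.17 \\
\text{cfinv (current model, no pipeline):} &\quad \frac{1}{\text{IssueWidth}} = \frac{1}{16} \approx 0.06 \\
\text{all four (SWOG, pipeline F):} &\quad \frac{1}{1} = 1.00
\end{aligned}
```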

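The output shows that rmif, setf8 and setf16 each have latency 1 and a reciprocal throughput of 0.17 (a throughput of about 6 per cycle), because they can issue to any of the six integer pipelines M0, M1 and S0-S3 (the I group). cfinv has a reciprocal throughput of 0.06 and uses no pipeline at all. The latency agrees with the SWOG, but the throughput and pipeline do not: the SWOG specifies a throughput of 1 on pipeline F. The Neoverse-V2 scheduling model should be corrected to match. Its flag manipulation entry currently reads: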

// Flag manipulation instructions
def : WriteRes<WriteSys, []> { let Latency = 1; }
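One possible shape for the fix is sketched below in TableGen. This is a hypothetical sketch, not the upstream change. It assumes the model record is named NeoverseV2Model, that the instruction records are named RMIF, SETF8, SETF16 and CFINV, and that no other InstRW entry in the model already matches them. V2UnitF and V2Write_1c_1F are made-up names. Modelling F as a single-unit resource enforces the SWOG's throughput of 1 per cycle while keeping the latency at 1.

```
// Hypothetical sketch only: V2UnitF and V2Write_1c_1F are illustrative
// names, not existing records in the Neoverse-V2 scheduling model.
let SchedModel = NeoverseV2Model in {

// One flag-manipulation pipeline, so at most one of these instructions
// can issue per cycle (SWOG exec throughput = 1).
def V2UnitF : ProcResource<1>;

// A single micro-op on pipeline F with a 1-cycle latency (SWOG latency = 1).
def V2Write_1c_1F : SchedWriteRes<[V2UnitF]> {
  let Latency = 1;
  let NumMicroOps = 1;
}

// Route the four flag manipulation instructions to the new write
// (assumes these are the instruction record names).
def : InstRW<[V2Write_1c_1F], (instrs RMIF, SETF8, SETF16, CFINV)>;

} // SchedModel = NeoverseV2Model
```

With a change along these lines, re-running the llvm-mca command above would be expected to report an RThroughput of 1.00 for each of the four instructions, with their resource pressure on the F resource instead of M0, M1 and S0-S3.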
