Some instructions have incorrect scheduling information compared to the Neoverse-V2 Software Optimization Guide ([V2 SWOG](https://developer.arm.com/documentation/109898/latest/)):
| Instruction Group | AArch64 Instructions | Exec Latency | Exec Throughput | Utilised Pipelines |
|---|---|---|---|---|
| Flag manipulation instructions | SETF8, SETF16, RMIF, CFINV | 1 | 1 | F |
For example:

```
rmif
cfinv
setf8 w1
setf16 w1
```
Running `llvm-mca -mtriple=aarch64 -mcpu=neoverse-v2 -instruction-tables` on the above instructions gives the following output:
```
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.17                  U     rmif #0, #0, #0
 1      1     0.06                  U     cfinv
 1      1     0.17                  U     setf8 w1
 1      1     0.17                  U     setf16 w1

Resources:
[0.0]  - V2UnitB
[0.1]  - V2UnitB
[1.0]  - V2UnitD
[1.1]  - V2UnitD
[2]    - V2UnitL2
[3.0]  - V2UnitL01
[3.1]  - V2UnitL01
[4]    - V2UnitM0
[5]    - V2UnitM1
[6]    - V2UnitS0
[7]    - V2UnitS1
[8]    - V2UnitS2
[9]    - V2UnitS3
[10]   - V2UnitV0
[11]   - V2UnitV1
[12]   - V2UnitV2
[13]   - V2UnitV3

Resource pressure per iteration:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]
 -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -      -      -      -

Resource pressure by instruction:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   Instructions:
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     rmif #0, #0, #0
 -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     cfinv
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     setf8 w1
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     setf16 w1
```
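A quick way to compare the `Instruction Info` rows with the SWOG table is a small script. The sketch below is a hypothetical helper, not part of LLVM: it copies the four rows from the output above, takes latency 1 and throughput 1 from the SWOG row, and assumes the SWOG throughput maps to an llvm-mca RThroughput of 1 / throughput.

```python
# Hypothetical checker (not an LLVM tool): compare llvm-mca "Instruction Info"
# rows with the SWOG row for flag manipulation (latency 1, throughput 1).
MCA_ROWS = """\
 1      1     0.17                  U     rmif #0, #0, #0
 1      1     0.06                  U     cfinv
 1      1     0.17                  U     setf8 w1
 1      1     0.17                  U     setf16 w1
"""

SWOG_LATENCY = 1
SWOG_THROUGHPUT = 1  # instructions per cycle, from the SWOG table


def parse_rows(text):
    """Yield (mnemonic, latency, rthroughput) for each instruction-info row."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        latency, rthroughput = int(fields[1]), float(fields[2])
        # Skip the optional MayLoad/MayStore ('*') and HasSideEffects ('U') flags.
        idx = 3
        while idx < len(fields) and fields[idx] in ("*", "U"):
            idx += 1
        yield fields[idx], latency, rthroughput


def mismatches(text):
    """Return rows whose latency or RThroughput disagree with the SWOG entry."""
    expected_rthroughput = 1 / SWOG_THROUGHPUT  # assumed mapping: RThroughput = 1 / throughput
    bad = []
    for mnemonic, latency, rthroughput in parse_rows(text):
        # llvm-mca prints two decimals, so allow for rounding.
        if latency != SWOG_LATENCY or abs(rthroughput - expected_rthroughput) > 0.005:
            bad.append((mnemonic, rthroughput, expected_rthroughput))
    return bad


if __name__ == "__main__":
    for mnemonic, got, want in mismatches(MCA_ROWS):
        print(f"{mnemonic}: RThroughput {got:.2f}, SWOG implies {want:.2f}")
```

On these rows all four instructions are flagged: the latencies agree, but every RThroughput is well below the 1.00 implied by the SWOG.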
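The output shows that `rmif`, `setf8` and `setf16` have latency 1 and a reciprocal throughput of 0.17 (a throughput of about 6 per cycle), with their pressure spread evenly over the six I pipelines (M0, M1 and S0-S3), while `cfinv` is modelled with no pipeline resources at all (RThroughput 0.06). This does not match the SWOG, which gives a throughput of 1 on the F pipeline for all four instructions, and should be fixed in the Neoverse-V2 scheduling model: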
`llvm-project/llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td`, lines 1139 to 1140 at commit `f37bee1`
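The reported numbers follow from how llvm-mca derives reciprocal throughput: roughly, an instruction that holds one of N interchangeable pipelines for one cycle gets RThroughput 1/N, and that cycle shows up as 1/N of pressure on each pipeline. The sketch below reproduces this arithmetic for the current model (six I pipelines) and for the SWOG entry (one F pipeline). The issue width of 16 used for `cfinv` is an assumption about the model, offered as a likely explanation of its 0.06 rather than something the output states.

```python
from fractions import Fraction

# Pipelines with pressure in columns [4]-[9] of the output
# (V2UnitM0, V2UnitM1, V2UnitS0..S3), i.e. the I group.
I_GROUP_SIZE = 6
# ASSUMPTION: the V2 model's IssueWidth is 16; for an instruction with no
# pipeline resources llvm-mca falls back to uOps / IssueWidth.
ASSUMED_ISSUE_WIDTH = 16


def rthroughput_on_group(busy_cycles, units):
    """Reciprocal throughput of an op holding one of `units` pipes for `busy_cycles`."""
    return Fraction(busy_cycles, units)


# Current model: 1 cycle on any one of the six I pipelines.
current = rthroughput_on_group(1, I_GROUP_SIZE)
# SWOG: throughput 1 on the single F pipeline.
swog = rthroughput_on_group(1, 1)

# Per-iteration pressure on each I pipeline: rmif, setf8 and setf16 each add
# 1/6; cfinv adds nothing because it is modelled without resources.
per_pipe_pressure = 3 * current
# 1 uOp with no resources, divided by the assumed issue width.
cfinv_current = Fraction(1, ASSUMED_ISSUE_WIDTH)

if __name__ == "__main__":
    print(f"current RThroughput: {float(current):.2f} (throughput {1 / current})")
    print(f"SWOG RThroughput:    {float(swog):.2f} (throughput {1 / swog})")
    print(f"pressure per I pipe per iteration: {float(per_pipe_pressure):.2f}")
    print(f"cfinv RThroughput with no resources: {float(cfinv_current):.2f}")
```

This prints 0.17 and 1.00 for the current and SWOG reciprocal throughputs, 0.50 for the per-iteration pressure on each I pipeline (matching the `Resource pressure per iteration` row), and 0.06 for `cfinv` under the assumed issue width.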