@FLZ101
Contributor

FLZ101 commented Mar 16, 2025

Refer to comments in the source code for how this tool is used and how it works.

Below is an example output:

def N3UnitB : ProcResource<2>; // Branch 0/1
def N3UnitS : ProcResource<2>; // Integer Single-Cycle 0/1
def N3UnitM0 : ProcResource<1>; // Integer Single/Multi-Cycle 0
def N3UnitIntegerSingleMultiCycle1 : ProcResource<1>; // Integer Single/Multi-Cycle 1
def N3UnitV0 : ProcResource<1>; // FP/ASIMD/Vector Store data 0
def N3UnitV1 : ProcResource<1>; // FP/ASIMD/Vector Store data 1
def N3UnitL01 : ProcResource<2>; // Load/Store 0/1
def N3UnitLoad2 : ProcResource<1>; // Load 2
def N3UnitID : ProcResource<2>; // Integer Store data 0/1

def N3UnitM : ProcResGroup<[N3UnitM0, N3UnitIntegerSingleMultiCycle1]>;
def N3UnitI : ProcResGroup<[N3UnitS, N3UnitM0, N3UnitIntegerSingleMultiCycle1]>;
def N3UnitL : ProcResGroup<[N3UnitL01, N3UnitLoad2]>;
def N3UnitV : ProcResGroup<[N3UnitV0, N3UnitV1]>;

...

def N3Write_1c_1B : SchedWriteRes<[N3UnitB]> {
    let Latency = 1;
}

...

// Branch, immed
def : InstRW<[N3Write_1c_1B], (instrs B)>;

...

Using this tool rather than writing a scheduling model from scratch should save some effort.
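As a rough illustration of how a draft entry like `N3Write_1c_1B` could be rendered from a parsed table row, here is a minimal Python sketch. It is not the code in this PR: `render_sched_write` is a hypothetical helper, the naming scheme is inferred from the example output above, and it only covers rows that need one uop per unit.

```python
def render_sched_write(prefix, latency, uops):
    """Render a draft SchedWriteRes definition.

    prefix  -- model prefix, e.g. "N3" (as in the example output)
    latency -- latency in cycles taken from the instruction table
    uops    -- mapping of pipeline name to uop count, e.g. {"B": 1}

    Hypothetical helper: the name pattern <prefix>Write_<lat>c_<n><unit>
    is inferred from the example output, not taken from the tool.
    """
    # Build the "1B" or "1B_1S" suffix from the pipeline/uop pairs.
    parts = "_".join(f"{count}{unit}" for unit, count in uops.items())
    name = f"{prefix}Write_{latency}c_{parts}"
    # Reference the ProcResource defs, e.g. N3UnitB.
    units = ", ".join(f"{prefix}Unit{unit}" for unit in uops)
    text = (
        f"def {name} : SchedWriteRes<[{units}]> {{\n"
        f"    let Latency = {latency};\n"
        f"}}"
    )
    return name, text


name, text = render_sched_write("N3", 1, {"B": 1})
print(text)
```

For the branch row this reproduces the `N3Write_1c_1B` definition shown above; rows that need several uops on one unit would need extra fields that this sketch does not emit.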

@FLZ101
Contributor Author

FLZ101 commented Mar 16, 2025

@davemgreen

@sjoerdmeijer
Collaborator

I am very much interested in autogenerating (most of the) schedmodels.

I haven't looked too deeply into this change and its output, but I think it would be good to first have a discussion on how complete and useful this is. For that discussion, can we compare the output of the tool to a well-established, existing scheduling model (e.g. the Neoverse V2, which I am most familiar with)? So I think we first need to establish how useful this is, what is lacking, and what the plan is to address the gaps (if any).

I came across this old review for an X86 CPU: https://reviews.llvm.org/D130897. I have not yet studied it in detail either, but there seems to be a lot more going on in that patch.

@FLZ101
Contributor Author

FLZ101 commented Mar 18, 2025

> I came across this old review for a X86 CPU: https://reviews.llvm.org/D130897. I have also not yet studied that in detail, but there seems to be a lot more going on in that patch.

The tool in this PR is much simpler, since it only generates a draft rather than a working sched model. It does that based on one simple rule: for each row in the instruction tables, pick the number of uops per pipeline so that the listed throughput is reached exactly when every pipeline the row uses is fully occupied.

Take the following row in the Neoverse N3 instruction tables as an example:

Instruction Group           Instructions  Latency  Throughput  Pipelines
Branch and link, register   BLR           1        2           B, S

The throughput is 2, which means 2 instructions are executed per cycle. Pipelines B and S each have 2 units, which means 2 B uops and 2 S uops are executed per cycle. So each instruction has 1 B uop and 1 S uop.
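The arithmetic behind this rule can be sketched in a few lines of Python. This is only one reading of the rule described here, not the tool's actual code: `uops_per_instruction` and `PIPELINE_UNITS` are hypothetical names, and the unit counts are the ones from the example output (`N3UnitB` and `N3UnitS` both have 2 units).

```python
from fractions import Fraction

# Units per pipeline, taken from the ProcResource<2> definitions
# of N3UnitB and N3UnitS in the example output.
PIPELINE_UNITS = {"B": 2, "S": 2}


def uops_per_instruction(throughput, pipelines, units=PIPELINE_UNITS):
    """Return the uops each instruction issues to each pipeline.

    Assumes every listed pipeline is fully occupied at the given
    throughput, so uops = units per cycle / instructions per cycle.
    Hypothetical helper illustrating the rule, not the tool's code.
    """
    return {p: Fraction(units[p]) / Fraction(throughput) for p in pipelines}


# Row from the table: BLR, throughput 2, pipelines B and S.
blr = uops_per_instruction(2, ["B", "S"])
print({p: str(n) for p, n in blr.items()})
```

A non-integer result from this calculation is one sign that the rule does not fit a row and that the row needs manual attention.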

For any instructions the above rule does not apply to, we need to manually modify their descriptions.

It does not map instruction names in SWOG to names in LLVM.

It does not define any forwarding rules.

@davemgreen
Collaborator

Nice tool. I'm impressed that it manages to parse the tables as cleanly as it does.

> It does not map instruction names in SWOG to names in LLVM.

That sounds like it might be the hard bit; at least, it looks like it would require some manual effort. There has always been the question of which is better: trying to collect the data from a known good source or trying to measure it directly on real hardware. Both have advantages and disadvantages, and at the end of the day they come from the same source (the SWOGs just have someone who knows what the right answer should be looking over the results after they are measured).

From looking at https://reviews.llvm.org/D144388#4149183, and what I've seen of the difficulties in measuring some values reliably, perhaps we will end up needing a mixture of both approaches, with one checking the results of the other.

@rj-jesus
Contributor

> > It does not map instruction names in SWOG to names in LLVM.
>
> That sounds like it might be the hard bit, at least it looks like it would require some manual effort.

We have a utility tool that takes an instruction name and maps it to LLVM opcodes. For example, given ADD (vectors, unpredicated) it produces:

  ADD_ZZZ_B
  ADD_ZZZ_H
  ADD_ZZZ_S
  ADD_ZZZ_D

Maybe it could be useful to extend this script?

@FLZ101
Contributor Author

FLZ101 commented Mar 20, 2025

> We have a utility tool that takes an instruction name and maps it to LLVM opcodes. For example, given ADD (vectors, unpredicated) it produces:
>
>   ADD_ZZZ_B
>   ADD_ZZZ_H
>   ADD_ZZZ_S
>   ADD_ZZZ_D
>
> Maybe it could be useful to extend this script?

If the utility can get LLVM opcodes from (instruction group, instruction name) (e.g. ("Predicate logical", "AND")), that would be great.

@fioraking

Maybe the file name should be changed to arm_sched_model_gen_from_swog.py. Otherwise, it might be mistaken for a general-purpose one.


5 participants