Merged
17 changes: 17 additions & 0 deletions packages/tasks/src/hardware.ts
@@ -418,6 +418,14 @@ export const SKUS = {
tflops: 26.5,
memory: [16],
},
"RX 9070 XT": {
tflops: 97.32,
memory: [16],
},
"RX 9070": {
tflops: 72.25,
memory: [16],
},
"RX 7900 XTX": {
tflops: 122.8,
memory: [24],
@@ -590,6 +598,15 @@ export const SKUS = {
"Ryzen Zen 4 7000 (Threadripper)": {
tflops: 10.0,
},
"Ryzen Zen5 9000 (Ryzen 9)": {
tflops: 0.56,
},
"Ryzen Zen5 9000 (Ryzen 7)": {
tflops: 0.56,
},
"Ryzen Zen5 9000 (Ryzen 5)": {
tflops: 0.56,
},
Comment on lines +601 to +609
Member

These correspond to the integrated GPU, not the CPU; but I guess it's ok.

Contributor Author

@ArkaMukherjee0 May 15, 2025

There's no reliable proxy for CPU performance that I can think of if we want to equate them to GPUs. INT8 / INT4 performance would have been a better metric, but then we'd end up with two different quantities. Best to leave CPUs out of the question.

Also, another problem I noticed: all Ryzen 9s are clubbed together, while chips like the Ryzen 9 9900X and the Ryzen 9 9950X3D have large differences in their performance. Here, I tested the cheapest and the costliest Ryzen 9, for example:

DeBERTa inference:

  • Ryzen 9 9900X: 0.7947s
  • Ryzen 9 9950X3D: 0.6674s (+19%)

Granite Vision 3.2 2B 4-bit inference (image+text input):

  • Ryzen 9 9900X: 32.4109s
  • Ryzen 9 9950X3D: 30.5195s (+6.2%)

With text-only input, the 9950X3D reached 28.01 tok/s with this model. That is already about 1/5th of what the RTX 5060 Ti 16 GB reached (137.89 tok/s) in the same test, yet by going by TFLOPS we currently credit that card with 42.32x the compute power.

Hence, the 0.56 TFLOPS number for all Ryzen 9000 and 7000 series CPUs is indeed not fair or comparable with GPUs.
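
To make the mismatch concrete, here is a minimal sketch that recomputes the ratios from the numbers quoted above. The helper `speedup` and the variable names are hypothetical and only serve this illustration; they are not part of `hardware.ts`.

```typescript
// Hypothetical helper for illustration: how many times faster the candidate is
// than the baseline, given wall-clock times in seconds (lower is better).
const speedup = (baselineSeconds: number, candidateSeconds: number): number =>
  baselineSeconds / candidateSeconds;

// Timings quoted in this comment (9900X as baseline, 9950X3D as candidate).
const debertaSpeedup = speedup(0.7947, 0.6674);
const graniteSpeedup = speedup(32.4109, 30.5195);

// Measured text-only throughput (tok/s) quoted in this comment.
const gpuTokensPerSecond = 137.89; // RTX 5060 Ti 16 GB
const cpuTokensPerSecond = 28.01; // Ryzen 9 9950X3D
const measuredRatio = gpuTokensPerSecond / cpuTokensPerSecond;

// Ratio implied by the TFLOPS entries, as quoted in this comment.
const tflopsRatio = 42.32;

// How far the TFLOPS-based ratio overstates the measured GPU advantage.
const overstatement = tflopsRatio / measuredRatio;

console.log({
  debertaSpeedup: debertaSpeedup.toFixed(2),
  graniteSpeedup: graniteSpeedup.toFixed(3),
  measuredRatio: measuredRatio.toFixed(2),
  overstatement: overstatement.toFixed(1),
});
```

On this single text-generation test, the TFLOPS entries would overstate the GPU's advantage by a factor of roughly 8 to 9. This is one model on one pair of devices, so it is an indication rather than a general calibration.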

Member

Exactly, those tokens/second are probably coming from CPU inference; that's why I think it's interesting to use CPU numbers if we can (it's what people will use for ML inference). The problem is that CPU manufacturers don't usually publish TFLOPS. We can compute them from the number of cores, the clock frequency and the number of "representative" operations (multiply-add, for example) per clock cycle. The latter, however, is difficult to find and requires diving into tech sheets, see this comment for an example.
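
The calculation described above could be sketched roughly as follows. `CpuSpec`, `estimatePeakTflops` and the example values are all hypothetical placeholders, not measured or vendor-published figures; the per-cycle figure in particular is an assumption that has to come from the tech sheets, and it should be for the same numeric precision as the other `tflops` entries.

```typescript
// Hypothetical shape for illustration; not part of hardware.ts.
interface CpuSpec {
  cores: number; // number of physical cores
  clockGhz: number; // sustained clock frequency in GHz (assumption: taken from a spec sheet)
  flopsPerCycle: number; // floating-point operations per core per cycle
  // (assumption: depends on SIMD width and FMA units; one multiply-add counts as 2 FLOPs)
}

// Theoretical peak only: real inference throughput is usually lower
// (memory bandwidth, thermal limits, software efficiency).
function estimatePeakTflops({ cores, clockGhz, flopsPerCycle }: CpuSpec): number {
  // cores * GHz * FLOPs/cycle gives GFLOPS; divide by 1000 for TFLOPS.
  return (cores * clockGhz * flopsPerCycle) / 1000;
}

// Placeholder values chosen only to show the arithmetic, not a real SKU.
const placeholderCpu: CpuSpec = { cores: 16, clockGhz: 4.0, flopsPerCycle: 32 };
const placeholderTflops = estimatePeakTflops(placeholderCpu);

console.log(`${placeholderTflops.toFixed(2)} TFLOPS (theoretical peak, placeholder inputs)`);
```

A number obtained this way is an upper bound, so it may still be worth cross-checking against measured tokens/second before putting it in the table.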

I would suggest we merge this PR and revisit the CPU numbers you reported in a follow-up. Personally, I'm happy to include an estimate based on comparative performance if we can't find anything else!

Contributor Author

@ArkaMukherjee0 May 15, 2025

Sure, that sounds good to me. We can revisit this in a separate conversation. Do let me know the best medium for the exchange.

Member

A draft PR or an issue is good imo! 🤗

"Ryzen Zen4 7000 (Ryzen 9)": {
tflops: 0.56,
},