speed - gguf vs fp8 models #391

kalle07 · 2024-11-03T19:08:53Z


Is it right that GGUF, at the moment, is two times slower than fp8 when combined with a LoRA?
Could that get faster some day, or is that just the nature of GGUF, CUDA, and the GPU?

mcmonkey4eva · 2024-11-03T19:15:31Z · Maintainer

fp8 on a modern 4090 or similar card fully utilizes the power of the GPU, because the card has native fp8 support.
fp8 on an older-generation card cannot use the full power of the GPU, but it still performs decently (the 30xx series at least can upcast from fp8 very quickly).
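To illustrate what "upcasting from fp8" means, here is a minimal CPU sketch that decodes one fp8 byte into a regular float. It assumes the E4M3 "fn" variant (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, NaN but no infinities), which is the fp8 format commonly used for model weights; the function names are hypothetical. On a GPU without native fp8 matmul, an equivalent conversion to fp16/bf16 has to happen before the multiply, whereas a card with native fp8 support can skip it.

```python
import math


def fp8_e4m3_to_float(byte: int) -> float:
    """Decode one FP8 E4M3 ("fn" variant) byte into a Python float (hypothetical helper)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F  # 4 exponent bits, bias 7
    mantissa = byte & 0x07         # 3 mantissa bits
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan            # E4M3fn reserves this pattern for NaN; it has no infinities
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6
        return sign * (mantissa / 8.0) * 2.0 ** -6
    # Normal number: implicit leading 1
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def upcast(weights: bytes) -> list:
    """Upcast a buffer of fp8 weights to Python floats, one byte per weight."""
    return [fp8_e4m3_to_float(b) for b in weights]
```

For example, `fp8_e4m3_to_float(0x38)` is `1.0` and `0x7E` decodes to `448.0`, the largest finite E4M3 value. Real implementations do this in vectorized GPU code rather than a Python loop.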

GGUF on any card is inherently slower, because it relies on a custom kernel that performs a non-native translation from a custom data format into a native format before execution. How much slower depends on the situation, and there may be room for improvement in the relevant GGUF implementation, but it will always be at least a bit slower.
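As a rough picture of that translation step, here is a minimal sketch of dequantizing GGUF's simplest block format. It assumes the Q8_0 layout from ggml: each block of 32 weights stores one little-endian fp16 scale followed by 32 signed int8 values, and each weight is `scale * q`. Other GGUF types (Q4_K, Q5_K, etc.) use more elaborate packing, so this only shows the general idea; `dequantize_q8_0` is a hypothetical name, not an API of any GGUF library.

```python
import struct

QK8_0 = 32               # weights per Q8_0 block
BLOCK_BYTES = 2 + QK8_0  # one fp16 scale + 32 int8 quants


def dequantize_q8_0(raw: bytes) -> list:
    """Expand Q8_0 blocks into plain floats (the extra work a GGUF kernel does)."""
    if len(raw) % BLOCK_BYTES != 0:
        raise ValueError("buffer is not a whole number of Q8_0 blocks")
    out = []
    for off in range(0, len(raw), BLOCK_BYTES):
        # Per-block scale, stored as little-endian half precision
        (scale,) = struct.unpack_from("<e", raw, off)
        # 32 signed 8-bit quantized values follow the scale
        quants = struct.unpack_from(f"<{QK8_0}b", raw, off + 2)
        out.extend(scale * q for q in quants)
    return out


# Usage: one block with scale 0.5 and quants -16..15
block = struct.pack("<e", 0.5) + struct.pack(f"<{QK8_0}b", *range(-16, 16))
weights = dequantize_q8_0(block)  # [-8.0, -7.5, ..., 7.5]
```

A GPU kernel performs this expansion (usually fused with the matmul) every time the weights are used, which is work that a natively supported format like fp8 on a 4090 avoids.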
