Replies: 1 comment
-
fp8 on a modern 4090 or similar card fully utilizes the GPU's native fp8 support. gguf on any card is naturally slower, because it relies on a custom kernel that translates a custom data format into a native format before execution. How much slower is situational, and there may be room for improvement in the relevant gguf implementation, but it will always be at least a bit slower.
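To make the extra translation step concrete, here is a rough pure-Python sketch. It is not the actual gguf kernel or file layout: the block size of 32, the single scale per block, and all function names are assumptions chosen for illustration, loosely in the spirit of 8-bit block quantization. The point is only that the quantized path has to dequantize each block before it can run the same multiply-accumulate that a native format runs directly.

```python
# Illustrative sketch only: NOT the real gguf kernels or data layout.
# BLOCK_SIZE and every function name here are hypothetical.
BLOCK_SIZE = 32  # assumed number of weights sharing one scale


def quantize_block(weights):
    """Pack a block of floats into (scale, list of int8-range values)."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127.0 if amax > 0 else 1.0
    quants = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, quants


def dequantize_block(scale, quants):
    """Translate the custom block format back into plain floats."""
    return [scale * q for q in quants]


def dot_native(weights, x):
    """Native path: multiply-accumulate directly on the stored values."""
    return sum(w * xi for w, xi in zip(weights, x))


def dot_quantized(blocks, x):
    """Quantized path: dequantize every block first, then multiply-accumulate."""
    total = 0.0
    for i, (scale, quants) in enumerate(blocks):
        w = dequantize_block(scale, quants)  # the extra step per block
        xs = x[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        total += dot_native(w, xs)
    return total


# Deterministic toy data: 64 weights in [-1, 1], input vector of ones.
weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
x = [1.0] * 64
blocks = [quantize_block(weights[i:i + BLOCK_SIZE])
          for i in range(0, len(weights), BLOCK_SIZE)]

exact = dot_native(weights, x)
approx = dot_quantized(blocks, x)
print(f"native: {exact:.4f}  dequantize-then-dot: {approx:.4f}")
```

Both paths give nearly the same result, but the quantized one does extra work per block. On a real GPU that work happens inside the kernel, so the actual cost depends heavily on the implementation and on whether the workload is limited by memory bandwidth or by compute.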
-
Is it right that gguf, at the moment, in combination with a LoRA is 2 times slower than fp8?
Can that become faster some day, or is that the nature of gguf, CUDA, and the GPU?