Replies: 1 comment
-
NF4 is a performance optimization, while GGUF is compression. The better approach would be to first make an NF4 model, then turn it into a GGUF model, and have a system that recognizes how to handle both methods when used together.
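For reference, here is a minimal sketch of what "making an NF4 model" usually means in practice, using the bitsandbytes path through transformers. The model id is a placeholder and the text-model class is used purely for illustration; NF4 here happens in memory at load time, so producing a GGUF file afterwards would still require separate conversion tooling (e.g. the llama.cpp converter scripts), and chaining the two is not something the standard tools do out of the box:

```python
# Minimal sketch: on-the-fly NF4 quantization with bitsandbytes via transformers.
# "some/model-id" is a placeholder checkpoint, not a specific recommendation.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16 after dequantization
)

model = AutoModelForCausalLM.from_pretrained(
    "some/model-id",                        # placeholder model id
    quantization_config=nf4_config,
    device_map="auto",
)
```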
-
I’m genuinely not certain whether it’s even possible, so I’m being literal when I ask: could an NF4 model be made from a GGUF model? Would there be any benefit to it? Would it further decrease model size and VRAM requirements while still retaining enough information to be used effectively?
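To make the question concrete, here is a rough standalone sketch (my own illustration, not code from any of these tools) of what NF4 actually stores for a tensor: a 4-bit index into a fixed 16-value codebook plus one absmax scale per block. The codebook values below are rounded approximations of the ones published in the QLoRA paper:

```python
# Illustrative NF4-style block quantization: 4-bit codebook indices + per-block scales.
import torch

# Approximate NF4 code points, rounded from the QLoRA paper.
NF4_LEVELS = torch.tensor([
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
     0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
])

def nf4_quantize(weights: torch.Tensor, block_size: int = 64):
    """Quantize a flat tensor to 4-bit codebook indices plus one scale per block."""
    blocks = weights.reshape(-1, block_size)
    scales = blocks.abs().amax(dim=1, keepdim=True)   # absmax scale per block
    normalized = blocks / scales                       # values now in [-1, 1]
    # nearest NF4 code point for each weight, stored as a 4-bit index
    idx = (normalized.unsqueeze(-1) - NF4_LEVELS).abs().argmin(dim=-1)
    return idx.to(torch.uint8), scales

def nf4_dequantize(idx: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    return (NF4_LEVELS[idx.long()] * scales).reshape(-1)

w = torch.randn(4096)
idx, scales = nf4_quantize(w)
print("max reconstruction error:", (w - nf4_dequantize(idx, scales)).abs().max())
```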