llama.cpp supports the new gpt-oss model in native MXFP4 format #15095
Replies: 11 comments 1 reply
-
YESSSSSSS :D
-
It's just amazing to see how far the project has come. Thanks to everyone that makes this possible!
-
Great work!
-
What's the advantage of the MXFP4 quant? Does it have better performance than the Q4_K_M or IQ4_XS quants?
-
This is incredible :D! Thank you to all the developers involved.
-
Great achievement! Many thanks to all involved.
-
Incredibly awesome work!! Thanks to all the people making this possible. :)
-
Yessss! Awesome work 😍
-
Super, is it release b6101 (https://github.com/ggml-org/llama.cpp/releases/tag/b6101)?
-
Thanks for the great job
-
That is amazing! You are the best!
-
The new gpt-oss model is fully supported in native MXFP4 format across all major ggml backends, including CUDA, Vulkan, Metal and CPU, with exceptional performance. This brings the unprecedented quality of gpt-oss into the hands of everyone - from local AI enthusiasts to enterprises doing inference at the edge or in the cloud. The unique inference capabilities of ggml unlock a vast range of use cases across the entire spectrum of consumer-grade hardware available on the market today - use cases that are impossible to support with any other inference framework in existence. Today, gpt-oss, trained natively in the MXFP4 format, effectively "leaps" over the existing resource barriers and allows us to experience SOTA AI quality on our own personal devices. The era of natively trained 4-bit local models has officially begun, and ggml will continue to lead the way forward!
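As an illustration of what this enables in practice, here is a minimal sketch of loading an MXFP4 GGUF of gpt-oss and greedily generating a few tokens through the llama.cpp C API. The model filename, prompt and parameter values are placeholders, and the function names follow the current API, so they may differ between releases; it is a sketch rather than a definitive example.

```cpp
// Minimal sketch: load an MXFP4 GGUF and greedily generate a few tokens
// via the llama.cpp C API. The model path is a placeholder; function names
// follow the current API and may differ between releases.
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main() {
    const std::string model_path = "gpt-oss-20b-mxfp4.gguf"; // hypothetical filename
    const std::string prompt     = "Explain MXFP4 in one sentence.";

    llama_backend_init();

    // load the model (offload all layers if a GPU backend is available)
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;
    llama_model * model = llama_model_load_from_file(model_path.c_str(), mparams);
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // tokenize the prompt (the first call returns the required count as a negative value)
    const int n_prompt = -llama_tokenize(vocab, prompt.c_str(), prompt.size(), nullptr, 0, true, true);
    std::vector<llama_token> tokens(n_prompt);
    llama_tokenize(vocab, prompt.c_str(), prompt.size(), tokens.data(), tokens.size(), true, true);

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 4096;
    llama_context * ctx = llama_init_from_model(model, cparams);

    // greedy sampler chain
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    llama_batch batch = llama_batch_get_one(tokens.data(), tokens.size());
    llama_token tok;

    for (int i = 0; i < 128; i++) {
        llama_decode(ctx, batch);
        tok = llama_sampler_sample(smpl, ctx, -1);
        if (llama_vocab_is_eog(vocab, tok)) {
            break;
        }
        char buf[256];
        const int len = llama_token_to_piece(vocab, tok, buf, sizeof(buf), 0, true);
        printf("%.*s", len, buf);
        batch = llama_batch_get_one(&tok, 1);
    }
    printf("\n");

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

The backend is chosen at build time (for example via the GGML_CUDA, GGML_VULKAN or GGML_METAL CMake options), and the same code runs unchanged on the CPU backend.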
Over the past 2 years the open-source developer community behind ggml has grown significantly. Together we built a scalable software infrastructure capable of supporting all the needs of modern, low-level ML inference. More and more engineers, product builders, hardware vendors and researchers continue to discover and adopt what we have created. All of this wouldn't have been possible without the dedicated open approach that this community has embraced from the very beginning of the project. The main difference today, compared to 2 years ago, is that we are past the "hacking" phase of development, and it is time to focus on architecting and maintaining the correct implementation that will become the foundation of most local AI applications and products in the near future.
The primary goal of ggml-org will continue to be helping the community grow and creating opportunities for everyone involved. We are more open than ever to support from the leaders in the AI field, and today's release is a prime example of what is possible with such coordinated and aligned efforts.
Special thanks to all maintainers, collaborators and contributors of ggml and related projects. Looking forward to many new developments together. Have fun!