Model: Minimax M2 #16831
Conversation
48aab51 to 06ed421
Closes #16798
I ran this PR with the q8_0 by DevQuasar and it seems to be working. Without it, it does not print an initial … Full command and perplexity results (they look fine) here: https://huggingface.co/DevQuasar/MiniMaxAI.MiniMax-M2-GGUF/discussions/1
Remove the vocab files and test; if there is a good reason to test the vocab (which AFAICT there is not), we can add it to ggml-org/vocabs on HF.
Tool calls don't work yet? Or is that just this particular GGUF (from bullerwins)?
Done.
Argh, stupid codespaces. @CISC rebased on current master, should be OK now. |
@pwilkin Fantastic work and thanks as always for your open source work! |
Is it worth merging if this does not work? |
I think the jinja template works if you just remove |
@CISC Yep I normally just remove it for now |
It's weird, too. I don't understand why some are using it in their templates since it's the default; it makes no sense...
Ready to merge when CIs are done.
Co-authored-by: Sigbjørn Skjæret <[email protected]>
@CISC looks OK to me, the failures are unrelated (webgpu).
Seems similar to gpt-oss in this regard, except for all messages and not just tool calls. It should work if clients pass back assistant messages with
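As a rough illustration of what "passing back assistant messages" means here, a minimal sketch of the client side, assuming the OpenAI-compatible schema where the server emits a `reasoning_content` field alongside `content` (field names are assumptions based on the discussion, not a guaranteed schema):

```python
# Sketch: a client re-sending an assistant turn so the model's reasoning
# stays in context on the next request. The "reasoning_content" field name
# follows what llama.cpp's OpenAI-compatible server emits; treat this as
# an illustration, not a definitive client implementation.

def build_next_request(history, assistant_reply):
    """Append the assistant turn, preserving its reasoning field."""
    msg = {
        "role": "assistant",
        "content": assistant_reply["content"],
    }
    # Keep the reasoning so the chat template can re-insert it in context.
    if assistant_reply.get("reasoning_content"):
        msg["reasoning_content"] = assistant_reply["reasoning_content"]
    return history + [msg]

history = [{"role": "user", "content": "Hi"}]
reply = {"content": "Hello!", "reasoning_content": "Greeting detected."}
messages = build_next_request(history, reply)
```

A client that strips `reasoning_content` before re-sending would silently lose the thinking blocks, which is the failure mode being described.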
Guys, any idea why the thinking tag still hasn't been fixed?
Once --reasoning-format none is set on the backend, everything should work, as the reasoning content will be passed back to the server; the rest is purely cosmetic, like adding a lightweight, dedicated front-end filter or toggle to handle multiple <think>...</think> blocks gracefully. We could take a more modular approach: the backend could properly parse the blocks and send alternating delta reasoning_content / delta content, while a simple front-end option could resend reasoning_content as content with a configurable delimiter. It would fit nicely within the OpenAI-compat layer, though it might be a bit of overengineering... but it would cover all possible cases without needing any additional parsing logic or front-end hacks.
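The "front-end filter" idea above can be sketched as a small function that splits a raw completion containing multiple <think>...</think> blocks into alternating reasoning/content segments, which a UI could then render, hide, or toggle per segment. This is purely illustrative, not part of the PR:

```python
import re

# Sketch: split a completion with interleaved <think>...</think> blocks
# into ordered ("reasoning" | "content", chunk) pairs. A front end could
# use this to show or hide each reasoning segment independently.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_think_blocks(text):
    """Return a list of ("reasoning"|"content", chunk) pairs in order."""
    parts = []
    pos = 0
    for m in THINK_RE.finditer(text):
        if m.start() > pos:
            parts.append(("content", text[pos:m.start()]))
        parts.append(("reasoning", m.group(1)))
        pos = m.end()
    if pos < len(text):
        parts.append(("content", text[pos:]))
    return parts
```

The same split could feed the "alternating delta reasoning_content / delta content" scheme on the backend side, with the front end only choosing how to display each pair.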
Adding --reasoning-format none still results in missing think tags.
OK: https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/chat_template.jinja MiniMax-M2 is the first model that actually requires this behavior (the reasoning_content must be preserved in context), so it deserves its own special option.
I don't know why they merged this PR; IMO this is not good.
Not everything has to (or should) be done in a single PR. |
For anyone interested in enabling tool calls for Minimax M2, refer to PR #16932; I've managed to get tool calls working.




Implementation for Minimax M2 - not doing the chat template yet because I'm not sure how to handle the interleaved thinking blocks.
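For context on what handling interleaved thinking in the chat template would involve, here is a minimal sketch of one possible rendering strategy: when serializing an assistant turn into the prompt, re-wrap its preserved reasoning in <think> tags before the visible content. The tag names and message fields are assumptions for illustration, not the actual MiniMax-M2 template:

```python
# Sketch (hypothetical, not the real MiniMax-M2 chat template): render an
# assistant turn with its reasoning re-inserted as a <think> block, so the
# reasoning stays in context across turns.

def render_assistant_turn(message):
    """Serialize one assistant message, re-wrapping preserved reasoning."""
    parts = []
    if message.get("reasoning_content"):
        parts.append(f"<think>{message['reasoning_content']}</think>")
    parts.append(message.get("content", ""))
    return "".join(parts)

def render_conversation(messages):
    """Serialize a whole conversation, one line per turn."""
    out = []
    for m in messages:
        if m["role"] == "assistant":
            out.append(render_assistant_turn(m))
        else:
            out.append(m["content"])
    return "\n".join(out)
```

The hard part deferred by this PR is that a single assistant turn may contain several such blocks interleaved with content, not just one leading block as sketched here.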