server: add minimax-m2 reasoning format override for MiniMax-M2 compatibility
MiniMax-M2 models require the complete `<think>...</think>` block, including the tags themselves,
to be present in the context for proper reasoning. This mode injects a synthetic
opening `<think>` tag into the stream while keeping all reasoning tags inline in
`message.content`, ensuring the model receives the full reasoning block it needs.
Changes:
- Add `COMMON_REASONING_FORMAT_MINIMAX_M2` enum value to `common_reasoning_format`
- Implement `minimax-m2` format parsing that bypasses reasoning extraction
- Inject a synthetic `<think>\n` chunk at slot start when `minimax-m2` is active
- Track injection state with the `minimax_reasoning_prefix_injected` slot flag
- Prepend `<think>\n` to `generated_text` for the final response and chat parsing
- Prevent a client-supplied `reasoning_format=auto` from overriding the server CLI setting
- Add `minimax-m2` to the CLI help, README.md, and code documentation
- Handle `LLAMA_TOKEN_NULL` in `send_partial_response` to skip token recording
- Update `process_token` to preserve `delta_to_send` for streaming correctness
```diff
     COMMON_REASONING_FORMAT_AUTO,            // Same as deepseek, using `message.reasoning_content`
     COMMON_REASONING_FORMAT_DEEPSEEK_LEGACY, // Extract thinking tag contents and return as `message.reasoning_content`, or leave inline in <think> tags in stream mode
     COMMON_REASONING_FORMAT_DEEPSEEK,        // Extract thinking tag contents and return as `message.reasoning_content`, including in streaming deltas.
+    COMMON_REASONING_FORMAT_MINIMAX_M2,      // Stream a synthetic opening <think> tag and keep </think> tags in `message.content` for MiniMax-M2 compatibility
     // do not extend this enum unless you absolutely have to
     // in most cases, use COMMON_REASONING_FORMAT_AUTO
```
```diff
 |`--slot-save-path PATH`| path to save slot kv cache (default: disabled) |
 |`--jinja`| use jinja template for chat (default: disabled)<br/>(env: LLAMA_ARG_JINJA) |
-|`--reasoning-format FORMAT`| controls whether thought tags are allowed and/or extracted from the response, and in which format they're returned; one of:<br/>- none: leaves thoughts unparsed in `message.content`<br/>- deepseek: puts thoughts in `message.reasoning_content`<br/>- deepseek-legacy: keeps `<think>` tags in `message.content` while also populating `message.reasoning_content`<br/>(default: deepseek)<br/>(env: LLAMA_ARG_THINK) |
+|`--reasoning-format FORMAT`| controls whether thought tags are allowed and/or extracted from the response, and in which format they're returned; one of:<br/>- none: leaves thoughts unparsed in `message.content`<br/>- deepseek: puts thoughts in `message.reasoning_content`<br/>- deepseek-legacy: keeps `<think>` tags in `message.content` while also populating `message.reasoning_content`<br/>- minimax-m2: streams a synthetic opening `<think>` tag and keeps `</think>` tags in `message.content` for MiniMax-M2 compatibility<br/>(default: deepseek)<br/>(env: LLAMA_ARG_THINK) |
 |`--reasoning-budget N`| controls the amount of thinking allowed; currently only one of: -1 for unrestricted thinking budget, or 0 to disable thinking (default: -1)<br/>(env: LLAMA_ARG_THINK_BUDGET) |
 |`--chat-template JINJA_TEMPLATE`| set custom jinja chat template (default: template taken from model's metadata)<br/>if suffix/prefix are specified, template will be disabled<br/>only commonly used templates are accepted (unless --jinja is set before this flag):<br/>list of built-in templates:<br/>bailing, chatglm3, chatglm4, chatml, command-r, deepseek, deepseek2, deepseek3, exaone3, exaone4, falcon3, gemma, gigachat, glmedge, gpt-oss, granite, hunyuan-dense, hunyuan-moe, kimi-k2, llama2, llama2-sys, llama2-sys-bos, llama2-sys-strip, llama3, llama4, megrez, minicpm, mistral-v1, mistral-v3, mistral-v3-tekken, mistral-v7, mistral-v7-tekken, monarch, openchat, orion, phi3, phi4, rwkv-world, seed_oss, smolvlm, vicuna, vicuna-orca, yandex, zephyr<br/>(env: LLAMA_ARG_CHAT_TEMPLATE) |
 |`--chat-template-file JINJA_TEMPLATE_FILE`| set custom jinja chat template file (default: template taken from model's metadata)<br/>if suffix/prefix are specified, template will be disabled<br/>only commonly used templates are accepted (unless --jinja is set before this flag):<br/>list of built-in templates:<br/>bailing, chatglm3, chatglm4, chatml, command-r, deepseek, deepseek2, deepseek3, exaone3, exaone4, falcon3, gemma, gigachat, glmedge, gpt-oss, granite, hunyuan-dense, hunyuan-moe, kimi-k2, llama2, llama2-sys, llama2-sys-bos, llama2-sys-strip, llama3, llama4, megrez, minicpm, mistral-v1, mistral-v3, mistral-v3-tekken, mistral-v7, mistral-v7-tekken, monarch, openchat, orion, phi3, phi4, rwkv-world, seed_oss, smolvlm, vicuna, vicuna-orca, yandex, zephyr<br/>(env: LLAMA_ARG_CHAT_TEMPLATE_FILE) |
```