fix: prevent duplicate BOS tokens with DeepSeek V3.1 in llama.cpp #7501
Conversation
…ns with DeepSeek V3.1

- Added openAiSkipSystemMessage configuration option for OpenAI Compatible providers
- When enabled for DeepSeek models, merges system prompt into first user message
- Prevents duplicate BOS tokens when using llama.cpp with --jinja flag
- Added comprehensive tests for the new functionality

Fixes #7500
Reviewing my own code is like debugging in a mirror - everything looks backward but the bugs are still mine.
```typescript
if (deepseekReasoner) {
	convertedMessages = convertToR1Format([{ role: "user", content: systemPrompt }, ...messages])
} else if (skipSystemMessage) {
```
I notice there's duplicate logic here between streaming (lines 108-129) and non-streaming (lines 248-268) modes. Could we extract this into a helper method like prepareMessagesWithSkipSystemMessage() to reduce duplication and improve maintainability?
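For reference, a minimal sketch of what such a helper could look like, assuming the Anthropic-style message shapes used elsewhere in openai.ts (the function name and exact merge rules are suggestions, not code from this PR):

```typescript
// Hypothetical helper — a sketch only, not code from this PR. The message
// shape is simplified from the SDK types used in openai.ts.
type MessageParam = {
	role: "user" | "assistant"
	content: string | Array<{ type: string; text?: string }>
}

function prepareMessagesWithSkipSystemMessage(
	systemPrompt: string,
	messages: MessageParam[],
): MessageParam[] {
	if (messages.length === 0 || messages[0].role !== "user") {
		// Nothing to merge into: send the system prompt as a plain user message.
		return [{ role: "user", content: systemPrompt }, ...messages]
	}
	const [first, ...rest] = messages
	// Fold the system prompt into the first user message so the backend's
	// chat template only renders one leading (BOS-bearing) message.
	const content =
		typeof first.content === "string"
			? `${systemPrompt}\n\n${first.content}`
			: [{ type: "text", text: systemPrompt }, ...first.content]
	return [{ ...first, content }, ...rest]
}
```

Both the streaming and non-streaming paths could then call this one function before building the request.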
```typescript
// Check if we should skip system message for DeepSeek V3 models with llama.cpp
const skipSystemMessage =
	this.options.openAiSkipSystemMessage &&
	(modelId.toLowerCase().includes("deepseek") || modelId.toLowerCase().includes("deepseek-v3"))
```
The model detection using includes("deepseek") might be too broad and could match unintended models. Would it be more robust to use a specific list of model IDs or a regex pattern?
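One way to tighten the check; the pattern below is illustrative only, and the example values stand in for the handler's options and model ID:

```typescript
// Illustrative only — anchor the match to DeepSeek V3.x identifiers instead
// of any model ID that merely contains "deepseek".
const DEEPSEEK_V3_PATTERN = /deepseek[-_.]?v3(\.\d+)?/i

// Example values standing in for the handler's fields:
const options = { openAiSkipSystemMessage: true }
const modelId = "deepseek-v3.1"

const skipSystemMessage =
	options.openAiSkipSystemMessage === true && DEEPSEEK_V3_PATTERN.test(modelId) // true here; false for e.g. "deepseek-r1"
```

Note also that the second `includes("deepseek-v3")` clause in the current code is redundant, since any ID containing "deepseek-v3" already contains "deepseek".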
```typescript
openAiStreamingEnabled: z.boolean().optional(),
openAiHostHeader: z.string().optional(), // Keep temporarily for backward compatibility during migration.
openAiHeaders: z.record(z.string(), z.string()).optional(),
openAiSkipSystemMessage: z.boolean().optional(), // Skip system message for models that auto-add BOS tokens (e.g., llama.cpp with --jinja)
```
This comment is helpful, but could we expand it to explain when users should enable this option? For example: 'Enable this if you see duplicate BOS token warnings with DeepSeek V3.1 and llama.cpp'
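For example, the expanded comment could read as follows (wording is a suggestion, not the comment merged in this PR):

```typescript
import { z } from "zod"

// Suggested wording only:
const openAiSchemaExcerpt = z.object({
	// Merge the system prompt into the first user message. Enable this if you
	// see duplicate BOS token warnings, e.g. with DeepSeek V3.1 served by
	// llama.cpp with --jinja, whose chat template adds its own BOS token.
	openAiSkipSystemMessage: z.boolean().optional(),
})
```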
```typescript
vi.mock("openai")

describe("OpenAI Handler - DeepSeek V3 BOS Token Handling", () => {
```
Great test coverage! Consider adding a few edge cases (a sketch follows this list):
- What happens when the system prompt is empty?
- Behavior with complex message content (arrays with multiple text/image parts)?
- Interaction with R1 format when both openAiR1FormatEnabled and openAiSkipSystemMessage are true?
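A sketch of how those cases might be added to the spec file (vitest; names are assumed, and it.todo keeps them as placeholders until implemented):

```typescript
import { describe, it } from "vitest"

// Sketch only — suggested additions to openai-deepseek-bos.spec.ts.
describe("OpenAI Handler - DeepSeek V3 BOS Token Handling (edge cases)", () => {
	it.todo("does not emit an empty leading user message when the system prompt is empty")
	it.todo("prepends the system prompt as a text part when the first user message has array content (text and image parts)")
	it.todo("lets convertToR1Format take precedence when both openAiR1FormatEnabled and openAiSkipSystemMessage are true")
})
```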
```typescript
let convertedMessages

// Check if we should skip system message for DeepSeek V3 models with llama.cpp
const skipSystemMessage =
```
Instead of hardcoding this for DeepSeek, could this feature be useful for other llama.cpp deployments? Consider renaming the option to something more generic like mergeSystemIntoFirstUser to indicate the behavior rather than the specific use case.
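Under that framing, the schema entry might look like this (the name reflects the reviewer's suggestion, not merged code):

```typescript
import { z } from "zod"

// Hypothetical, behavior-named option:
const providerSettingsExcerpt = z.object({
	// Fold the system prompt into the first user message, useful for any
	// backend whose chat template supplies its own BOS/system handling.
	mergeSystemIntoFirstUser: z.boolean().optional(),
})
```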
Closing this PR as the approach has fundamental limitations. The core issue is that we cannot reliably detect if llama.cpp is being used at runtime - we can only guess based on model names, which is not a sustainable solution. Merging system messages into user messages also changes the semantic structure of the conversation in ways that could affect model behavior.
This PR addresses Issue #7500 by adding a configuration option to prevent duplicate BOS tokens when using DeepSeek V3.1 with llama.cpp.
Problem
When using DeepSeek V3.1 through the OpenAI Compatible provider with llama.cpp (with the --jinja flag enabled), users were getting a warning about duplicate BOS tokens. This happens because llama.cpp automatically adds a BOS token, but Roo Code was also sending messages in a format that triggered another BOS token addition.

Solution
Added a new configuration option openAiSkipSystemMessage that, when enabled for DeepSeek models, merges the system prompt into the first user message instead of sending it as a separate system message, preventing the duplicate BOS token.
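To illustrate the transformation (message shapes assumed from the description, values hypothetical):

```typescript
// Without the option, llama.cpp's --jinja template adds a BOS token and the
// separately rendered system message triggers a second one:
const withoutOption = [
	{ role: "system", content: "You are a helpful assistant." },
	{ role: "user", content: "Hello" },
]

// With openAiSkipSystemMessage enabled for a DeepSeek model, the system
// prompt is merged into the first user message, so only one BOS is emitted:
const withOption = [{ role: "user", content: "You are a helpful assistant.\n\nHello" }]
```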
Changes

- Added openAiSkipSystemMessage boolean option to the OpenAI provider settings schema
- Updated OpenAiHandler to detect DeepSeek models and apply the skip logic when configured

Testing

- Added comprehensive tests for the new functionality (openai-deepseek-bos.spec.ts)
Usage
Users experiencing the duplicate BOS token issue with DeepSeek V3.1 and llama.cpp can enable the openAiSkipSystemMessage option in their OpenAI Compatible provider configuration.

Fixes #7500
Important

Adds openAiSkipSystemMessage option to prevent duplicate BOS tokens with DeepSeek V3.1 in openai.ts, with comprehensive tests.

- Adds openAiSkipSystemMessage option to prevent duplicate BOS tokens with DeepSeek V3.1 in openai.ts.
- Updates openAiSchema in provider-settings.ts to include openAiSkipSystemMessage.
- Adds openai-deepseek-bos.spec.ts with 10 test cases for various scenarios, ensuring correct behavior with and without the new option.
- Updates OpenAiHandler in openai.ts to detect DeepSeek models and apply skip logic when configured.