
Conversation

Contributor

@roomote roomote bot commented Aug 28, 2025

This PR addresses Issue #7500 by adding a configuration option to prevent duplicate BOS tokens when using DeepSeek V3.1 with llama.cpp.

Problem

When using DeepSeek V3.1 through the OpenAI Compatible provider with llama.cpp (with the --jinja flag enabled), users saw a warning about duplicate BOS tokens. This happens because llama.cpp automatically adds a BOS token, while Roo Code was also sending messages in a format that triggered a second BOS token.

Solution

Added a new configuration option openAiSkipSystemMessage that, when enabled for DeepSeek models:

  • Merges the system prompt into the first user message instead of sending it as a separate system message (sketched after this list)
  • Prevents the duplicate BOS token issue
  • Maintains backward compatibility as an opt-in feature
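
A minimal sketch of the merge behavior, assuming simple string-content messages; the helper name and types are illustrative, not the PR's exact code:

```ts
type ChatMessage = { role: "system" | "user" | "assistant"; content: string }

// Fold the system prompt into the first user message so no separate
// "system" message (and hence no extra BOS token) is sent.
function mergeSystemIntoFirstUser(systemPrompt: string, messages: ChatMessage[]): ChatMessage[] {
	if (!systemPrompt) return messages // nothing to merge

	const [first, ...rest] = messages
	if (first && first.role === "user") {
		// Prepend the system prompt to the existing first user message.
		return [{ role: "user", content: `${systemPrompt}\n\n${first.content}` }, ...rest]
	}
	// No leading user message: send the system prompt as a user message instead.
	return [{ role: "user", content: systemPrompt }, ...messages]
}
```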

Changes

  • Added openAiSkipSystemMessage boolean option to OpenAI provider settings schema
  • Updated OpenAI handler to detect DeepSeek models and apply the skip logic when configured
  • Handles both streaming and non-streaming modes consistently
  • Added comprehensive test coverage for the new functionality

Testing

  • ✅ All new tests pass (10 test cases covering various scenarios)
  • ✅ All existing OpenAI provider tests pass (no regression)
  • ✅ All existing DeepSeek provider tests pass
  • ✅ Linting and type checking pass

Usage

Users experiencing the duplicate BOS token issue with DeepSeek V3.1 and llama.cpp can enable the openAiSkipSystemMessage option in their OpenAI Compatible provider configuration.
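
For illustration, enabling the option in a provider settings object might look like the following; openAiSkipSystemMessage comes from the schema change in this PR, while the base URL and model ID values are assumptions:

```ts
const providerSettings = {
	openAiBaseUrl: "http://localhost:8080/v1", // assumed llama.cpp server endpoint
	openAiModelId: "deepseek-v3.1", // assumed model ID
	openAiSkipSystemMessage: true, // opt in to the BOS-token workaround
}
```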

Fixes #7500


Important

Adds openAiSkipSystemMessage option to prevent duplicate BOS tokens with DeepSeek V3.1 in openai.ts, with comprehensive tests.

  • Behavior:
    • Adds openAiSkipSystemMessage option to prevent duplicate BOS tokens with DeepSeek V3.1 in openai.ts.
    • Merges system prompt into the first user message when enabled, avoiding separate system messages.
    • Applies to both streaming and non-streaming modes.
  • Configuration:
    • Updates openAiSchema in provider-settings.ts to include openAiSkipSystemMessage.
  • Testing:
    • Adds openai-deepseek-bos.spec.ts with 10 test cases for various scenarios, ensuring correct behavior with and without the new option.
  • Misc:
    • Updates OpenAiHandler in openai.ts to detect DeepSeek models and apply skip logic when configured.

This description was created by Ellipsis for bd283b7.

…ns with DeepSeek V3.1

- Added openAiSkipSystemMessage configuration option for OpenAI Compatible providers
- When enabled for DeepSeek models, merges system prompt into first user message
- Prevents duplicate BOS tokens when using llama.cpp with --jinja flag
- Added comprehensive tests for the new functionality

Fixes #7500
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 28, 2025 17:41
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Aug 28, 2025
Contributor Author

@roomote roomote bot left a comment


Reviewing my own code is like debugging in a mirror - everything looks backward but the bugs are still mine.


```ts
if (deepseekReasoner) {
	convertedMessages = convertToR1Format([{ role: "user", content: systemPrompt }, ...messages])
} else if (skipSystemMessage) {
```
Contributor Author

I notice there's duplicate logic here between streaming (lines 108-129) and non-streaming (lines 248-268) modes. Could we extract this into a helper method like prepareMessagesWithSkipSystemMessage() to reduce duplication and improve maintainability?
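
As a sketch of that extraction (names and shape assumed; reusing the illustrative mergeSystemIntoFirstUser helper from the description above):

```ts
// Hypothetical shared helper: both the streaming and non-streaming paths
// would call this instead of duplicating the branch.
function prepareMessages(
	systemPrompt: string,
	messages: ChatMessage[],
	opts: { deepseekReasoner: boolean; skipSystemMessage: boolean },
): ChatMessage[] {
	if (opts.deepseekReasoner) {
		return convertToR1Format([{ role: "user", content: systemPrompt }, ...messages])
	}
	if (opts.skipSystemMessage) {
		return mergeSystemIntoFirstUser(systemPrompt, messages)
	}
	return [{ role: "system", content: systemPrompt }, ...messages]
}
```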

```ts
// Check if we should skip system message for DeepSeek V3 models with llama.cpp
const skipSystemMessage =
	this.options.openAiSkipSystemMessage &&
	(modelId.toLowerCase().includes("deepseek") || modelId.toLowerCase().includes("deepseek-v3"))
```
Contributor Author

The model detection using includes("deepseek") might be too broad and could match unintended models. Would it be more robust to use a specific list of model IDs or a regex pattern?
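
For example, a sketch of the regex approach; the pattern below is an assumption about which model IDs should match, not a vetted list:

```ts
// Match e.g. "deepseek-v3", "deepseek-v3.1", "DeepSeek-Chat"; adjust as needed.
const DEEPSEEK_PATTERN = /deepseek[-_]?(v3(\.\d+)?|chat)/i

const skipSystemMessage = Boolean(this.options.openAiSkipSystemMessage) && DEEPSEEK_PATTERN.test(modelId)
```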

```ts
openAiStreamingEnabled: z.boolean().optional(),
openAiHostHeader: z.string().optional(), // Keep temporarily for backward compatibility during migration.
openAiHeaders: z.record(z.string(), z.string()).optional(),
openAiSkipSystemMessage: z.boolean().optional(), // Skip system message for models that auto-add BOS tokens (e.g., llama.cpp with --jinja)
```
Contributor Author

This comment is helpful, but could we expand it to explain when users should enable this option? For example: 'Enable this if you see duplicate BOS token warnings with DeepSeek V3.1 and llama.cpp'
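
Concretely, the expanded comment might read (wording is a suggestion, not the merged code):

```ts
openAiSkipSystemMessage: z.boolean().optional(), // Merge the system prompt into the first user message. Enable this if you see duplicate BOS token warnings (e.g., DeepSeek V3.1 on llama.cpp with --jinja).
```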


```ts
vi.mock("openai")

describe("OpenAI Handler - DeepSeek V3 BOS Token Handling", () => {
```
Contributor Author

Great test coverage! Consider adding a few edge cases:

  • What happens when the system prompt is empty? (sketched after this list)
  • Behavior with complex message content (arrays with multiple text/image parts)?
  • Interaction with R1 format when both openAiR1FormatEnabled and openAiSkipSystemMessage are true?
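
A sketch of the first edge case as a vitest test, written against the illustrative mergeSystemIntoFirstUser helper and ChatMessage type from the description above rather than the real handler setup (which this excerpt doesn't show):

```ts
import { describe, it, expect } from "vitest"

describe("skip-system-message edge cases", () => {
	it("leaves messages untouched when the system prompt is empty", () => {
		const messages: ChatMessage[] = [{ role: "user", content: "hello" }]
		expect(mergeSystemIntoFirstUser("", messages)).toEqual(messages)
	})
})
```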

```ts
let convertedMessages

// Check if we should skip system message for DeepSeek V3 models with llama.cpp
const skipSystemMessage =
```
Contributor Author

Instead of hardcoding this for DeepSeek, could this feature be useful for other llama.cpp deployments? Consider renaming the option to something more generic like mergeSystemIntoFirstUser to indicate the behavior rather than the specific use case.

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 28, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 29, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Aug 29, 2025
@daniel-lxs
Member

Closing this PR as the approach has fundamental limitations. The core issue is that we cannot reliably detect if llama.cpp is being used at runtime - we can only guess based on model names, which is not a sustainable solution. Merging system messages into user messages also changes the semantic structure of the conversation in ways that could affect model behavior.

@daniel-lxs daniel-lxs closed this Sep 11, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Sep 11, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 11, 2025

Labels

  • bug - Something isn't working
  • PR - Needs Preliminary Review
  • size:L - This PR changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Getting "final prompt starts with 2 BOS tokens" warning with DeepSeek V3.1
