fix: sanitize unwanted "极速模式" characters from DeepSeek V3.1 responses #7383
@@ -1,9 +1,11 @@
 import { deepSeekModels, deepSeekDefaultModelId } from "@roo-code/types"
+import { Anthropic } from "@anthropic-ai/sdk"

 import type { ApiHandlerOptions } from "../../shared/api"

-import type { ApiStreamUsageChunk } from "../transform/stream"
+import type { ApiStreamUsageChunk, ApiStream } from "../transform/stream"
 import { getModelParams } from "../transform/model-params"
+import type { ApiHandlerCreateMessageMetadata } from "../index"

 import { OpenAiHandler } from "./openai"

@@ -26,6 +28,60 @@ export class DeepSeekHandler extends OpenAiHandler {
 		return { id, info, ...params }
 	}

+	override async *createMessage(
+		systemPrompt: string,
+		messages: Anthropic.Messages.MessageParam[],
+		metadata?: ApiHandlerCreateMessageMetadata,
+	): ApiStream {
+		// Get the stream from the parent class
+		const stream = super.createMessage(systemPrompt, messages, metadata)
+
+		// Process each chunk to remove unwanted characters
+		for await (const chunk of stream) {
+			if (chunk.type === "text" && chunk.text) {
+				// Sanitize the text content
+				chunk.text = this.sanitizeContent(chunk.text)
+			} else if (chunk.type === "reasoning" && chunk.text) {
+				// Also sanitize reasoning content
+				chunk.text = this.sanitizeContent(chunk.text)
+			}
+			yield chunk
+		}
+	}
+
+	/**
Contributor
Author
Would it be helpful to add a link to issue #7382 here and explain why these specific characters appear? Is this a known DeepSeek V3.1 bug or a configuration issue?
+	 * Removes unwanted "极速模式" (speed mode) characters from the content.
+	 * These characters appear to be injected by some DeepSeek V3.1 configurations,
+	 * possibly from a Chinese language interface or prompt template.
+	 * The sanitization preserves legitimate Chinese text while removing these artifacts.
+	 */
+	private sanitizeContent(content: string): string {
Contributor
Author
Performance consideration: We're applying 10 regex replacements sequentially on every chunk. For large responses, this could impact performance. Would it make sense to combine some patterns or use a single-pass approach?
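As a minimal sketch of the single-pass idea, the four isolated-character replacements could be folded into one character class. The helper names below (`removeIsolatedSequential`, `removeIsolatedSinglePass`) are hypothetical and not part of the PR. The sketch covers only the lookaround-guarded patterns, because merging the whitespace-based patterns the same way can change results when two artifact characters sit next to each other.

```typescript
// Same character class as the [一-龿] range used in the diff.
const CJK = "一-龿"

// The four isolated-character replacements from the diff, applied one after another.
function removeIsolatedSequential(text: string): string {
	let out = text
	for (const ch of ["极", "速", "模", "式"]) {
		out = out.replace(new RegExp(`(?<![${CJK}])${ch}(?![${CJK}])`, "g"), "")
	}
	return out
}

// Hypothetical single-pass variant: one compiled regex with a character class.
const ISOLATED_ARTIFACT = new RegExp(`(?<![${CJK}])[极速模式](?![${CJK}])`, "g")

function removeIsolatedSinglePass(text: string): string {
	return text.replace(ISOLATED_ARTIFACT, "")
}

// Compare both variants on a few illustrative inputs.
for (const sample of ["Hello 极 world", "a速b", "常规模式", "x式"]) {
	console.log(removeIsolatedSequential(sample) === removeIsolatedSinglePass(sample))
}
```

Compiling the pattern once at module level also avoids rebuilding it for every chunk, though whether that matters in practice would need measuring.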
+		// First, try to remove the complete phrase "极速模式"
+		let sanitized = content.replace(/极速模式/g, "")
+
+		// Remove partial sequences like "模式" that might remain
+		sanitized = sanitized.replace(/模式(?![一-龿])/g, "")
Contributor
The regex on line 63 (/模式(?![一-龿])/g) removes every occurrence of '模式' that is not followed by another Chinese character, even when it is part of a legitimate phrase (e.g. '常规模式'). Consider adding a negative lookbehind (similar to the other patterns) so that valid Chinese words aren't unintentionally truncated.
Contributor
Author
I agree with ellipsis-dev bot here - this pattern will incorrectly remove '模式' from legitimate Chinese phrases. Should we add a negative lookbehind like the other patterns?
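One way the lookbehind suggestion might look, as a sketch rather than the committed fix. `PARTIAL_GUARDED` and `stripPartial` are hypothetical names introduced here for illustration; `PARTIAL_ORIGINAL` reproduces the pattern from the diff.

```typescript
// Same character class as the [一-龿] range used in the diff.
const CJK = "一-龿"

// Pattern from the diff: only a lookahead guards the match.
const PARTIAL_ORIGINAL = new RegExp(`模式(?![${CJK}])`, "g")

// Hypothetical guarded variant: additionally require that no Chinese character precedes the match.
const PARTIAL_GUARDED = new RegExp(`(?<![${CJK}])模式(?![${CJK}])`, "g")

function stripPartial(text: string, pattern: RegExp): string {
	return text.replace(pattern, "")
}

console.log(stripPartial("常规模式", PARTIAL_ORIGINAL)) // "常规" (legitimate phrase truncated)
console.log(stripPartial("常规模式", PARTIAL_GUARDED)) // "常规模式" (left intact)
console.log(stripPartial("done 模式", PARTIAL_GUARDED)) // "done " (standalone fragment still removed)
```

The guarded form would also leave the fragment in place when it is attached directly to Chinese text, so whether it is acceptable depends on where the artifacts actually show up.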
+
+		// Remove isolated occurrences of these characters when they appear
+		// between non-Chinese characters or at boundaries
+		// Using more specific patterns to avoid removing legitimate Chinese text
+		sanitized = sanitized.replace(/(?<![一-龿])极(?![一-龿])/g, "")
+		sanitized = sanitized.replace(/(?<![一-龿])速(?![一-龿])/g, "")
+		sanitized = sanitized.replace(/(?<![一-龿])模(?![一-龿])/g, "")
+		sanitized = sanitized.replace(/(?<![一-龿])式(?![一-龿])/g, "")
+
+		// Handle cases where these characters appear with spaces
+		sanitized = sanitized.replace(/\s+极\s*/g, " ")
Contributor
Author
These space-based patterns might be too aggressive. They'll remove legitimate Chinese words when preceded by a space. For example, "这是 极好的" (This is excellent) would become "这是 好的". Could we make these patterns more specific to only target the artifacts?
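The example in this comment can be reproduced directly, and one possible narrowing is a negative lookahead that skips the match when a Chinese character follows. This is a sketch under that assumption; `stripSpacedOriginal` and `stripSpacedGuarded` are hypothetical helpers, not code from the PR.

```typescript
// Same character class as the [一-龿] range used in the diff.
const CJK = "一-龿"

// The whitespace-based pattern from the diff, shown for 极 only.
function stripSpacedOriginal(text: string): string {
	return text.replace(/\s+极\s*/g, " ")
}

// Hypothetical narrower variant: do not match when a Chinese character
// directly follows, so the start of a real word is left alone.
const SPACED_GUARDED = new RegExp(`\\s+[极速模式](?![${CJK}])\\s*`, "g")

function stripSpacedGuarded(text: string): string {
	return text.replace(SPACED_GUARDED, " ")
}

console.log(stripSpacedOriginal("这是 极好的")) // "这是 好的" (the word 极好 loses its first character)
console.log(stripSpacedGuarded("这是 极好的")) // "这是 极好的"
console.log(stripSpacedGuarded("fast 极 mode")) // "fast mode"
```

A character with non-Chinese text on both sides is already matched by the isolated-character patterns earlier in the function, so the narrowed variant mostly overlaps with that step.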
+		sanitized = sanitized.replace(/\s+速\s*/g, " ")
+		sanitized = sanitized.replace(/\s+模\s*/g, " ")
+		sanitized = sanitized.replace(/\s+式\s*/g, " ")
+
+		// Clean up any resulting multiple spaces
+		sanitized = sanitized.replace(/\s+/g, " ").trim()
+
+		return sanitized
+	}
+
 	// Override to handle DeepSeek's usage metrics, including caching.
 	protected override processUsageMetrics(usage: any): ApiStreamUsageChunk {
 		return {
Consider adding more edge case tests:
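A minimal sketch of what such edge-case checks could look like, assuming a standalone transcription of `sanitizeContent` as it appears in the diff. The inputs are illustrative assumptions chosen for this sketch, not cases taken from the PR.

```typescript
// Same character class as the [一-龿] range used in the diff.
const CJK = "一-龿"
const ARTIFACT_CHARS = ["极", "速", "模", "式"]

// Standalone transcription of sanitizeContent from the diff: the same
// replacements in the same order, with the per-character ones written as loops.
function sanitizeContent(content: string): string {
	let sanitized = content.replace(/极速模式/g, "")
	sanitized = sanitized.replace(new RegExp(`模式(?![${CJK}])`, "g"), "")
	for (const ch of ARTIFACT_CHARS) {
		sanitized = sanitized.replace(new RegExp(`(?<![${CJK}])${ch}(?![${CJK}])`, "g"), "")
	}
	for (const ch of ARTIFACT_CHARS) {
		sanitized = sanitized.replace(new RegExp(`\\s+${ch}\\s*`, "g"), " ")
	}
	return sanitized.replace(/\s+/g, " ").trim()
}

// Illustrative inputs (assumptions for this sketch).
const edgeCases: string[] = [
	"Hello极速模式 world", // full artifact inside English text
	"常规模式", // legitimate phrase ending in 模式
	"这是 极好的", // legitimate word after a space
	"line one\nline two", // newline inside a chunk
	"Hello ", // chunk that ends in a space
]

for (const input of edgeCases) {
	console.log(JSON.stringify(input), "->", JSON.stringify(sanitizeContent(input)))
}
```

With the function as written, the second and third inputs lose characters ("常规模式" becomes "常规", "这是 极好的" becomes "这是 好的"). The last two show whitespace being collapsed and trimmed on every call, which matters because `createMessage` applies the function to each stream chunk separately.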