fix: parse gpt-oss special token format in LM Studio responses #6740
Conversation
- Add detection for gpt-oss models in LmStudioHandler
- Implement parseGptOssFormat method to extract actual content from special token format
- Add comprehensive tests for the new parsing logic
- Fixes #6739
Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.
```typescript
private parseGptOssFormat(content: string): string {
	// Remove all special tokens and extract the actual message
	// Pattern: <|token|> where token can be any word
	const specialTokenPattern = /<\|[^|]+\|>/g
```
The regex pattern /<\|[^|]+\|>/g might not handle edge cases correctly. What happens if the token content itself contains a pipe character? Consider using a more robust parsing approach or documenting this limitation.
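A quick standalone check illustrates the limitation (the pattern is copied from the diff; the input strings are made up for illustration):

```typescript
// The special-token pattern from the diff under review.
const specialTokenPattern = /<\|[^|]+\|>/g

// Well-formed tokens are stripped as intended:
const ok = "<|start|>assistant<|message|>hello".replace(specialTokenPattern, "")
// ok === "assistanthello"

// But a pipe inside the token body stops [^|]+ early, so nothing matches
// and the malformed token leaks through to the user untouched:
const leaked = "<|a|b|>text".replace(specialTokenPattern, "")
// leaked === "<|a|b|>text"
```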
```typescript
for (const processedChunk of matcher.update(delta.content)) {
	yield processedChunk
// Check if this is a gpt-oss model with special token format
const isGptOss = this.getModel().id?.toLowerCase().includes("gpt-oss")
```
Performance consideration: Since the model doesn't change during streaming, could we move this gpt-oss check outside the loop to avoid repeated string operations on every chunk?
Suggested change:

```typescript
// Check if this is a gpt-oss model with special token format
const isGptOss = this.getModel().id?.toLowerCase().includes("gpt-oss")

for await (const chunk of results) {
	const delta = chunk.choices[0]?.delta
	if (delta?.content) {
		if (isGptOss && delta.content.includes("<|") && delta.content.includes("|>")) {
```
```typescript
// Parse gpt-oss special token format
// Format: <|start|>assistant<|channel|>commentary to=read_file <|constrain|>json<|message|>{"args":[...]}
const cleanedContent = this.parseGptOssFormat(delta.content)
if (cleanedContent) {
```
When cleanedContent is empty or falsy after parsing, we silently skip it. Should we consider logging a warning to help with debugging unexpected formats?
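One hedged option, assuming `console.warn` is acceptable in this code path (the handler's real logging facility isn't shown in the diff, and `emitCleaned` is a hypothetical helper, not a name from the PR):

```typescript
// Hypothetical guard around the parsed result; the parse callback stands in
// for parseGptOssFormat from the PR diff.
function emitCleaned(rawContent: string, parse: (s: string) => string): string | undefined {
	const cleanedContent = parse(rawContent)
	if (cleanedContent) {
		return cleanedContent
	}
	// Instead of silently dropping the chunk, leave a breadcrumb for debugging.
	console.warn(`gpt-oss parser produced empty output for chunk: ${JSON.stringify(rawContent)}`)
	return undefined
}

const stripAll = (s: string) => s.replace(/<\|[^|]+\|>/g, " ").trim()
console.log(emitCleaned("<|start|><|end|>", stripAll)) // token-only chunk: warns, undefined
```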
```typescript
	})
})

describe("gpt-oss special token parsing", () => {
```
Consider adding test cases for edge scenarios:
- Multiple `<|message|>` tokens in a single chunk
- Malformed special tokens (e.g., unclosed tokens like `<|start`)
- Very large JSON payloads after the message token
- Mixed content with both special tokens and regular text
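Some of these can be probed directly against the token-stripping regex quoted in the diff (a standalone approximation, not the real handler method):

```typescript
// Approximation of the cleaning step, for probing the edge cases listed above.
const specialTokenPattern = /<\|[^|]+\|>/g
const strip = (s: string) => s.replace(specialTokenPattern, " ").trim()

// Multiple <|message|> tokens in one chunk: both get stripped.
const multi = strip("<|message|>first<|message|>second")
// Malformed, unclosed token: the pattern never matches, so it passes through.
const unclosed = strip("<|start")
// Mixed content: tokens removed, surrounding text preserved (with extra spaces).
const mixed = strip("hello <|channel|> world")
console.log(multi, unclosed, mixed)
```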
```typescript
const cleaned = content.replace(specialTokenPattern, " ").trim()

// Also clean up any "to=function_name" patterns that might remain
const functionPattern = /\s*to=\w+\s*/g
```
The function pattern /\s*to=\w+\s*/g only matches word characters. Is this intentional? Function names might contain hyphens, which \w does not match (underscores are already covered by \w). Consider using /\s*to=[\w-]+\s*/g if you want to support kebab-case function names.
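A quick comparison on a hypothetical kebab-case tool name shows the difference:

```typescript
// The pattern from the diff vs. the reviewer's suggested variant.
const wordOnly = /\s*to=\w+\s*/g
const withHyphen = /\s*to=[\w-]+\s*/g

const s = "commentary to=read-file <|constrain|>json"
// \w+ stops at the hyphen, leaving "-file" behind:
const a = s.replace(wordOnly, " ")
// [\w-]+ consumes the whole kebab-case name:
const b = s.replace(withHyphen, " ")
console.log(a)
console.log(b)
```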
The model is hallucinating a format and looping, closing for now.
This PR fixes the issue where gpt-oss models output responses in a special token format that was being displayed raw instead of being parsed properly.
Problem
When using LM Studio with the gpt-oss-20b model, responses were appearing in this format:
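(The original example appears to have been an embedded screenshot; judging from the comments in the diff, the raw output looks like:)

```
<|start|>assistant<|channel|>commentary to=read_file <|constrain|>json<|message|>{"args":[...]}
```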
Solution
Added a `parseGptOssFormat` method that:
- Extracts the content after the `<|message|>` token if present
- Otherwise removes the special tokens and any leftover `to=function_name` patterns

Testing

Added tests in `lmstudio.spec.ts` covering parsing with and without the `<|message|>` token.
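A minimal sketch of the described behavior, assembled from the regexes quoted in the diff (the actual method lives on `LmStudioHandler` and may differ in details):

```typescript
// Sketch of the parsing approach described above; not the verbatim implementation.
function parseGptOssFormat(content: string): string {
	// If a <|message|> token is present, the payload follows it.
	const marker = "<|message|>"
	const idx = content.indexOf(marker)
	if (idx !== -1) {
		content = content.slice(idx + marker.length)
	}
	// Strip any remaining special tokens like <|start|> or <|channel|> ...
	const specialTokenPattern = /<\|[^|]+\|>/g
	// ... and leftover "to=function_name" routing hints.
	const functionPattern = /\s*to=\w+\s*/g
	return content.replace(specialTokenPattern, " ").replace(functionPattern, " ").trim()
}

const raw = '<|start|>assistant<|channel|>commentary to=read_file <|constrain|>json<|message|>{"args":[]}'
console.log(parseGptOssFormat(raw)) // → {"args":[]}
```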
Fixes #6739
Important

Fixes gpt-oss model response parsing in LM Studio by adding special token format handling in `LmStudioHandler`.

- `LmStudioHandler` now detects gpt-oss models and parses the special token format using `parseGptOssFormat()`.
- `parseGptOssFormat()` extracts the content after the `<|message|>` token, or removes special tokens and function patterns if it is not present.
- Adds tests in `lmstudio.spec.ts` for gpt-oss format parsing with and without the `<|message|>` token.

This description was created by for 93dc32d. You can customize this summary. It will automatically update as commits are pushed.