
Conversation


@roomote roomote bot commented Aug 25, 2025

Summary

This PR fixes the LM Studio crash that occurs when the context reaches 128K tokens despite a 200K limit being configured.

Problem

Users were experiencing crashes with LM Studio when the context reached 128K tokens, even though they had configured a 200K context limit. The error message was:

The model has crashed without additional information. (Exit code: 18446744072635812000)

Root Cause

The issue was in the parseLMStudioModel function, where maxTokens (the maximum number of output tokens) was set equal to contextWindow. The model therefore tried to reserve the entire context window for output, leaving no room for input tokens and causing an overflow.
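
Illustratively, a minimal sketch of the problematic assignment (field names simplified; the actual shape in lmstudio.ts may differ):

// Before the fix: the entire window was reserved for output
const modelInfo = {
  contextWindow: contextLength, // e.g. 200000
  maxTokens: contextLength, // output budget equals the full window, leaving no input budget
}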

Solution

  • Calculate maxTokens as 20% of the contextWindow instead of using the full context size (see the sketch below)
  • This ensures there's always sufficient room for input tokens (80% of context)
  • Follows the same pattern already used by other providers in the codebase
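
For illustration, a minimal sketch of the corrected calculation (the surrounding parseLMStudioModel shape is assumed rather than copied from the diff):

// After the fix: reserve 20% of the window for output, leaving 80% for input
const maxOutputTokens = Math.ceil(contextLength * 0.2) // 40000 for a 200K window
const modelInfo = {
  contextWindow: contextLength,
  maxTokens: maxOutputTokens,
}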

Changes

  • Updated parseLMStudioModel function to calculate maxTokens = Math.ceil(contextLength * 0.2)
  • Added comprehensive test coverage for the new calculation logic
  • Added clear comments explaining the rationale

Testing

  • All existing tests pass ✅
  • Added new test cases covering various context window sizes
  • Linting and type checking pass ✅

Fixes #7388


Important

Fixes LM Studio crash by setting maxTokens to 20% of contextWindow in parseLMStudioModel, ensuring space for input tokens.

  • Behavior:
    • Fixes crash in LM Studio by setting maxTokens to 20% of contextWindow in parseLMStudioModel.
    • Ensures input tokens have sufficient space (80% of context).
  • Testing:
    • Adds test cases in lmstudio.test.ts for various contextLength values to verify maxTokens calculation.
    • All existing tests pass.
  • Misc:
    • Adds comments in lmstudio.ts explaining maxTokens calculation rationale.

This description was created by Ellipsis for 8119af6.


@roomote roomote bot left a comment


Reviewing my own code because apparently I trust no one, not even myself.

// Calculate maxTokens as 20% of context window to prevent context overflow
// This ensures there's always room for input tokens and prevents crashes
// when approaching the context limit
const maxOutputTokens = Math.ceil(contextLength * 0.2)

Could we consider making this ratio configurable? While 20% is a reasonable default that matches other providers, some users might want to adjust this based on their specific use cases. Perhaps a setting like lmstudio.maxOutputRatio with a default of 0.2?
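
A rough sketch of what that could look like (lmstudio.maxOutputRatio is the proposed setting, not an existing option, and the helper below is hypothetical):

// Hypothetical helper: maxOutputRatio would come from a user setting
// such as the proposed lmstudio.maxOutputRatio, defaulting to 0.2
function computeMaxOutputTokens(contextLength: number, maxOutputRatio = 0.2): number {
  // Clamp the ratio so a misconfigured value cannot recreate the original overflow
  const ratio = Math.min(Math.max(maxOutputRatio, 0.05), 0.5)
  return Math.ceil(contextLength * ratio)
}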

// Calculate maxTokens as 20% of context window to prevent context overflow
// This ensures there's always room for input tokens and prevents crashes
// when approaching the context limit
const maxOutputTokens = Math.ceil(contextLength * 0.2)

For models with very small context windows (e.g., < 1000 tokens), this 20% calculation might result in very limited output capacity. Should we consider adding a minimum threshold? Something like:

Suggested change
const maxOutputTokens = Math.ceil(contextLength * 0.2)
// Calculate maxTokens as 20% of context window to prevent context overflow
// This ensures there's always room for input tokens and prevents crashes
// when approaching the context limit
const calculatedMaxTokens = Math.ceil(contextLength * 0.2)
// Ensure a minimum of 200 tokens for very small context windows
const maxOutputTokens = Math.max(calculatedMaxTokens, Math.min(200, contextLength))
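
Worked through for a 512-token window under this suggestion: calculatedMaxTokens = Math.ceil(512 * 0.2) = 103 and Math.min(200, 512) = 200, so maxOutputTokens = Math.max(103, 200) = 200; the 200-token floor applies whenever 20% of the window falls below it.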

{ contextLength: 8192, expectedMaxTokens: Math.ceil(8192 * 0.2) }, // 1639
{ contextLength: 128000, expectedMaxTokens: Math.ceil(128000 * 0.2) }, // 25600
{ contextLength: 200000, expectedMaxTokens: Math.ceil(200000 * 0.2) }, // 40000
]

Great test coverage! The test cases cover a good range of context sizes. Consider also adding a test case for very small context windows (e.g., 512 tokens) to ensure the calculation works correctly at the lower bounds.
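
Under the merged 20% rule, such a case would follow the table's existing pattern, e.g.:

{ contextLength: 512, expectedMaxTokens: Math.ceil(512 * 0.2) }, // 103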


@hannesrudolph added the Issue/PR - Triage label Aug 25, 2025

akierum commented Aug 25, 2025

So when can we expect this to be merged in the next update?

@daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 26, 2025
@hannesrudolph added the PR - Needs Preliminary Review label and removed the Issue/PR - Triage label Aug 26, 2025
@daniel-lxs closed this Aug 27, 2025
@github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 27, 2025
@github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Aug 27, 2025

Labels

bug (Something isn't working), PR - Needs Preliminary Review, size:M (This PR changes 30-99 lines, ignoring generated files)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

The model has crashed without additional information. (Exit code: 18446744072635812000). when reach 128K context with 200K limit

5 participants