32 changes: 31 additions & 1 deletion src/api/providers/fetchers/__tests__/lmstudio.test.ts
@@ -60,7 +60,7 @@ describe("LMStudio Fetcher", () => {
supportsPromptCache: true,
supportsImages: rawModel.vision,
supportsComputerUse: false,
- maxTokens: rawModel.contextLength,
+ maxTokens: Math.ceil(rawModel.contextLength * 0.2), // Should be 20% of context window
inputPrice: 0,
outputPrice: 0,
cacheWritesPrice: 0,
@@ -70,6 +70,36 @@
const result = parseLMStudioModel(rawModel)
expect(result).toEqual(expectedModelInfo)
})

it("should calculate maxTokens as 20% of context window", () => {
const testCases = [
{ contextLength: 8192, expectedMaxTokens: Math.ceil(8192 * 0.2) }, // 1639
{ contextLength: 128000, expectedMaxTokens: Math.ceil(128000 * 0.2) }, // 25600
{ contextLength: 200000, expectedMaxTokens: Math.ceil(200000 * 0.2) }, // 40000
]
Comment (Contributor Author): Great test coverage! The test cases cover a good range of context sizes. Consider also adding a test case for very small context windows (e.g., 512 tokens) to ensure the calculation works correctly at the lower bounds.
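A minimal sketch of what that extra case might look like (hypothetical, not part of this diff; it assumes the same imports as the surrounding test file, and the model metadata values are placeholders):

it("should handle very small context windows", () => {
	const rawModel: LLMInstanceInfo = {
		type: "llm",
		modelKey: "tiny-model",
		format: "safetensors",
		displayName: "Tiny Model",
		path: "test/tiny-model",
		sizeBytes: 1000000,
		architecture: "test",
		identifier: "test/tiny-model",
		instanceReference: "TINY123",
		vision: false,
		trainedForToolUse: false,
		maxContextLength: 512,
		contextLength: 512,
	}

	const result = parseLMStudioModel(rawModel)
	// Math.ceil(512 * 0.2) === 103, so even a 512-token window yields a small but nonzero output budget
	expect(result.maxTokens).toBe(103)
	expect(result.contextWindow).toBe(512)
})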

Comment: So when can we expect this to be merged in the next update?


testCases.forEach(({ contextLength, expectedMaxTokens }) => {
const rawModel: LLMInstanceInfo = {
type: "llm",
modelKey: "test-model",
format: "safetensors",
displayName: "Test Model",
path: "test/model",
sizeBytes: 1000000,
architecture: "test",
identifier: "test/model",
instanceReference: "TEST123",
vision: false,
trainedForToolUse: false,
maxContextLength: contextLength,
contextLength: contextLength,
}

const result = parseLMStudioModel(rawModel)
expect(result.maxTokens).toBe(expectedMaxTokens)
expect(result.contextWindow).toBe(contextLength)
})
})
})

describe("getLMStudioModels", () => {
7 changes: 6 additions & 1 deletion src/api/providers/fetchers/lmstudio.ts
@@ -38,13 +38,18 @@ export const parseLMStudioModel = (rawModel: LLMInstanceInfo | LLMInfo): ModelIn
// Handle both LLMInstanceInfo (from loaded models) and LLMInfo (from downloaded models)
const contextLength = "contextLength" in rawModel ? rawModel.contextLength : rawModel.maxContextLength

// Calculate maxTokens as 20% of context window to prevent context overflow
// This ensures there's always room for input tokens and prevents crashes
// when approaching the context limit
const maxOutputTokens = Math.ceil(contextLength * 0.2)
Comment (Contributor Author): Could we consider making this ratio configurable? While 20% is a reasonable default that matches other providers, some users might want to adjust this based on their specific use cases. Perhaps a setting like lmstudio.maxOutputRatio with a default of 0.2?
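A rough sketch of how that could work (the function name, the ratio parameter, and the clamping range are assumptions; lmstudio.maxOutputRatio is only the setting name proposed in the comment, not an existing option):

// Sketch only: apply a user-configurable output ratio, defaulting to the current 20%.
function calculateMaxOutputTokens(contextLength: number, maxOutputRatio: number = 0.2): number {
	// Clamp the ratio so a misconfigured value cannot exceed the context window or collapse to zero.
	const ratio = Math.min(Math.max(maxOutputRatio, 0.05), 1)
	return Math.ceil(contextLength * ratio)
}

// calculateMaxOutputTokens(128000)      -> 25600 (default 20%)
// calculateMaxOutputTokens(128000, 0.5) -> 64000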

Comment (Contributor Author): For models with very small context windows (e.g., < 1000 tokens), this 20% calculation might result in very limited output capacity. Should we consider adding a minimum threshold? Something like:

Suggested change:
- const maxOutputTokens = Math.ceil(contextLength * 0.2)
+ // Calculate maxTokens as 20% of context window to prevent context overflow
+ // This ensures there's always room for input tokens and prevents crashes
+ // when approaching the context limit
+ const calculatedMaxTokens = Math.ceil(contextLength * 0.2)
+ // Ensure a minimum of 200 tokens for very small context windows
+ const maxOutputTokens = Math.max(calculatedMaxTokens, Math.min(200, contextLength))
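(For a 512-token context, the plain 20% rule gives Math.ceil(512 * 0.2) = 103 output tokens, while this suggested floor raises it to Math.max(103, Math.min(200, 512)) = 200; for large contexts the calculated value exceeds 200 and the minimum never applies.)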


const modelInfo: ModelInfo = Object.assign({}, lMStudioDefaultModelInfo, {
description: `${rawModel.displayName} - ${rawModel.path}`,
contextWindow: contextLength,
supportsPromptCache: true,
supportsImages: rawModel.vision,
supportsComputerUse: false,
- maxTokens: contextLength,
+ maxTokens: maxOutputTokens,
})

return modelInfo