Conversation

@roomote roomote bot (Contributor) commented Aug 1, 2025

Summary

This PR implements a configurable timeout setting for API requests to help users with local providers like LM Studio and Ollama that may need more processing time for large models.

Problem

As reported in #6521, when using large models with local providers (LM Studio, Ollama, etc.) that need to split processing between GPU and CPU, Roo Code times out after 1-2 minutes before the provider can finish processing. This causes the provider to drop context and restart processing, making it difficult to use large local models effectively.

Solution

Added a new VSCode setting roo-cline.apiRequestTimeout that allows users to configure the timeout for all API providers:

  • Default: 600 seconds (10 minutes)
  • Range: 0-3600 seconds (0 = no timeout)
  • Applied to: LM Studio, Ollama, and OpenAI-compatible providers
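
In VS Code terms, a setting like this is contributed from the extension's package.json. The sketch below just reflects the values listed above; the exact description text and any localization keys in the merged package.json may differ:

{
  "contributes": {
    "configuration": {
      "properties": {
        "roo-cline.apiRequestTimeout": {
          "type": "number",
          "default": 600,
          "minimum": 0,
          "maximum": 3600,
          "description": "Timeout in seconds for API requests (0 = no timeout)."
        }
      }
    }
  }
}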

Changes

  1. Added VSCode setting in src/package.json:

    • roo-cline.apiRequestTimeout with appropriate validation and description
  2. Updated provider handlers to read and use the timeout setting (see the sketch after this list):

    • src/api/providers/lm-studio.ts: Added timeout to OpenAI client constructor
    • src/api/providers/ollama.ts: Added timeout to OpenAI client constructor
    • src/api/providers/openai.ts: Added timeout to all client constructors (OpenAI, AzureOpenAI, Azure AI Inference)
  3. Added comprehensive tests for timeout functionality:

    • src/api/providers/__tests__/lm-studio-timeout.spec.ts
    • src/api/providers/__tests__/ollama-timeout.spec.ts
    • src/api/providers/__tests__/openai-timeout.spec.ts
    • Updated existing OpenAI test to expect timeout parameter
  4. Added localization in src/package.nls.json for the new setting description
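
For point 2 above, the provider-side change boils down to reading the setting and handing it to the OpenAI client constructor. A simplified sketch, not the merged code itself; the base URL and API key are placeholders, and the real handlers wire these values through their existing options:

import * as vscode from "vscode"
import OpenAI from "openai"

// Read the user-configurable timeout in seconds; 600 s is the default, 0 means no timeout.
const timeoutSeconds = vscode.workspace.getConfiguration("roo-cline").get<number>("apiRequestTimeout", 600)

const client = new OpenAI({
    baseURL: "http://localhost:1234/v1", // placeholder: e.g. an LM Studio endpoint
    apiKey: "noop", // local providers generally ignore the key
    timeout: timeoutSeconds * 1000, // the OpenAI SDK expects milliseconds
})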

Testing

  • All new tests pass ✅
  • All existing tests pass ✅
  • Linting passes ✅
  • Type checking passes ✅

Usage

Users can now configure the timeout in their VSCode settings:

{
  "roo-cline.apiRequestTimeout": 1200  // 20 minutes for very large models
}

This is especially useful for:

  • Local providers running large models that need GPU/CPU split processing
  • Slower hardware configurations
  • Models that require extensive thinking/reasoning time

Fixes #6521


Important

Adds configurable API request timeout setting for local providers, with updates to provider handlers, tests, and localization.

  • Behavior:
    • Adds roo-cline.apiRequestTimeout setting in src/package.json for configurable API request timeout.
    • Default timeout is 600 seconds, range is 0-3600 seconds.
    • Applied to LM Studio, Ollama, and OpenAI-compatible providers.
  • Provider Handlers:
    • lm-studio.ts, ollama.ts, openai.ts: Updated to use the new timeout setting.
  • Tests:
    • Adds tests for timeout functionality in lm-studio-timeout.spec.ts, ollama-timeout.spec.ts, openai-timeout.spec.ts.
    • Updates existing OpenAI test to expect timeout parameter.
  • Localization:
    • Updates package.nls.*.json files for new setting description.

This description was created by Ellipsis for fddfde9.

- Add new VSCode setting roo-cline.apiRequestTimeout (default: 600s, range: 0-3600s)
- Update LM Studio, Ollama, and OpenAI handlers to use the timeout setting
- Add comprehensive tests for timeout functionality
- Helps users with local providers that need more processing time

Fixes #6521
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 1, 2025 01:30
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Aug 1, 2025
constructor(options: ApiHandlerOptions) {
super()
this.options = options
const timeoutSeconds = vscode.workspace.getConfiguration("roo-cline").get<number>("apiRequestTimeout", 600)
Contributor

Typo detected: the configuration namespace string "roo-cline" appears in line 27. If the intended configuration is for the CLI, it might be meant to be "roo-cli" instead.

Suggested change
const timeoutSeconds = vscode.workspace.getConfiguration("roo-cline").get<number>("apiRequestTimeout", 600)
const timeoutSeconds = vscode.workspace.getConfiguration("roo-cli").get<number>("apiRequestTimeout", 600)

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 1, 2025
@roomote roomote bot (Contributor, Author) left a comment

I reviewed my own code and found it surprisingly coherent. Must be a bug in my self-assessment module.

Overall, this is a well-implemented solution that properly addresses issue #6521. The code is clean, tests are comprehensive, and the feature will help users with local providers significantly.

Suggestions for improvement:

  1. Temperature inconsistency in Ollama provider - The default temperature differs between streaming and non-streaming modes when using R1 format models.

  2. Runtime validation - Consider adding safety checks for timeout values in the providers.

  3. Code duplication - The timeout configuration logic could be extracted to a shared utility.

  4. Documentation - The setting description could include more specific examples like "Set to 1800 for 30-minute timeouts".

These are all minor improvements - the core implementation is solid and ready to merge.

@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 2, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Aug 2, 2025
@blueslash2

This is exactly the kind of essential feature many local model users need. Hope it works.

- Create centralized getApiRequestTimeout() utility function
- Add comprehensive validation for timeout values (non-negative, handle NaN/null/undefined)
- Add extensive test coverage for all scenarios in a new test file
- Remove code duplication across providers (lm-studio.ts, ollama.ts, openai.ts)
- Maintain backward compatibility with existing behavior
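
A minimal sketch of what such a centralized helper could look like (the constant name and exact conversion are assumptions; the merged utility may validate or clamp values differently):

import * as vscode from "vscode"

const DEFAULT_TIMEOUT_SECONDS = 600 // assumed constant name

export function getApiRequestTimeout(): number {
    const raw = vscode.workspace.getConfiguration("roo-cline").get<number>("apiRequestTimeout", DEFAULT_TIMEOUT_SECONDS)

    // Fall back to the default for NaN, null/undefined, or negative values.
    const seconds = typeof raw === "number" && Number.isFinite(raw) && raw >= 0 ? raw : DEFAULT_TIMEOUT_SECONDS

    // The setting documents 0 as "no timeout"; non-zero values are converted to the milliseconds the OpenAI client expects.
    return seconds * 1000
}
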
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Aug 12, 2025
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@daniel-lxs daniel-lxs (Member) left a comment

LGTM

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Aug 12, 2025
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Needs Review] in Roo Code Roadmap Aug 12, 2025
@mrubens mrubens merged commit f9e85a5 into main Aug 12, 2025
13 checks passed
@mrubens mrubens deleted the feature/add-api-request-timeout branch August 12, 2025 15:48
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 12, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Aug 12, 2025
@pwilkin pwilkin (Contributor) commented Aug 12, 2025

So does this actually work?

As I mentioned in #6570, there seems to be a problem with the underlying low-level implementation of fetch. On my setup, fetch uses undici and undici defines a "BodyTimeout" of 300s. So, whatever other timeouts you might set, if the issue is long prompt processing (and with Roo's long prompt, it usually will be long prompt processing), this will trigger the BodyTimeout before any other timeouts.

I tried to see if I could make it work, but all the approaches I tried, even with modifying the global undici settings, didn't do anything. It might be that this is somehow configurable via VS Code internals for the Extension Host, but I haven't been able to figure out how.
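
For reference, relaxing undici's timeouts in plain Node looks roughly like the snippet below. pwilkin reports that this kind of change did not help inside the VS Code Extension Host, so treat it as an illustration of the attempted approach rather than a fix:

import { Agent, setGlobalDispatcher } from "undici"

// 0 disables undici's headers/body timeouts (both default to 300 s).
setGlobalDispatcher(new Agent({ headersTimeout: 0, bodyTimeout: 0 }))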

@cybrah cybrah commented Aug 12, 2025

Yeah it still doesn't work for me 😢

My interim solution is a proxy server between llama.cpp and Roo Code that sends blank deltas to Roo every few seconds to keep the connection open. That means my local model can take as long as it needs to respond and Roo is happy with it.

If anyone would find it useful (until there is a proper fix to the timeout issue), let me know and I'll send it over.
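
For anyone who wants to experiment in the meantime, the idea is roughly the following: a hypothetical, untested sketch of a keep-alive proxy, not cybrah's actual code. It assumes an OpenAI-compatible upstream such as llama.cpp's server and that Roo ignores chunks with an empty delta:

import http from "node:http"

const UPSTREAM = "http://127.0.0.1:8080" // assumed llama.cpp server address
const KEEPALIVE_MS = 5000

http.createServer(async (req, res) => {
    // Buffer the incoming request body so it can be forwarded upstream.
    const chunks: Buffer[] = []
    for await (const chunk of req) chunks.push(chunk as Buffer)

    // Answer immediately and pad the stream so the client never sees an idle socket.
    res.writeHead(200, { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" })
    const keepalive = setInterval(() => {
        // A syntactically valid chunk with an empty delta, purely to keep the connection busy.
        res.write('data: {"id":"keepalive","object":"chat.completion.chunk","created":0,"model":"local","choices":[{"index":0,"delta":{},"finish_reason":null}]}\n\n')
    }, KEEPALIVE_MS)

    // Note: the proxy's own outbound fetch also needs relaxed undici timeouts (see the earlier snippet).
    const upstream = await fetch(UPSTREAM + req.url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: Buffer.concat(chunks),
    })

    // Stop padding once real data starts flowing, then pass the upstream SSE stream through untouched.
    for await (const chunk of upstream.body!) {
        clearInterval(keepalive)
        res.write(chunk)
    }
    clearInterval(keepalive)
    res.end()
}).listen(3001) // point Roo Code's OpenAI-compatible base URL at http://127.0.0.1:3001/v1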

@blueslash2

This works as expected. Now I can go grind some coffee beans while waiting for the local model to natter away with Roo Code. Thank you!

@julong111

@cybrah I'm running the model locally, and it's very slow, but it basically works for simple tasks. However, complex tasks are shut down with a "terminated" message before the server returns any token at the start of the task, while the local server keeps running. Setting roo-cline.apiRequestTimeout doesn't work for me. My version is 3.25.20 (81cba18). Could you please share the details of your proxy method? Thank you.

@pwilkin pwilkin (Contributor) commented Aug 21, 2025

@daniel-lxs I'll add that my case was exactly the one mentioned by @julong111 here.

@julong111
Copy link

@pwilkin @daniel-lxs
This is the LM Studio debug log. My hardware is slow at running large models, and you can see that the model was still processing the prompt and hadn't yet returned a token when the client connection timed out.

---LM Studio debug log
2025-08-21 21:19:35 [DEBUG]
Total prompt tokens: 13987
Prompt tokens to decode: 4771
BeginProcessingPrompt
2025-08-21 21:20:40 [DEBUG]
PromptProcessing: 10.7315
2025-08-21 21:21:48 [DEBUG]
PromptProcessing: 21.463
2025-08-21 21:22:58 [DEBUG]
PromptProcessing: 32.1945
2025-08-21 21:24:11 [DEBUG]
PromptProcessing: 42.926
2025-08-21 21:25:28 [DEBUG]
PromptProcessing: 53.6575
2025-08-21 21:26:49 [DEBUG]
PromptProcessing: 64.389
2025-08-21 21:28:14 [DEBUG]
PromptProcessing: 75.1205
2025-08-21 21:29:31 [INFO]
[LM STUDIO SERVER] Client disconnected. Stopping generation... (If the model is busy processing the prompt, it will finish first.)

@pwilkin pwilkin (Contributor) commented Aug 21, 2025

@julong111 Yeah, it's caused by undici's BodyTimeout, see #6570

@blueslash2

(quoting @julong111's LM Studio debug log from the comment above)

Maybe you triggered the default 10-minute timeout; in my case I set "roo-cline.apiRequestTimeout": 1200 for a 20-minute timeout.


Labels

enhancement (New feature or request), lgtm (This PR has been approved by a maintainer), PR - Needs Preliminary Review, size:XL (This PR changes 500-999 lines, ignoring generated files)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Roo Code disconnects before LM Studio can finish its response

9 participants