Conversation

@roomote roomote bot (Contributor) commented Aug 1, 2025

Summary

This PR implements a configurable timeout setting for API requests to help users with local providers like LM Studio and Ollama that may need more processing time for large models.

Problem

As reported in #6521, when using large models with local providers (LM Studio, Ollama, etc.) that need to split processing between GPU and CPU, Roo Code times out after 1-2 minutes before the provider can finish processing. This causes the provider to drop context and restart processing, making it difficult to use large local models effectively.

Solution

Added a new VSCode setting roo-cline.apiRequestTimeout that allows users to configure the timeout for all API providers:

  • Default: 600 seconds (10 minutes)
  • Range: 0-3600 seconds (0 = no timeout)
  • Applied to: LM Studio, Ollama, and OpenAI-compatible providers
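
In VS Code terms, a setting like this is contributed from the extension's package.json. The sketch below just reflects the values listed above; the exact description text and any localization keys in the merged package.json may differ:

{
  "contributes": {
    "configuration": {
      "properties": {
        "roo-cline.apiRequestTimeout": {
          "type": "number",
          "default": 600,
          "minimum": 0,
          "maximum": 3600,
          "description": "Timeout in seconds for API requests (0 = no timeout)."
        }
      }
    }
  }
}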

Changes

  1. Added VSCode setting in src/package.json:

    • roo-cline.apiRequestTimeout with appropriate validation and description
  2. Updated provider handlers to read and use the timeout setting (see the sketch after this list):

    • src/api/providers/lm-studio.ts: Added timeout to OpenAI client constructor
    • src/api/providers/ollama.ts: Added timeout to OpenAI client constructor
    • src/api/providers/openai.ts: Added timeout to all client constructors (OpenAI, AzureOpenAI, Azure AI Inference)
  3. Added comprehensive tests for timeout functionality:

    • src/api/providers/__tests__/lm-studio-timeout.spec.ts
    • src/api/providers/__tests__/ollama-timeout.spec.ts
    • src/api/providers/__tests__/openai-timeout.spec.ts
    • Updated existing OpenAI test to expect timeout parameter
  4. Added localization in src/package.nls.json for the new setting description
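
For point 2 above, the provider-side change boils down to reading the setting and handing it to the OpenAI client constructor. A simplified sketch, not the merged code itself; the base URL and API key are placeholders, and the real handlers wire these values through their existing options:

import * as vscode from "vscode"
import OpenAI from "openai"

// Read the user-configurable timeout in seconds; 600 s is the default, 0 means no timeout.
const timeoutSeconds = vscode.workspace.getConfiguration("roo-cline").get<number>("apiRequestTimeout", 600)

const client = new OpenAI({
    baseURL: "http://localhost:1234/v1", // placeholder: e.g. an LM Studio endpoint
    apiKey: "noop", // local providers generally ignore the key
    timeout: timeoutSeconds * 1000, // the OpenAI SDK expects milliseconds
})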

Testing

  • All new tests pass ✅
  • All existing tests pass ✅
  • Linting passes ✅
  • Type checking passes ✅

Usage

Users can now configure the timeout in their VSCode settings:

{
  "roo-cline.apiRequestTimeout": 1200  // 20 minutes for very large models
}

This is especially useful for:

  • Local providers running large models that need GPU/CPU split processing
  • Slower hardware configurations
  • Models that require extensive thinking/reasoning time

Fixes #6521


Important

Adds configurable API request timeout setting for local providers, with updates to provider handlers, tests, and localization.

  • Behavior:
    • Adds roo-cline.apiRequestTimeout setting in src/package.json for configurable API request timeout.
    • Default timeout is 600 seconds, range is 0-3600 seconds.
    • Applied to LM Studio, Ollama, and OpenAI-compatible providers.
  • Provider Handlers:
    • lm-studio.ts, ollama.ts, openai.ts: Updated to use the new timeout setting.
  • Tests:
    • Adds tests for timeout functionality in lm-studio-timeout.spec.ts, ollama-timeout.spec.ts, openai-timeout.spec.ts.
    • Updates existing OpenAI test to expect timeout parameter.
  • Localization:
    • Updates package.nls.*.json files for new setting description.

This description was created by Ellipsis for fddfde9.

- Add new VSCode setting roo-cline.apiRequestTimeout (default: 600s, range: 0-3600s)
- Update LM Studio, Ollama, and OpenAI handlers to use the timeout setting
- Add comprehensive tests for timeout functionality
- Helps users with local providers that need more processing time

Fixes #6521
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 1, 2025 01:30
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Aug 1, 2025
constructor(options: ApiHandlerOptions) {
super()
this.options = options
const timeoutSeconds = vscode.workspace.getConfiguration("roo-cline").get<number>("apiRequestTimeout", 600)
Contributor

Typo detected: the configuration namespace string "roo-cline" appears in line 27. If the intended configuration is for the CLI, it might be meant to be "roo-cli" instead.

Suggested change
const timeoutSeconds = vscode.workspace.getConfiguration("roo-cline").get<number>("apiRequestTimeout", 600)
const timeoutSeconds = vscode.workspace.getConfiguration("roo-cli").get<number>("apiRequestTimeout", 600)

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 1, 2025
@roomote roomote bot (Contributor, Author) left a comment

I reviewed my own code and found it surprisingly coherent. Must be a bug in my self-assessment module.

Overall, this is a well-implemented solution that properly addresses issue #6521. The code is clean, tests are comprehensive, and the feature will help users with local providers significantly.

Suggestions for improvement:

  1. Temperature inconsistency in Ollama provider - The default temperature differs between streaming and non-streaming modes when using R1 format models.

  2. Runtime validation - Consider adding safety checks for timeout values in the providers.

  3. Code duplication - The timeout configuration logic could be extracted to a shared utility.

  4. Documentation - The setting description could include more specific examples like "Set to 1800 for 30-minute timeouts".

These are all minor improvements - the core implementation is solid and ready to merge.

@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 2, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Aug 2, 2025
@blueslash2

This is exactly the kind of essential feature many local model users need. Hope it works.

- Create centralized getApiRequestTimeout() utility function
- Add comprehensive validation for timeout values (non-negative, handle NaN/null/undefined)
- Add extensive test coverage for all scenarios in a new test file
- Remove code duplication across providers (lm-studio.ts, ollama.ts, openai.ts)
- Maintain backward compatibility with existing behavior
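
A minimal sketch of what such a centralized helper could look like (the constant name and exact conversion are assumptions; the merged utility may validate or clamp values differently):

import * as vscode from "vscode"

const DEFAULT_TIMEOUT_SECONDS = 600 // assumed constant name

export function getApiRequestTimeout(): number {
    const raw = vscode.workspace.getConfiguration("roo-cline").get<number>("apiRequestTimeout", DEFAULT_TIMEOUT_SECONDS)

    // Fall back to the default for NaN, null/undefined, or negative values.
    const seconds = typeof raw === "number" && Number.isFinite(raw) && raw >= 0 ? raw : DEFAULT_TIMEOUT_SECONDS

    // The setting documents 0 as "no timeout"; non-zero values are converted to the milliseconds the OpenAI client expects.
    return seconds * 1000
}
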
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Aug 12, 2025
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@daniel-lxs daniel-lxs (Member) left a comment

LGTM

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Aug 12, 2025
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Needs Review] in Roo Code Roadmap Aug 12, 2025
@mrubens mrubens merged commit f9e85a5 into main Aug 12, 2025
13 checks passed
@mrubens mrubens deleted the feature/add-api-request-timeout branch August 12, 2025 15:48
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 12, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Aug 12, 2025
@pwilkin pwilkin (Contributor) commented Aug 12, 2025

So does this actually work?

As I mentioned in #6570, there seems to be a problem with the underlying low-level implementation of fetch. On my setup, fetch uses undici and undici defines a "BodyTimeout" of 300s. So, whatever other timeouts you might set, if the issue is long prompt processing (and with Roo's long prompt, it usually will be long prompt processing), this will trigger the BodyTimeout before any other timeouts.

I tried to see if I could make it work, but all the approaches I tried, even with modifying the global undici settings, didn't do anything. It might be that this is somehow configurable via VS Code internals for the Extension Host, but I haven't been able to figure out how.
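
For reference, relaxing undici's timeouts in plain Node looks roughly like the snippet below. pwilkin reports that this kind of change did not help inside the VS Code Extension Host, so treat it as an illustration of the attempted approach rather than a fix:

import { Agent, setGlobalDispatcher } from "undici"

// 0 disables undici's headers/body timeouts (both default to 300 s).
setGlobalDispatcher(new Agent({ headersTimeout: 0, bodyTimeout: 0 }))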

@cybrah cybrah commented Aug 12, 2025

Yeah it still doesn't work for me 😢

My interim solution is a proxy server between llama.cpp and Roo Code that sends blank deltas to Roo every few seconds to keep the connection open. That means my local model can take as long as it needs to respond and Roo is happy with it.

If anyone would find it useful (until there is a proper fix to the timeout issue), let me know and I'll send it over.
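
For anyone who wants to experiment in the meantime, the idea is roughly the following: a hypothetical, untested sketch of a keep-alive proxy, not cybrah's actual code. It assumes an OpenAI-compatible upstream such as llama.cpp's server and that Roo ignores chunks with an empty delta:

import http from "node:http"

const UPSTREAM = "http://127.0.0.1:8080" // assumed llama.cpp server address
const KEEPALIVE_MS = 5000

http.createServer(async (req, res) => {
    // Buffer the incoming request body so it can be forwarded upstream.
    const chunks: Buffer[] = []
    for await (const chunk of req) chunks.push(chunk as Buffer)

    // Answer immediately and pad the stream so the client never sees an idle socket.
    res.writeHead(200, { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" })
    const keepalive = setInterval(() => {
        // A syntactically valid chunk with an empty delta, purely to keep the connection busy.
        res.write('data: {"id":"keepalive","object":"chat.completion.chunk","created":0,"model":"local","choices":[{"index":0,"delta":{},"finish_reason":null}]}\n\n')
    }, KEEPALIVE_MS)

    // Note: the proxy's own outbound fetch also needs relaxed undici timeouts (see the earlier snippet).
    const upstream = await fetch(UPSTREAM + req.url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: Buffer.concat(chunks),
    })

    // Stop padding once real data starts flowing, then pass the upstream SSE stream through untouched.
    for await (const chunk of upstream.body!) {
        clearInterval(keepalive)
        res.write(chunk)
    }
    clearInterval(keepalive)
    res.end()
}).listen(3001) // point Roo Code's OpenAI-compatible base URL at http://127.0.0.1:3001/v1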

@blueslash2

This works as expected. Now I can go grind some coffee beans while waiting for the local model to natter away with Roo Code. Thank you!

@julong111

@cybrah I'm running the model locally, and it's very slow, but it basically works for simple tasks. However, complex tasks are shut down with a "terminated" message before the server returns any token at the start of the task, while the local server keeps running. Setting roo-cline.apiRequestTimeout doesn't work for me. My version is 3.25.20 (81cba18). Could you please share the details of your proxy method? Thank you.

@pwilkin pwilkin (Contributor) commented Aug 21, 2025

@daniel-lxs I'll add that my case was exactly the one mentioned by @julong111 here.

@julong111
Copy link

@pwilkin @daniel-lxs
This is the LM Studio debug log. My hardware is slow at running large models, and you can see that the model was still processing the prompt and hadn't yet returned a token when the client connection timed out.

---LM Studio debug log
2025-08-21 21:19:35 [DEBUG]
Total prompt tokens: 13987
Prompt tokens to decode: 4771
BeginProcessingPrompt
2025-08-21 21:20:40 [DEBUG]
PromptProcessing: 10.7315
2025-08-21 21:21:48 [DEBUG]
PromptProcessing: 21.463
2025-08-21 21:22:58 [DEBUG]
PromptProcessing: 32.1945
2025-08-21 21:24:11 [DEBUG]
PromptProcessing: 42.926
2025-08-21 21:25:28 [DEBUG]
PromptProcessing: 53.6575
2025-08-21 21:26:49 [DEBUG]
PromptProcessing: 64.389
2025-08-21 21:28:14 [DEBUG]
PromptProcessing: 75.1205
2025-08-21 21:29:31 [INFO]
[LM STUDIO SERVER] Client disconnected. Stopping generation... (If the model is busy processing the prompt, it will finish first.)

@pwilkin pwilkin (Contributor) commented Aug 21, 2025

@julong111 Yeah, it's caused by undici's BodyTimeout, see #6570

@blueslash2

(quoting @julong111's LM Studio debug log from the comment above)

Maybe you triggered the default 10-minute timeout; in my case I set "roo-cline.apiRequestTimeout": 1200 for a 20-minute timeout.


Labels

enhancement (New feature or request), lgtm (This PR has been approved by a maintainer), PR - Needs Preliminary Review, size:XL (This PR changes 500-999 lines, ignoring generated files)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Roo Code disconnects before LM Studio can finish its response

9 participants