
Feature Request: Expose LLM Request Timeout as Environment Variable #244

@nickgnat

Summary

Add a configurable environment variable (e.g. LLM_REQUEST_TIMEOUT) to control the timeout for requests to the LLM backend.

Motivation

Speakr's current 10-minute request timeout (inherited from the OpenAI Python SDK default) works well for cloud-hosted models but is insufficient for users running local inference via Ollama or similar self-hosted backends, particularly when:

  • Using larger models (e.g. 12b+ parameter models) that are partially CPU-offloaded due to limited VRAM
  • Processing long transcripts (60–120-minute sessions can generate prompts of 8,000–25,000+ tokens)
  • Running on consumer hardware where inference is significantly slower than cloud APIs
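To size the timeout for setups like these, one rough approach is to add the time needed to ingest the prompt to the time needed to generate the summary, then leave some headroom. The sketch below is only an estimate under stated assumptions. `estimate_timeout_seconds` is a hypothetical helper. The throughput figures (`prefill_tps` and `decode_tps`, in tokens per second) and the 1.5 safety factor are placeholders, to be replaced with numbers measured on your own hardware. They are not benchmarks of any specific GPU or model.

```python
import math


def estimate_timeout_seconds(
    prompt_tokens: int,
    output_tokens: int,
    prefill_tps: float,
    decode_tps: float,
    safety_factor: float = 1.5,
) -> int:
    """Rough upper bound, in whole seconds, for one LLM request.

    prompt_tokens / prefill_tps  : time to ingest the transcript prompt
    output_tokens / decode_tps   : time to generate the summary
    safety_factor                : assumed headroom for run-to-run variance
    """
    if min(prefill_tps, decode_tps) <= 0:
        raise ValueError("throughput values must be positive")
    seconds = prompt_tokens / prefill_tps + output_tokens / decode_tps
    return math.ceil(seconds * safety_factor)


# Hypothetical throughputs for a partially CPU-offloaded model.
print(estimate_timeout_seconds(25_000, 1_000, prefill_tps=50, decode_tps=2))  # → 1500
```

With these placeholder figures, a 25,000-token prompt needs about 1,500 seconds, well past the 600-second default.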

Currently, the only workarounds are to patch the source directly or to switch to smaller models, which may produce lower-quality summaries.

Proposed Solution

Expose the LLM client timeout as an environment variable, for example:

LLM_REQUEST_TIMEOUT=1800  # seconds, default 600

This would be passed to the OpenAI client at initialization:

client = OpenAI(base_url=..., api_key=..., timeout=int(os.getenv('LLM_REQUEST_TIMEOUT', 600)))
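The one-liner above raises a ValueError at startup if the variable holds something non-integer (e.g. "30m" or "1800.5"), and it also accepts zero or negative values. A slightly more defensive parser is sketched below. `read_timeout_env` is a hypothetical helper, not an existing Speakr function. It uses only the standard library, falls back to the SDK's 600-second default for unset or invalid values, and logs a warning instead of crashing.

```python
import logging
import math
import os

logger = logging.getLogger(__name__)

DEFAULT_LLM_TIMEOUT = 600.0  # seconds; matches the OpenAI Python SDK default


def read_timeout_env(name: str = "LLM_REQUEST_TIMEOUT",
                     default: float = DEFAULT_LLM_TIMEOUT) -> float:
    """Return a positive, finite timeout in seconds from the environment.

    Unset or empty values return `default` silently; non-numeric, non-finite,
    or non-positive values return `default` and log a warning.
    """
    raw = os.getenv(name, "").strip()
    if not raw:
        return default
    try:
        value = float(raw)
    except ValueError:
        logger.warning("%s=%r is not a number; using %.0f s", name, raw, default)
        return default
    if not math.isfinite(value) or value <= 0:
        logger.warning("%s=%r must be a positive number; using %.0f s", name, raw, default)
        return default
    return value
```

The result would then be passed as `timeout=read_timeout_env()` when constructing the client. Parsing with float rather than int also allows values such as 1800.5.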

Additional Context

It may also be worth making the automatic retry behavior configurable, or disabling it, for local inference endpoints, because the OpenAI client (openai._base_client) automatically retries requests that time out. Retrying a timed-out request against an Ollama instance that is still processing the original queues duplicate jobs, which compounds the problem rather than resolving it.
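The OpenAI Python client already accepts a `max_retries` constructor argument (default 2), so a retry setting could follow the same pattern as the timeout. The sketch below is hypothetical throughout. It assumes an `LLM_MAX_RETRIES` variable that does not exist today, plus a policy of disabling retries by default when the base URL points at localhost or a loopback/private IP. The idea is that such endpoints are most likely a self-hosted backend. It also shows why retries matter here: every timed-out attempt waits the full timeout, so the worst case grows with each retry.

```python
import ipaddress
import os
from urllib.parse import urlparse

SDK_DEFAULT_MAX_RETRIES = 2  # OpenAI Python SDK default


def is_local_endpoint(base_url: str) -> bool:
    """Heuristic: treat localhost and loopback/private IPs as self-hosted."""
    host = urlparse(base_url).hostname or ""
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name such as api.openai.com or a Docker service name
    return ip.is_loopback or ip.is_private


def read_max_retries(base_url: str, name: str = "LLM_MAX_RETRIES") -> int:
    """A non-negative env value wins; otherwise no retries for local endpoints."""
    raw = os.getenv(name, "").strip()
    try:
        retries = int(raw)
    except ValueError:
        retries = -1  # unset or malformed: fall through to the default policy
    if retries >= 0:
        return retries
    return 0 if is_local_endpoint(base_url) else SDK_DEFAULT_MAX_RETRIES


def worst_case_wait(timeout_s: float, max_retries: int) -> float:
    """Time spent in attempts that all time out (retry backoff ignored)."""
    return timeout_s * (1 + max_retries)
```

With the SDK defaults, `worst_case_wait(600, 2)` is 1800 seconds, during which the backend may be working on up to three copies of the same prompt. The value from `read_max_retries` would be passed as `max_retries=` next to `timeout=`. The explicit variable takes precedence because a hostname heuristic cannot recognize a Docker service name such as `ollama` as local.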

Use Case

Local Ollama deployment with gemma3:12b on an NVIDIA RTX A2000 6GB, processing 90–120-minute meeting transcripts. Inference completes successfully when tested manually but exceeds Speakr's 10-minute timeout.
