Describe the Problem
I want to integrate Continue with a custom AI gateway (Higress) that unifies multiple local Ollama instances (and potentially other models like Claude or GPT-4) into a single OpenAI-compatible API endpoint (e.g., http://localhost:8080/v1/chat/completions). The gateway handles load balancing and dynamic model routing, so Continue should interact with a generic endpoint without specifying backend models (e.g., Qwen3-Coder, Llama 3, Mistral). For example, a request to a "chat" model via the gateway could route to any backend model based on Higress's routing logic.
Is this setup supported by Continue? Can I configure ~/.continue/config.yaml to use a single OpenAI-compatible endpoint for a generic model name (e.g., "chat"), and will Continue reliably handle chat, code completion, and other roles with requests routed through the gateway's load balancer?
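To make the contract concrete: my assumption is that Continue would POST a standard OpenAI-style chat completion request in which model is only a generic label for the gateway to resolve. Roughly (JSON request body shown in equivalent YAML form; "chat" is the placeholder name from my config below):
```yaml
# POST http://localhost:8080/v1/chat/completions
# JSON request body, shown in equivalent YAML form:
model: chat                 # generic label; Higress resolves the real backend
messages:
  - role: user
    content: Explain this function.
stream: true                # chat responses are typically streamed
```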
Steps to Reproduce
Set up multiple Ollama instances locally (e.g., localhost:11434 for one model, localhost:11435 for another).
Deploy the Higress AI Gateway to expose a unified OpenAI-compatible endpoint (e.g., http://localhost:8080/v1), routing to backend models dynamically (a routing sketch follows these steps).
Configure ~/.continue/config.yaml as follows:
```yaml
name: Local Agent
version: 1.0.0
schema: v1
models:
  - name: Chat Model via Higress
    provider: openai
    model: chat
    apiBase: http://localhost:8080/v1
    apiKey: ""
    roles:
      - chat
      - edit
      - apply
  - name: Autocomplete Model via Higress
    provider: openai
    model: autocomplete
    apiBase: http://localhost:8080/v1
    apiKey: ""
    roles:
      - autocomplete
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text:latest
    roles:
      - embed
```
Test Continue's chat or code completion in VS Code, expecting requests to route through the gateway to any backend model.
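For reference, my thinking for step 2 on the gateway side is roughly the following ai-proxy plugin configuration. This is a sketch from memory of the Higress docs, not a tested config; the field names (modelMapping, ollamaServerHost, ollamaServerPort) and the load-balancing setup across the two Ollama instances should be verified against the Higress version you deploy:
```yaml
# Sketch of a Higress ai-proxy plugin config (illustrative only; verify
# field names against the Higress docs for your deployed version).
provider:
  type: ollama
  ollamaServerHost: 127.0.0.1  # one Ollama backend; balancing across
  ollamaServerPort: 11434      # 11434/11435 lives in the upstream config
  modelMapping:
    "chat": "qwen3-coder:30b"          # generic name -> concrete model
    "autocomplete": "qwen3-coder:30b"
    "*": "llama3"                      # fallback for any other name
```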
Expected Behavior
Continue should send requests to the Higress endpoint (e.g., http://localhost:8080/v1) using generic model names (e.g., "chat" or "autocomplete"), with Higress handling dynamic routing to backend models (e.g., Qwen3-Coder, Llama 3, or others). Chat, edit, apply, and autocomplete features should work seamlessly, leveraging the gateway's load balancing.
Actual Behavior
I haven't tested this setup yet, as I'm unsure if Continue supports a custom OpenAI-compatible endpoint with dynamic model routing (where the client doesn't specify the backend model). I'm seeking confirmation or guidance on potential issues, such as model naming, API compatibility, or role-specific routing.
Environment
Continue Version: Latest (as of September 2025)
VS Code Version: [e.g., 1.93.x, please update]
OS: [e.g., macOS 15, Ubuntu 24.04, or Windows 11, please update]
Ollama: Latest, running multiple instances
Higress: Latest, configured as AI Gateway with OpenAI-compatible API
Additional Context
Higress uses Istio/Envoy for load balancing and dynamic routing, supporting OpenAI API specs (e.g., /v1/chat/completions).
The gateway may route to local Ollama models (e.g., Qwen3-Coder 30B, Llama 3) or external models (e.g., Claude, GPT-4), but Continue should treat it as a single endpoint with generic model names.
All requests are local for privacy, with Higress managing model selection and failover.
Are there specific config.yaml settings (e.g., custom headers, context length; see the sketch after these questions) or limitations I should consider for this setup?
If dynamic routing isn't fully supported, are there workarounds or plans to enhance custom provider support for such gateways?
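For context, the kind of per-model knobs I have in mind look like this. I'm assuming requestOptions.headers and defaultCompletionOptions.contextLength are honored for openai-provider models, and the X-Higress-Route header is purely hypothetical:
```yaml
models:
  - name: Chat Model via Higress
    provider: openai
    model: chat
    apiBase: http://localhost:8080/v1
    apiKey: ""
    defaultCompletionOptions:
      contextLength: 32768  # size to the smallest context among routed backends
      maxTokens: 4096
    requestOptions:
      headers:
        X-Higress-Route: chat  # hypothetical routing-hint header
    roles:
      - chat
      - edit
      - apply
```
Since Continue can't see which backend the gateway picks, sizing contextLength to the smallest routed model seems like the safe default.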
Thank you for any guidance or examples of similar integrations!