Describe the Problem
I want to integrate Continue with a custom AI gateway (Higress) that unifies multiple local Ollama instances (and potentially other models like Claude or GPT-4) into a single OpenAI-compatible API endpoint (e.g., http://localhost:8080/v1/chat/completions). The gateway handles load balancing and dynamic model routing, so Continue should interact with a generic endpoint without specifying backend models (e.g., Qwen3-Coder, Llama 3, Mistral). For example, a request to a "chat" model via the gateway could route to any backend model based on Higress's routing logic.
Is this setup supported by Continue? Can I configure ~/.continue/config.yaml to use a single OpenAI-compatible endpoint for a generic model name (e.g., "chat"), and will Continue reliably handle chat, code completion, and other roles with requests routed through the gateway's load balancer?
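To make the contract concrete: my assumption is that Continue would POST a standard OpenAI-style chat completion request in which model is only a generic label for the gateway to resolve. Roughly (JSON request body shown in equivalent YAML form; "chat" is the placeholder name from my config below):
```yaml
# POST http://localhost:8080/v1/chat/completions
# JSON request body, shown in equivalent YAML form:
model: chat                 # generic label; Higress resolves the real backend
messages:
  - role: user
    content: Explain this function.
stream: true                # chat responses are typically streamed
```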
Steps to Reproduce
Set up multiple Ollama instances locally (e.g., localhost:11434 for one model, localhost:11435 for another).
Deploy the Higress AI Gateway to expose a unified OpenAI-compatible endpoint (e.g., http://localhost:8080/v1), routing to backend models dynamically (a routing sketch follows these steps).
Configure ~/.continue/config.yaml as follows:
```yaml
name: Local Agent
version: 1.0.0
schema: v1
models:
  - name: Chat Model via Higress
    provider: openai
    model: chat
    apiBase: http://localhost:8080/v1
    apiKey: ""
    roles:
      - chat
      - edit
      - apply
  - name: Autocomplete Model via Higress
    provider: openai
    model: autocomplete
    apiBase: http://localhost:8080/v1
    apiKey: ""
    roles:
      - autocomplete
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text:latest
    roles:
      - embed
```
Test Continue's chat or code completion in VS Code, expecting requests to route through the gateway to any backend model.
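For reference, my thinking for step 2 on the gateway side is roughly the following ai-proxy plugin configuration. This is a sketch from memory of the Higress docs, not a tested config; the field names (modelMapping, ollamaServerHost, ollamaServerPort) and the load-balancing setup across the two Ollama instances should be verified against the Higress version you deploy:
```yaml
# Sketch of a Higress ai-proxy plugin config (illustrative only; verify
# field names against the Higress docs for your deployed version).
provider:
  type: ollama
  ollamaServerHost: 127.0.0.1  # one Ollama backend; balancing across
  ollamaServerPort: 11434      # 11434/11435 lives in the upstream config
  modelMapping:
    "chat": "qwen3-coder:30b"          # generic name -> concrete model
    "autocomplete": "qwen3-coder:30b"
    "*": "llama3"                      # fallback for any other name
```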
Expected Behavior
Continue should send requests to the Higress endpoint (e.g., http://localhost:8080/v1) using generic model names (e.g., "chat" or "autocomplete"), with Higress handling dynamic routing to backend models (e.g., Qwen3-Coder, Llama 3, or others). Chat, edit, apply, and autocomplete features should work seamlessly, leveraging the gateway's load balancing.
Actual Behavior
I haven't tested this setup yet, as I'm unsure if Continue supports a custom OpenAI-compatible endpoint with dynamic model routing (where the client doesn't specify the backend model). I'm seeking confirmation or guidance on potential issues, such as model naming, API compatibility, or role-specific routing.
Environment
Continue Version: Latest (as of September 2025)
VS Code Version: [e.g., 1.93.x, please update]
OS: [e.g., macOS 15, Ubuntu 24.04, or Windows 11, please update]
Ollama: Latest, running multiple instances
Higress: Latest, configured as AI Gateway with OpenAI-compatible API
Additional Context
Higress uses Istio/Envoy for load balancing and dynamic routing, supporting OpenAI API specs (e.g., /v1/chat/completions).
The gateway may route to local Ollama models (e.g., Qwen3-Coder 30B, Llama 3) or external models (e.g., Claude, GPT-4), but Continue should treat it as a single endpoint with generic model names.
All requests are local for privacy, with Higress managing model selection and failover.
Are there specific config.yaml settings (e.g., custom headers, context length; see the sketch after these questions) or limitations I should consider for this setup?
If dynamic routing isn't fully supported, are there workarounds or plans to enhance custom provider support for such gateways?
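For context, the kind of per-model knobs I have in mind look like this. I'm assuming requestOptions.headers and defaultCompletionOptions.contextLength are honored for openai-provider models, and the X-Higress-Route header is purely hypothetical:
```yaml
models:
  - name: Chat Model via Higress
    provider: openai
    model: chat
    apiBase: http://localhost:8080/v1
    apiKey: ""
    defaultCompletionOptions:
      contextLength: 32768  # size to the smallest context among routed backends
      maxTokens: 4096
    requestOptions:
      headers:
        X-Higress-Route: chat  # hypothetical routing-hint header
    roles:
      - chat
      - edit
      - apply
```
Since Continue can't see which backend the gateway picks, sizing contextLength to the smallest routed model seems like the safe default.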
Thank you for any guidance or examples of similar integrations!