
Commit bdb66d7: "readme refresher"
1 parent: 7b7c96e

File tree: 1 file changed (+31, -3 lines)


readme.md

Lines changed: 31 additions & 3 deletions
@@ -30,13 +30,13 @@
 
 Olla is a high-performance, low-overhead, low-latency proxy and load balancer for managing LLM infrastructure. It intelligently routes LLM requests across local and remote inference nodes - including [Ollama](https://github.com/ollama/ollama), [LM Studio](https://lmstudio.ai/) and OpenAI-compatible endpoints like [vLLM](https://github.com/vllm-project/vllm). Olla provides model discovery and unified model catalogues within each provider, enabling seamless routing to available models on compatible endpoints.
 
-You can choose between two proxy engines: **Sherpa** for simplicity and maintainability or **Olla** for maximum performance with advanced features like circuit breakers and connection pooling.
+Unlike API gateways like [LiteLLM](https://github.com/BerriAI/litellm) or orchestration platforms like [GPUStack](https://github.com/gpustack/gpustack), Olla focuses on making your **existing** LLM infrastructure reliable through intelligent routing and failover. You can choose between two proxy engines: **Sherpa** for simplicity and maintainability or **Olla** for maximum performance with advanced features like circuit breakers and connection pooling.
 
 Single CLI application and config file is all you need to go Olla!
 
 ![Olla Usecase](assets/diagrams/usecases.excalidraw.png)
 
-In the above example, we configure [Jetbrains Junie](https://www.jetbrains.com/junie/) to use Olla for its Ollama and LMStudio endpoints for local-ai inference with Junie.
+In the above example, we configure [Jetbrains Junie](https://www.jetbrains.com/junie/) to use Olla for its Ollama and LMStudio endpoints for local-ai inference with Junie (see [how to configure Jetbrains Junie](https://thushan.github.io/olla/usage/#development-tools-junie)).
 
 ## Key Features
 
@@ -52,6 +52,18 @@ In the above example, we configure [Jetbrains Junie](https://www.jetbrains.com/j
 - **🎯 LLM-Optimised**: Streaming-first design with optimised timeouts for long inference
 - **⚙️ High Performance**: Designed to be very [lightweight & efficient](https://thushan.github.io/olla/configuration/practices/performance/), runs on less than 50Mb RAM.
 
+## How Olla Fits in Your Stack
+
+| Tool | Purpose | Use Together? |
+|------|---------|--------------|
+| **Olla** | Load balancing & failover for existing endpoints | - |
+| **[LiteLLM](https://github.com/BerriAI/litellm)** | API translation for cloud providers | ✅ Use for cloud APIs |
+| **[GPUStack](https://github.com/gpustack/gpustack)** | GPU cluster orchestration | ✅ Route to managed endpoints |
+| **[LocalAI](https://github.com/mudler/LocalAI)** | OpenAI-compatible local API | ✅ Load balance multiple instances |
+| **[Ollama](https://github.com/ollama/ollama)** | Local model serving | ✅ Primary use case |
+
+See our [detailed comparisons](https://thushan.github.io/olla/compare/overview/) and [integration patterns](https://thushan.github.io/olla/compare/integration-patterns/) for more.
+
 ### Supported Backends
 
 Olla natively supports the following backend providers. Learn more about [Olla Integrations](https://thushan.github.io/olla/integrations/overview/).
@@ -156,6 +168,15 @@ Complete setup with [OpenWebUI](https://github.com/open-webui/open-webui) + Olla
 
 You can learn more about [OpenWebUI Ollama with Olla](https://thushan.github.io/olla/integrations/frontend/openwebui/).
 
+### Common Architectures
+
+- **Home Lab**: Olla → Multiple Ollama instances across your machines
+- **Hybrid Cloud**: Olla → Local endpoints + LiteLLM → Cloud APIs
+- **Enterprise**: Olla → GPUStack cluster + vLLM servers + Cloud overflow
+- **Development**: Olla → Local + Shared team endpoints
+
+See [integration patterns](https://thushan.github.io/olla/compare/integration-patterns/) for detailed architectures.
+
 More examples coming soon:
 - **Multi-Provider Setup**: Ollama + LM Studio + OpenAI-compatible endpoints
 - **High-Availability**: Production deployment with failover
@@ -429,6 +450,7 @@ Full documentation is available at **[https://thushan.github.io/olla/](https://t
 - **[Configuration Reference](https://thushan.github.io/olla/configuration/reference/)** - Complete configuration options
 - **[API Reference](https://thushan.github.io/olla/api-reference/overview/)** - Full API documentation
 - **[Concepts](https://thushan.github.io/olla/concepts/overview/)** - Core concepts and architecture
+- **[Comparisons](https://thushan.github.io/olla/compare/overview/)** - Compare with LiteLLM, GPUStack, LocalAI
 - **[Integrations](https://thushan.github.io/olla/integrations/overview/)** - Frontend and backend integrations
 - **[Development](https://thushan.github.io/olla/development/overview/)** - Contributing and development guide
 
@@ -456,8 +478,14 @@ The built-in security features are optimised for this deployment pattern:
 **Q: Why use Olla instead of nginx or HAProxy?** \
 A: Olla understands LLM-specific patterns like model routing, streaming responses, and health semantics. It also provides built-in model discovery and LLM-optimised timeouts.
 
+**Q: How does Olla compare to LiteLLM?** \
+A: [LiteLLM](https://github.com/BerriAI/litellm) is an API translation layer for cloud providers (OpenAI, Anthropic, etc.), while Olla is an infrastructure proxy for self-hosted endpoints. They work great together - use LiteLLM for cloud APIs and Olla for local infrastructure reliability. See our [detailed comparison](https://thushan.github.io/olla/compare/litellm/).
+
+**Q: Can Olla manage GPU clusters like GPUStack?** \
+A: No, Olla doesn't deploy or orchestrate models. For GPU cluster management, use [GPUStack](https://github.com/gpustack/gpustack). Olla can then provide routing and failover for your GPUStack-managed endpoints. See our [comparison guide](https://thushan.github.io/olla/compare/gpustack/).
+
 **Q: Can I use Olla with other LLM providers?** \
-A: Yes! Any OpenAI-compatible API works. Configure them as `type: "openai-compatible"` endpoints (such as vLLM, LocalAI, Together AI, etc.).
+A: Yes! Any OpenAI-compatible API works. Configure them as `type: "openai-compatible"` endpoints (such as LiteLLM, [LocalAI](https://github.com/mudler/LocalAI), Together AI, etc.). See [integration patterns](https://thushan.github.io/olla/compare/integration-patterns/).
 
 **Q: Does Olla support authentication?** \
 A: Olla focuses on load balancing and lets your reverse proxy handle authentication. This follows the Unix philosophy of doing one thing well.
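For illustration, the `type: "openai-compatible"` setting the FAQ refers to might be wired up like this. This is a minimal sketch only: the `type` value comes from the diff above, but the surrounding keys (`endpoints`, `name`, `url`) are assumed field names, not verified against Olla's [configuration reference](https://thushan.github.io/olla/configuration/reference/).

```yaml
# Hypothetical Olla config fragment (field names other than `type` are
# illustrative assumptions - consult the configuration reference for the
# real schema).
endpoints:
  - name: local-vllm
    type: "openai-compatible"      # any OpenAI-compatible API works
    url: http://192.168.1.10:8000  # e.g. a vLLM server on the LAN
  - name: litellm-cloud
    type: "openai-compatible"
    url: http://localhost:4000     # a LiteLLM proxy fronting cloud APIs
```

This mirrors the "Hybrid Cloud" pattern from the Common Architectures list: Olla balances across local endpoints while LiteLLM translates for cloud providers behind one of them.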
