Skip to content

Commit e017d1b

Browse files
committed
fix logo and update readme
1 parent c456b76 commit e017d1b

File tree

2 files changed

+158
-49
lines changed

2 files changed

+158
-49
lines changed

README.md

Lines changed: 158 additions & 49 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,86 @@
1-
# optillm
1+
# OptiLLM
22

3-
optillm is an OpenAI API compatible optimizing inference proxy which implements several state-of-the-art techniques that can improve the accuracy and performance of LLMs. The current focus is on implementing techniques that improve reasoning over coding, logical and mathematical queries.
3+
<p align="center">
4+
<img src="optillm-logo.png" alt="OptiLLM Logo" width="200" />
5+
</p>
6+
7+
<p align="center">
8+
<strong>🚀 2-10x accuracy improvements on reasoning tasks with zero training</strong>
9+
</p>
10+
11+
<p align="center">
12+
<a href="https://github.com/codelion/optillm/stargazers"><img src="https://img.shields.io/github/stars/codelion/optillm?style=social" alt="GitHub stars"></a>
13+
<a href="https://pypi.org/project/optillm/"><img src="https://img.shields.io/pypi/v/optillm" alt="PyPI version"></a>
14+
<a href="https://pypi.org/project/optillm/"><img src="https://img.shields.io/pypi/dm/optillm" alt="PyPI downloads"></a>
15+
<a href="https://github.com/codelion/optillm/blob/main/LICENSE"><img src="https://img.shields.io/github/license/codelion/optillm" alt="License"></a>
16+
</p>
17+
18+
<p align="center">
19+
<a href="https://huggingface.co/spaces/codelion/optillm">🤗 HuggingFace Space</a> •
20+
<a href="https://colab.research.google.com/drive/1SpuUb8d9xAoTh32M-9wJsB50AOH54EaH?usp=sharing">📓 Colab Demo</a> •
21+
<a href="https://github.com/codelion/optillm/discussions">💬 Discussions</a>
22+
</p>
23+
24+
---
25+
26+
**OptiLLM** is an OpenAI API-compatible optimizing inference proxy that implements 20+ state-of-the-art techniques to dramatically improve LLM accuracy and performance on reasoning tasks - without requiring any model training or fine-tuning.
427

528
It is possible to beat the frontier models using these techniques across diverse tasks by doing additional compute at inference time. A good example of how to combine such techniques together is the [CePO approach](optillm/cepo) from Cerebras.
629

7-
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/codelion/optillm)
8-
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SpuUb8d9xAoTh32M-9wJsB50AOH54EaH?usp=sharing)
9-
[![GitHub Discussions](https://img.shields.io/github/discussions/codelion/optillm)](https://github.com/codelion/optillm/discussions)
30+
## ✨ Key Features
31+
32+
- **🎯 Instant Improvements**: 2-10x better accuracy on math, coding, and logical reasoning
33+
- **🔌 Drop-in Replacement**: Works with any OpenAI-compatible API endpoint
34+
- **🧠 20+ Optimization Techniques**: From simple best-of-N to advanced MCTS and planning
35+
- **📦 Zero Training Required**: Just proxy your existing API calls through OptiLLM
36+
- **⚡ Production Ready**: Used in production by companies and researchers worldwide
37+
- **🌍 Multi-Provider**: Supports OpenAI, Anthropic, Google, Cerebras, and 100+ models via LiteLLM
38+
39+
## 🚀 Quick Start
40+
41+
Get powerful reasoning improvements in 3 simple steps:
42+
43+
```bash
44+
# 1. Install OptiLLM
45+
pip install optillm
46+
47+
# 2. Start the server
48+
export OPENAI_API_KEY="your-key-here"
49+
optillm
50+
51+
# 3. Use with any OpenAI client - just change the model name!
52+
```
53+
54+
```python
55+
from openai import OpenAI
56+
57+
client = OpenAI(base_url="http://localhost:8000/v1")
58+
59+
# Add 'moa-' prefix for Mixture of Agents optimization
60+
response = client.chat.completions.create(
61+
model="moa-gpt-4o-mini", # This gives you GPT-4o performance from GPT-4o-mini!
62+
messages=[{"role": "user", "content": "Solve: If 2x + 3 = 7, what is x?"}]
63+
)
64+
```
65+
66+
**Before OptiLLM**: "x = 2" ❌
67+
**After OptiLLM**: "Let me work through this step by step: 2x + 3 = 7, so 2x = 4, therefore x = 2" ✅
1068

11-
## Installation
69+
## 📊 Proven Results
70+
71+
OptiLLM delivers measurable improvements across diverse benchmarks:
72+
73+
| Technique | Base Model | Improvement | Benchmark |
74+
|-----------|------------|-------------|-----------|
75+
| **CePO** | Llama 3.3 70B | **+18.6 points** | Math-L5 (51.0→69.6) |
76+
| **AutoThink** | DeepSeek-R1-1.5B | **+9.34 points** | GPQA-Diamond (21.72→31.06) |
77+
| **LongCePO** | Llama 3.3 70B | **+13.6 points** | InfiniteBench (58.0→71.6) |
78+
| **MOA** | GPT-4o-mini | **Matches GPT-4** | Arena-Hard-Auto |
79+
| **PlanSearch** | GPT-4o-mini | **+20% pass@5** | LiveCodeBench |
80+
81+
*Full benchmark results below* ⬇️
82+
83+
## 🏗️ Installation
1284

1385
### Using pip
1486

@@ -48,6 +120,48 @@ source .venv/bin/activate
48120
pip install -r requirements.txt
49121
```
50122

123+
## Implemented techniques
124+
125+
| Approach | Slug | Description |
126+
| ------------------------------------ | ------------------ | ---------------------------------------------------------------------------------------------- |
127+
| [Cerebras Planning and Optimization](optillm/cepo) | `cepo` | Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques |
128+
| CoT with Reflection | `cot_reflection` | Implements chain-of-thought reasoning with \<thinking\>, \<reflection> and \<output> sections |
129+
| PlanSearch | `plansearch` | Implements a search algorithm over candidate plans for solving a problem in natural language |
130+
| ReRead | `re2` | Implements rereading to improve reasoning by processing queries twice |
131+
| Self-Consistency | `self_consistency` | Implements an advanced self-consistency method |
132+
| Z3 Solver | `z3` | Utilizes the Z3 theorem prover for logical reasoning |
133+
| R* Algorithm | `rstar` | Implements the R* algorithm for problem-solving |
134+
| LEAP | `leap` | Learns task-specific principles from few shot examples |
135+
| Round Trip Optimization | `rto` | Optimizes responses through a round-trip process |
136+
| Best of N Sampling | `bon` | Generates multiple responses and selects the best one |
137+
| Mixture of Agents | `moa` | Combines responses from multiple critiques |
138+
| Monte Carlo Tree Search | `mcts` | Uses MCTS for decision-making in chat responses |
139+
| PV Game | `pvg` | Applies a prover-verifier game approach at inference time |
140+
| CoT Decoding | N/A for proxy | Implements chain-of-thought decoding to elicit reasoning without explicit prompting |
141+
| Entropy Decoding | N/A for proxy | Implements adaptive sampling based on the uncertainty of tokens during generation |
142+
| Thinkdeeper | N/A for proxy | Implements the `reasoning_effort` param from OpenAI for reasoning models like DeepSeek R1 |
143+
| [AutoThink](optillm/autothink) | N/A for proxy | Combines query complexity classification with steering vectors to enhance reasoning |
144+
145+
## Implemented plugins
146+
147+
| Plugin | Slug | Description |
148+
| ----------------------- | ------------------ | ---------------------------------------------------------------------------------------------- |
149+
| [System Prompt Learning](optillm/plugins/spl) | `spl` | Implements what [Andrej Karpathy called the third paradigm](https://x.com/karpathy/status/1921368644069765486) for LLM learning, this enables the model to acquire program solving knowledge and strategies |
150+
| [Deep Think](optillm/plugins/deepthink) | `deepthink` | Implements a Gemini-like Deep Think approach using inference time scaling for reasoning LLMs |
151+
| [Long-Context Cerebras Planning and Optimization](optillm/plugins/longcepo) | `longcepo` | Combines planning and divide-and-conquer processing of long documents to enable infinite context |
152+
| Majority Voting | `majority_voting` | Generates k candidate solutions and selects the most frequent answer through majority voting (default k=6) |
153+
| MCP Client | `mcp` | Implements the model context protocol (MCP) client, enabling you to use any LLM with any MCP Server |
154+
| Router | `router` | Uses the [optillm-modernbert-large](https://huggingface.co/codelion/optillm-modernbert-large) model to route requests to different approaches based on the user prompt |
155+
| Chain-of-Code | `coc` | Implements a chain of code approach that combines CoT with code execution and LLM based code simulation |
156+
| Memory | `memory` | Implements a short term memory layer, enables you to use unbounded context length with any LLM |
157+
| Privacy | `privacy` | Anonymize PII data in request and deanonymize it back to original value in response |
158+
| Read URLs | `readurls` | Reads all URLs found in the request, fetches the content at the URL and adds it to the context |
159+
| Execute Code | `executecode` | Enables use of code interpreter to execute python code in requests and LLM generated responses |
160+
| JSON | `json` | Enables structured outputs using the outlines library, supports pydantic types and JSON schema |
161+
| GenSelect | `genselect` | Generative Solution Selection - generates multiple candidates and selects the best based on quality criteria |
162+
| Web Search | `web_search` | Performs Google searches using Chrome automation (Selenium) to gather search results and URLs |
163+
| [Deep Research](optillm/plugins/deep_research) | `deep_research` | Implements Test-Time Diffusion Deep Researcher (TTD-DR) for comprehensive research reports using iterative refinement |
164+
51165
We support all major LLM providers and models for inference. You need to set the correct environment variable and the proxy will pick the corresponding client.
52166

53167
| Provider | Required Environment Variables | Additional Notes |
@@ -339,48 +453,6 @@ Check this log file for connection issues, tool execution errors, and other diag
339453

340454
4. **Access denied**: For filesystem operations, ensure the paths specified in the configuration are accessible to the process.
341455

342-
## Implemented techniques
343-
344-
| Approach | Slug | Description |
345-
| ------------------------------------ | ------------------ | ---------------------------------------------------------------------------------------------- |
346-
| [Cerebras Planning and Optimization](optillm/cepo) | `cepo` | Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques |
347-
| CoT with Reflection | `cot_reflection` | Implements chain-of-thought reasoning with \<thinking\>, \<reflection> and \<output\> sections |
348-
| PlanSearch | `plansearch` | Implements a search algorithm over candidate plans for solving a problem in natural language |
349-
| ReRead | `re2` | Implements rereading to improve reasoning by processing queries twice |
350-
| Self-Consistency | `self_consistency` | Implements an advanced self-consistency method |
351-
| Z3 Solver | `z3` | Utilizes the Z3 theorem prover for logical reasoning |
352-
| R* Algorithm | `rstar` | Implements the R* algorithm for problem-solving |
353-
| LEAP | `leap` | Learns task-specific principles from few shot examples |
354-
| Round Trip Optimization | `rto` | Optimizes responses through a round-trip process |
355-
| Best of N Sampling | `bon` | Generates multiple responses and selects the best one |
356-
| Mixture of Agents | `moa` | Combines responses from multiple critiques |
357-
| Monte Carlo Tree Search | `mcts` | Uses MCTS for decision-making in chat responses |
358-
| PV Game | `pvg` | Applies a prover-verifier game approach at inference time |
359-
| CoT Decoding | N/A for proxy | Implements chain-of-thought decoding to elicit reasoning without explicit prompting |
360-
| Entropy Decoding | N/A for proxy | Implements adaptive sampling based on the uncertainty of tokens during generation |
361-
| Thinkdeeper | N/A for proxy | Implements the `reasoning_effort` param from OpenAI for reasoning models like DeepSeek R1 |
362-
| [AutoThink](optillm/autothink) | N/A for proxy | Combines query complexity classification with steering vectors to enhance reasoning |
363-
364-
## Implemented plugins
365-
366-
| Plugin | Slug | Description |
367-
| ----------------------- | ------------------ | ---------------------------------------------------------------------------------------------- |
368-
| [System Prompt Learning](optillm/plugins/spl) | `spl` | Implements what [Andrej Karpathy called the third paradigm](https://x.com/karpathy/status/1921368644069765486) for LLM learning, this enables the model to acquire program solving knowledge and strategies |
369-
| [Deep Think](optillm/plugins/deepthink) | `deepthink` | Implements a Gemini-like Deep Think approach using inference time scaling for reasoning LLMs |
370-
| [Long-Context Cerebras Planning and Optimization](optillm/plugins/longcepo) | `longcepo` | Combines planning and divide-and-conquer processing of long documents to enable infinite context |
371-
| Majority Voting | `majority_voting` | Generates k candidate solutions and selects the most frequent answer through majority voting (default k=6) |
372-
| MCP Client | `mcp` | Implements the model context protocol (MCP) client, enabling you to use any LLM with any MCP Server |
373-
| Router | `router` | Uses the [optillm-modernbert-large](https://huggingface.co/codelion/optillm-modernbert-large) model to route requests to different approaches based on the user prompt |
374-
| Chain-of-Code | `coc` | Implements a chain of code approach that combines CoT with code execution and LLM based code simulation |
375-
| Memory | `memory` | Implements a short term memory layer, enables you to use unbounded context length with any LLM |
376-
| Privacy | `privacy` | Anonymize PII data in request and deanonymize it back to original value in response |
377-
| Read URLs | `readurls` | Reads all URLs found in the request, fetches the content at the URL and adds it to the context |
378-
| Execute Code | `executecode` | Enables use of code interpreter to execute python code in requests and LLM generated responses |
379-
| JSON | `json` | Enables structured outputs using the outlines library, supports pydantic types and JSON schema |
380-
| GenSelect | `genselect` | Generative Solution Selection - generates multiple candidates and selects the best based on quality criteria |
381-
| Web Search | `web_search` | Performs Google searches using Chrome automation (Selenium) to gather search results and URLs |
382-
| [Deep Research](optillm/plugins/deep_research) | `deep_research` | Implements Test-Time Diffusion Deep Researcher (TTD-DR) for comprehensive research reports using iterative refinement |
383-
384456
## Available parameters
385457

386458
optillm supports various command-line arguments for configuration. When using Docker, these can also be set as environment variables prefixed with `OPTILLM_`.
@@ -607,6 +679,33 @@ All tests are automatically run on pull requests via GitHub Actions. The workflo
607679

608680
See `tests/README.md` for more details on the test structure and how to write new tests.
609681

682+
## 🤝 Contributing
683+
684+
We ❤️ contributions! OptiLLM is built by the community, for the community.
685+
686+
- 🐛 **Found a bug?** [Open an issue](https://github.com/codelion/optillm/issues/new)
687+
- 💡 **Have an idea?** [Start a discussion](https://github.com/codelion/optillm/discussions)
688+
- 🔧 **Want to code?** Check out [good first issues](https://github.com/codelion/optillm/labels/good%20first%20issue)
689+
690+
### Development Setup
691+
```bash
692+
git clone https://github.com/codelion/optillm.git
693+
cd optillm
694+
python -m venv .venv
695+
source .venv/bin/activate # or `.venv\Scripts\activate` on Windows
696+
pip install -r requirements.txt
697+
pip install -r tests/requirements.txt
698+
699+
# Run tests
700+
python -m pytest tests/
701+
```
702+
703+
## 🌟 Community & Support
704+
705+
- **🚀 Companies using OptiLLM**: [Cerebras](https://cerebras.ai), [Patched](https://patched.codes), and [50+ others](https://github.com/codelion/optillm/discussions/categories/show-and-tell)
706+
- **💬 Community**: Join our [GitHub Discussions](https://github.com/codelion/optillm/discussions)
707+
- **📧 Enterprise**: For enterprise support, contact [[email protected]](mailto:[email protected])
708+
610709
## References
611710
- [Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques](https://arxiv.org/abs/2506.08060)
612711
- [AutoThink: efficient inference for reasoning LLMs](https://dx.doi.org/10.2139/ssrn.5253327) - [Implementation](optillm/autothink)
@@ -639,10 +738,20 @@ If you use this library in your research, please cite:
639738

640739
```bibtex
641740
@software{optillm,
642-
title = {Optillm: Optimizing inference proxy for LLMs},
741+
title = {OptiLLM: Optimizing inference proxy for LLMs},
643742
author = {Asankhaya Sharma},
644743
year = {2024},
645744
publisher = {GitHub},
646745
url = {https://github.com/codelion/optillm}
647746
}
648747
```
748+
749+
---
750+
751+
<p align="center">
752+
<strong>Ready to optimize your LLMs? Install OptiLLM and see the difference! 🚀</strong>
753+
</p>
754+
755+
<p align="center">
756+
⭐ <a href="https://github.com/codelion/optillm">Star us on GitHub</a> if you find OptiLLM useful!
757+
</p>

optillm-logo.png

77 KB
Loading

0 commit comments

Comments
 (0)