Commit 6663c8f

Merge pull request #153 from codelion/feat-new-release-cepo
The CePO approach
2 parents 63b25fb + d7b4591 commit 6663c8f

6 files changed: +114 -658 lines changed


README.md

Lines changed: 31 additions & 22 deletions
@@ -1,6 +1,8 @@
 # optillm
 
-optillm is an OpenAI API compatible optimizing inference proxy which implements several state-of-the-art techniques that can improve the accuracy and performance of LLMs. The current focus is on implementing techniques that improve reasoning over coding, logical and mathematical queries. It is possible to beat the frontier models using these techniques across diverse tasks by doing additional compute at inference time.
+optillm is an OpenAI API compatible optimizing inference proxy which implements several state-of-the-art techniques that can improve the accuracy and performance of LLMs. The current focus is on implementing techniques that improve reasoning over coding, logical and mathematical queries.
+
+It is possible to beat the frontier models using these techniques across diverse tasks by doing additional compute at inference time. A good example of how to combine such techniques together is the [CePO approach](optillm/cepo) from Cerebras.
 
 [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/codelion/optillm)
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SpuUb8d9xAoTh32M-9wJsB50AOH54EaH?usp=sharing)
@@ -46,9 +48,16 @@ source .venv/bin/activate
 pip install -r requirements.txt
 ```
 
-Set up the `OPENAI_API_KEY` environment variable (for OpenAI)
-or the `AZURE_OPENAI_API_KEY`, `AZURE_API_VERSION` and `AZURE_API_BASE` environment variables (for Azure OpenAI)
-or the `AZURE_API_VERSION` and `AZURE_API_BASE` environment variables and login using `az login` for Azure OpenAI with managed identity (see [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity)).
+We support all major LLM providers and models for inference. You need to set the correct environment variable and the proxy will pick the corresponding client.
+
+| Provider | Required Environment Variables | Additional Notes |
+|----------|-------------------------------|------------------|
+| OptiLLM | `OPTILLM_API_KEY` | Uses the inbuilt local server for inference, supports logprobs and decoding techniques like `cot_decoding` & `entropy_decoding` |
+| OpenAI | `OPENAI_API_KEY` | You can use this with any OpenAI compatible endpoint (e.g. OpenRouter) by setting the `base_url` |
+| Cerebras | `CEREBRAS_API_KEY` | You can use this for fast inference with supported models, see [docs for details](https://inference-docs.cerebras.ai/introduction) |
+| Azure OpenAI | `AZURE_OPENAI_API_KEY`<br>`AZURE_API_VERSION`<br>`AZURE_API_BASE` | - |
+| Azure OpenAI (Managed Identity) | `AZURE_API_VERSION`<br>`AZURE_API_BASE` | Login required using `az login`, see [docs for details](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity) |
+| LiteLLM | depends on the model | See [docs for details](https://docs.litellm.ai/docs/providers) |
 
 You can then run the optillm proxy as follows.
 
@@ -325,7 +334,7 @@ Authorization: Bearer your_secret_api_key
 
 ## SOTA results on benchmarks with optillm
 
-### CePO on math and code benchmarks
+### CePO on math and code benchmarks (Jan 2025)
 
 | Method | Math-L5 | MMLU-Pro (Math) | GPQA | CRUX | LiveCodeBench (pass@1) | Simple QA |
 | -------------------------: | :-----: | :-------------: | :--: | :--: | :--------------------: | :-------: |
@@ -380,23 +389,23 @@ called patchflows. We saw huge performance gains across all the supported patchf
 ![Results showing optillm mixture of agents approach used with patchflows](https://raw.githubusercontent.com/codelion/optillm/main/moa-patchwork-results.png)
 
 ## References
-
-- [Chain of Code: Reasoning with a Language Model-Augmented Code Emulator](https://arxiv.org/abs/2312.04474) - [Inspired the implementation of coc plugin](https://github.com/codelion/optillm/blob/main/optillm/plugins/coc_plugin.py)
-- [Entropy Based Sampling and Parallel CoT Decoding](https://github.com/xjdr-alt/entropix) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/entropy_decoding.py)
-- [Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation](https://arxiv.org/abs/2409.12941) - [Evaluation script](https://github.com/codelion/optillm/blob/main/scripts/eval_frames_benchmark.py)
-- [Writing in the Margins: Better Inference Pattern for Long Context Retrieval](https://www.arxiv.org/abs/2408.14906) - [Inspired the implementation of the memory plugin](https://github.com/codelion/optillm/blob/main/optillm/plugins/memory_plugin.py)
-- [Chain-of-Thought Reasoning Without Prompting](https://arxiv.org/abs/2402.10200) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/cot_decoding.py)
-- [Re-Reading Improves Reasoning in Large Language Models](https://arxiv.org/abs/2309.06275) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/reread.py)
-- [In-Context Principle Learning from Mistakes](https://arxiv.org/abs/2402.05403) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/leap.py)
-- [Planning In Natural Language Improves LLM Search For Code Generation](https://arxiv.org/abs/2409.03733) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/plansearch.py)
-- [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/self_consistency.py)
-- [Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers](https://arxiv.org/abs/2408.06195) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/rstar.py)
-- [Mixture-of-Agents Enhances Large Language Model Capabilities](https://arxiv.org/abs/2406.04692) - [Inspired the implementation of moa](https://github.com/codelion/optillm/blob/main/optillm/moa.py)
-- [Prover-Verifier Games improve legibility of LLM outputs](https://arxiv.org/abs/2407.13692) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/pvg.py)
-- [Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning](https://arxiv.org/abs/2405.00451) - [Inspired the implementation of mcts](https://github.com/codelion/optillm/blob/main/optillm/mcts.py)
-- [Unsupervised Evaluation of Code LLMs with Round-Trip Correctness](https://arxiv.org/abs/2402.08699) - [Inspired the implementation of rto](https://github.com/codelion/optillm/blob/main/optillm/rto.py)
-- [Patched MOA: optimizing inference for diverse software development tasks](https://arxiv.org/abs/2407.18521) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/moa.py)
-- [Patched RTC: evaluating LLMs for diverse software development tasks](https://arxiv.org/abs/2407.16557) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/rto.py)
+- [CePO: Empowering Llama with Reasoning using Test-Time Compute](https://cerebras.ai/blog/cepo) - [Implementation](optillm/cepo)
+- [Chain of Code: Reasoning with a Language Model-Augmented Code Emulator](https://arxiv.org/abs/2312.04474) - [Inspired the implementation of coc plugin](optillm/plugins/coc_plugin.py)
+- [Entropy Based Sampling and Parallel CoT Decoding](https://github.com/xjdr-alt/entropix) - [Implementation](optillm/entropy_decoding.py)
+- [Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation](https://arxiv.org/abs/2409.12941) - [Evaluation script](scripts/eval_frames_benchmark.py)
+- [Writing in the Margins: Better Inference Pattern for Long Context Retrieval](https://www.arxiv.org/abs/2408.14906) - [Inspired the implementation of the memory plugin](optillm/plugins/memory_plugin.py)
+- [Chain-of-Thought Reasoning Without Prompting](https://arxiv.org/abs/2402.10200) - [Implementation](optillm/cot_decoding.py)
+- [Re-Reading Improves Reasoning in Large Language Models](https://arxiv.org/abs/2309.06275) - [Implementation](optillm/reread.py)
+- [In-Context Principle Learning from Mistakes](https://arxiv.org/abs/2402.05403) - [Implementation](optillm/leap.py)
+- [Planning In Natural Language Improves LLM Search For Code Generation](https://arxiv.org/abs/2409.03733) - [Implementation](optillm/plansearch.py)
+- [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) - [Implementation](optillm/self_consistency.py)
+- [Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers](https://arxiv.org/abs/2408.06195) - [Implementation](optillm/rstar.py)
+- [Mixture-of-Agents Enhances Large Language Model Capabilities](https://arxiv.org/abs/2406.04692) - [Inspired the implementation of moa](optillm/moa.py)
+- [Prover-Verifier Games improve legibility of LLM outputs](https://arxiv.org/abs/2407.13692) - [Implementation](optillm/pvg.py)
+- [Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning](https://arxiv.org/abs/2405.00451) - [Inspired the implementation of mcts](optillm/mcts.py)
+- [Unsupervised Evaluation of Code LLMs with Round-Trip Correctness](https://arxiv.org/abs/2402.08699) - [Inspired the implementation of rto](optillm/rto.py)
+- [Patched MOA: optimizing inference for diverse software development tasks](https://arxiv.org/abs/2407.18521) - [Implementation](optillm/moa.py)
+- [Patched RTC: evaluating LLMs for diverse software development tasks](https://arxiv.org/abs/2407.16557) - [Implementation](optillm/rto.py)
 
 ## Citation
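
Review note on the provider table added to README.md: the new text says the proxy picks a client based on which environment variables are set. A minimal sketch of that idea follows. The function name `pick_provider` and the precedence order are assumptions made for illustration; the README does not say which provider wins when several variables are set, and this is not the proxy's actual selection code.

```python
import os
from typing import Mapping, Optional

# Required variables per provider, taken from the README table.
# The ORDER of this list is an assumption: the README does not document precedence.
PROVIDER_ENV_VARS = [
    ("OptiLLM", ["OPTILLM_API_KEY"]),
    ("Cerebras", ["CEREBRAS_API_KEY"]),
    ("OpenAI", ["OPENAI_API_KEY"]),
    ("Azure OpenAI", ["AZURE_OPENAI_API_KEY", "AZURE_API_VERSION", "AZURE_API_BASE"]),
    ("Azure OpenAI (Managed Identity)", ["AZURE_API_VERSION", "AZURE_API_BASE"]),
]


def pick_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first provider whose required variables are all set, else LiteLLM."""
    env = os.environ if env is None else env
    for name, required in PROVIDER_ENV_VARS:
        if all(env.get(var) for var in required):
            return name
    # The table lists LiteLLM with model-dependent variables, so it is the fallback here.
    return "LiteLLM"


if __name__ == "__main__":
    print(pick_provider({"CEREBRAS_API_KEY": "demo-key"}))
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to exercise without touching the real environment.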

optillm.py

Lines changed: 35 additions & 4 deletions
@@ -214,6 +214,26 @@ def load_plugins():
     if not plugin_approaches:
         logger.warning("No plugins loaded from any location")
 
+def get_config_path():
+    # Get installed package config directory
+    import optillm
+    package_config_dir = os.path.join(os.path.dirname(optillm.__file__), 'cepo', 'configs')
+    package_config_path = os.path.join(package_config_dir, 'cepo_config.yaml')
+
+    # Get local project config directory
+    current_dir = os.getcwd() if server_config.get("config_dir", "") == "" else server_config["config_dir"]
+    local_config_dir = os.path.join(current_dir, 'optillm', 'cepo', 'configs')
+    local_config_path = os.path.join(local_config_dir, 'cepo_config.yaml')
+
+    # If local config exists and is different from package config, use local
+    if os.path.exists(local_config_path) and local_config_path != package_config_path:
+        logger.debug(f"Using local config from: {local_config_path}")
+        return local_config_path
+
+    # Otherwise use package config
+    logger.debug(f"Using package config from: {package_config_path}")
+    return package_config_path
+
 def parse_combined_approach(model: str, known_approaches: list, plugin_approaches: dict):
     if model == 'auto':
         return 'SINGLE', ['none'], model
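
Review note on `get_config_path`: the fallback rule (prefer a config file in the local project, otherwise use the one shipped with the installed package) can be sketched without the `optillm` import or `server_config`. The helper name `resolve_config_path` and its parameters are hypothetical. One deliberate difference from the hunk, flagged as my own variation: the sketch compares absolute paths, whereas the hunk compares the raw strings.

```python
import os
import tempfile

# Relative location of the config file, as used in the hunk above.
RELATIVE_CONFIG = os.path.join("cepo", "configs", "cepo_config.yaml")


def resolve_config_path(package_dir: str, project_dir: str) -> str:
    """Prefer <project_dir>/optillm/<RELATIVE_CONFIG> if present, else the package copy."""
    package_path = os.path.join(package_dir, RELATIVE_CONFIG)
    local_path = os.path.join(project_dir, "optillm", RELATIVE_CONFIG)
    # Compare absolute paths so "./x" and "x" are treated as the same file.
    same_file = os.path.abspath(local_path) == os.path.abspath(package_path)
    if os.path.exists(local_path) and not same_file:
        return local_path
    return package_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pkg, tempfile.TemporaryDirectory() as proj:
        # No local file yet, so the package path is returned.
        print(resolve_config_path(pkg, proj))
        local_dir = os.path.join(proj, "optillm", "cepo", "configs")
        os.makedirs(local_dir)
        open(os.path.join(local_dir, "cepo_config.yaml"), "w").close()
        # The local file now exists and takes precedence.
        print(resolve_config_path(pkg, proj))
```

Like the hunk, the sketch returns the package path even when that file does not exist, so a caller would still need to handle a missing file.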
@@ -701,13 +721,24 @@ def parse_args():
     base_url_default = os.environ.get("OPTILLM_BASE_URL", "")
     parser.add_argument("--base-url", "--base_url", dest="base_url", type=str, default=base_url_default,
                         help="Base url for OpenAI compatible endpoint")
+
+    # Use the function to get the default path
+    default_config_path = get_config_path()
 
     # Special handling of all the CePO Configurations
     for field in fields(CepoConfig):
-        parser.add_argument(f"--cepo_{field.name}", dest=f"cepo_{field.name}", type=field.type, default=None, help=f"CePO configuration for {field.name}")
-
-    parser.add_argument(f"--cepo_config_file", dest=f"cepo_config_file", type=str, default="./optillm/cepo/configs/cepo_config.yaml", help="Path to CePO configuration file")
-
+        parser.add_argument(f"--cepo_{field.name}",
+                            dest=f"cepo_{field.name}",
+                            type=field.type,
+                            default=None,
+                            help=f"CePO configuration for {field.name}")
+
+    parser.add_argument("--cepo_config_file",
+                        dest="cepo_config_file",
+                        type=str,
+                        default=default_config_path,
+                        help="Path to CePO configuration file")
+
     args = parser.parse_args()
 
     # Convert argument names to match server_config keys
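
Review note on the `fields(CepoConfig)` loop: generating one command-line flag per dataclass field, with `default=None` meaning "not given", can be sketched as below. `DemoConfig` and its field names are hypothetical stand-ins for `CepoConfig`, and the `merge` step (command-line values overriding values loaded from a file) is an assumption about how the `None` defaults are meant to be used; the diff does not show that part.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:  # hypothetical stand-in for CepoConfig
    samples: int
    temperature: float
    mode: str


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for field in fields(DemoConfig):
        # field.type is the annotation (int, float, str here), which argparse can
        # call as a converter; default=None marks "not given on the command line".
        parser.add_argument(f"--cepo_{field.name}",
                            dest=f"cepo_{field.name}",
                            type=field.type,
                            default=None,
                            help=f"CePO configuration for {field.name}")
    return parser


def merge(file_values: dict, args: argparse.Namespace) -> DemoConfig:
    """Assumed override rule: a non-None command-line value replaces the file value."""
    merged = dict(file_values)
    for field in fields(DemoConfig):
        cli_value = getattr(args, f"cepo_{field.name}")
        if cli_value is not None:
            merged[field.name] = cli_value
    return DemoConfig(**merged)


if __name__ == "__main__":
    args = build_parser().parse_args(["--cepo_samples", "5"])
    print(merge({"samples": 3, "temperature": 0.7, "mode": "absolute"}, args))
```

Using `type=field.type` only works when the annotation is a callable converter. A `bool` field would treat any non-empty string as true, and string annotations (for example under `from __future__ import annotations`) would not be callable at all.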

optillm/cepo/README.md

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 
 CePO is an inference-time computation method designed to enhance the accuracy of large language models (LLMs) on tasks requiring reasoning and planning, such as solving math or coding problems. It integrates several advanced techniques, including Best of N, Chain of Thought (CoT), Self-Reflection, Self-Improvement, and Prompt Engineering.
 
-If you have any questions or want to contribute, please reach out to us on [cerebras.ai/discord](cerebras.ai/discord)
+If you have any questions or want to contribute, please reach out to us on [cerebras.ai/discord](https://cerebras.ai/discord)
 
 ## CePO Methodology
 
@@ -41,4 +41,4 @@ Interestingly, the self-critique and quality improvement capabilities of existin
 | 3 | 5 | 8 | absolute | 69.4 | 84.3 | 55.6 | 81.1 | |
 | 5 | 3 | 6 | absolute | 68.7 | 85.4 | 54.8 | 79.9 | |
 | 7 | 3 | 6 | absolute | 69.6 | 82.8 | 54.7 | 78.4 | |
-| 9 | 3 | 6 | absolute | 68.9 | 83.4 | 55.7 | 80.6 | |
+| 9 | 3 | 6 | absolute | 68.9 | 83.4 | 55.7 | 80.6 | |

setup.py

Lines changed: 3 additions & 1 deletion
@@ -2,11 +2,12 @@
 
 setup(
     name="optillm",
-    version="0.0.36",
+    version="0.1.0",
     packages=find_packages(),
     py_modules=['optillm'],
     package_data={
         'optillm': ['plugins/*.py'],  # Include plugin files
+        'optillm': ['cepo/configs/*.yaml'],  # Include yaml files in the package
     },
     include_package_data=True,  # This is important
     install_requires=[
@@ -36,6 +37,7 @@
         "gradio",
         # Constrain spacy version to avoid blis build issues on ARM64
         "spacy<3.8.0",
+        "cerebras_cloud_sdk",
     ],
     entry_points={
         'console_scripts': [
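
Review note on `package_data`: the first hunk adds a second `'optillm'` key to the same dict literal. In Python a repeated key in a dict literal does not raise an error; the last value silently replaces the earlier one. The sketch below shows that behavior and one possible merged form. The name `merged_package_data` is hypothetical, and whether the plugin files still ship in practice depends on packaging settings the diff does not show.

```python
# A dict literal with a repeated key keeps only the last value.
package_data = {
    'optillm': ['plugins/*.py'],
    'optillm': ['cepo/configs/*.yaml'],
}
print(package_data)  # only the yaml pattern survives

# Hypothetical merged form that keeps both patterns under a single key.
merged_package_data = {
    'optillm': ['plugins/*.py', 'cepo/configs/*.yaml'],
}
print(merged_package_data)
```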

test.py

Lines changed: 5 additions & 3 deletions
@@ -7,19 +7,20 @@
 import logging
 from openai import OpenAI
 
-from litellm_wrapper import LiteLLMWrapper
+from optillm.litellm_wrapper import LiteLLMWrapper
 from optillm.mcts import chat_with_mcts
 from optillm.bon import best_of_n_sampling
 from optillm.moa import mixture_of_agents
 from optillm.rto import round_trip_optimization
 from optillm.self_consistency import advanced_self_consistency_approach
 from optillm.pvg import inference_time_pv_game
-from optillm.z3_solver import Z3SolverSystem
+from optillm.z3_solver import Z3SymPySolverSystem
 from optillm.rstar import RStar
 from optillm.cot_reflection import cot_reflection
 from optillm.plansearch import plansearch
 from optillm.leap import leap
 from optillm.reread import re2_approach
+from optillm.cepo.cepo import cepo, CepoConfig, init_cepo_config
 
 # Setup logging
 logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
@@ -44,12 +45,13 @@ def __init__(self):
             'rto': round_trip_optimization,
             'self_consistency': advanced_self_consistency_approach,
             'pvg': inference_time_pv_game,
-            'z3': lambda s, q, c, m: Z3SolverSystem(s, c, m).process_query(q),
+            'z3': lambda s, q, c, m: Z3SymPySolverSystem(s, c, m).process_query(q),
             'rstar': lambda s, q, c, m: RStar(s, c, m).solve(q),
             'cot_reflection': cot_reflection,
             'plansearch': plansearch,
             'leap': leap,
             're2': re2_approach,
+            'cepo': lambda s, q, c, m: cepo(s,q,c,m,init_cepo_config({'cepo_config_file': './optillm/cepo/configs/cepo_config.yaml'})),
         }
 
 def load_test_cases(file_path: str) -> List[Dict]:
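
Review note on the approaches dict in test.py: every entry is called with the same four arguments (system prompt, query, client, model), and the lambdas adapt callables that do not match that shape, such as classes or functions that need an extra config argument. A minimal sketch of that adapter pattern follows. `EchoSolver`, `configured` and `run` are hypothetical names, not part of optillm.

```python
from typing import Callable, Dict

# Every registry entry takes (system_prompt, query, client, model).
Approach = Callable[[str, str, object, str], str]


class EchoSolver:  # hypothetical class-based approach
    def __init__(self, system_prompt: str, client: object, model: str):
        self.system_prompt = system_prompt
        self.client = client
        self.model = model

    def solve(self, query: str) -> str:
        return f"[{self.model}] {query}"


def configured(system_prompt: str, query: str, client: object, model: str, config: dict) -> str:
    # Hypothetical function-based approach that needs a fifth argument.
    return f"{query} (n={config['n']})"


approaches: Dict[str, Approach] = {
    # Class-based approach: construct the object, then pass the query to its method.
    'echo': lambda s, q, c, m: EchoSolver(s, c, m).solve(q),
    # Function with an extra argument: bind the argument inside the lambda.
    'configured': lambda s, q, c, m: configured(s, q, c, m, {'n': 3}),
}


def run(name: str, system_prompt: str, query: str, client: object, model: str) -> str:
    """Look up an approach by name and call it with the uniform signature."""
    return approaches[name](system_prompt, query, client, model)


if __name__ == "__main__":
    print(run('echo', 'You are helpful.', 'hi', None, 'demo-model'))
```

The new `'cepo'` entry passes a config path relative to the working directory (`./optillm/cepo/configs/cepo_config.yaml`), so unlike `get_config_path` in optillm.py it resolves only when test.py is run from a directory that contains that path.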
