|
# optillm
|
optillm is an OpenAI API-compatible optimizing inference proxy that implements several state-of-the-art techniques to improve the accuracy and performance of LLMs. The current focus is on techniques that improve reasoning over coding, logical and mathematical queries.

It is possible to beat frontier models across diverse tasks by spending additional compute at inference time. A good example of how to combine such techniques is the [CePO approach](optillm/cepo) from Cerebras.
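
Since the proxy exposes the standard OpenAI chat completions API, any OpenAI-compatible client can use it by pointing its base URL at the proxy. A minimal sketch (assuming the proxy is running on its default local port, 8000, and that a technique such as `moa` is selected by prefixing the model name):

```bash
# Sketch: call the proxy with a plain OpenAI-style request.
# Assumptions: proxy at localhost:8000; the "moa-" prefix selects mixture-of-agents.
# Add an Authorization header if the proxy is configured to require an API key.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "moa-gpt-4o-mini",
        "messages": [{"role": "user", "content": "Solve 24 * 17 step by step."}]
      }'
```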
|
[Open in Spaces](https://huggingface.co/spaces/codelion/optillm)
[Open in Colab](https://colab.research.google.com/drive/1SpuUb8d9xAoTh32M-9wJsB50AOH54EaH?usp=sharing)
```bash
source .venv/bin/activate
pip install -r requirements.txt
```
|
We support all major LLM providers and models for inference. Set the environment variables for your chosen provider and the proxy will pick the corresponding client.

| Provider | Required Environment Variables | Additional Notes |
|----------|-------------------------------|------------------|
| OptiLLM | `OPTILLM_API_KEY` | Uses the built-in local server for inference; supports logprobs and decoding techniques like `cot_decoding` & `entropy_decoding` |
| OpenAI | `OPENAI_API_KEY` | You can use this with any OpenAI-compatible endpoint (e.g. OpenRouter) by setting the `base_url` |
| Cerebras | `CEREBRAS_API_KEY` | You can use this for fast inference with supported models, see [docs for details](https://inference-docs.cerebras.ai/introduction) |
| Azure OpenAI | `AZURE_OPENAI_API_KEY`<br>`AZURE_API_VERSION`<br>`AZURE_API_BASE` | - |
| Azure OpenAI (Managed Identity) | `AZURE_API_VERSION`<br>`AZURE_API_BASE` | Login required using `az login`, see [docs for details](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity) |
| LiteLLM | Depends on the model | See [docs for details](https://docs.litellm.ai/docs/providers) |
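
For example, to use OpenAI (or any OpenAI-compatible endpoint) you only need to export that provider's key before starting the proxy. A minimal sketch with placeholder values:

```bash
# Export the variables for exactly one provider before launching the proxy.
# All values below are placeholders.
export OPENAI_API_KEY="sk-..."

# Or, for Azure OpenAI:
# export AZURE_OPENAI_API_KEY="..."
# export AZURE_API_VERSION="<api-version>"
# export AZURE_API_BASE="https://<your-resource>.openai.azure.com"
```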
|
You can then run the optillm proxy as follows.
54 | 63 |
|
|
## SOTA results on benchmarks with optillm
|
### CePO on math and code benchmarks (Jan 2025)
|
| Method | Math-L5 | MMLU-Pro (Math) | GPQA | CRUX | LiveCodeBench (pass@1) | Simple QA |
| -------------------------: | :-----: | :-------------: | :--: | :--: | :--------------------: | :-------: |
*(Figure: moa-patchwork-results — performance gains across the supported patchflows)*
|
## References

- [CePO: Empowering Llama with Reasoning using Test-Time Compute](https://cerebras.ai/blog/cepo) - [Implementation](optillm/cepo)
- [Chain of Code: Reasoning with a Language Model-Augmented Code Emulator](https://arxiv.org/abs/2312.04474) - [Inspired the implementation of the coc plugin](optillm/plugins/coc_plugin.py)
- [Entropy Based Sampling and Parallel CoT Decoding](https://github.com/xjdr-alt/entropix) - [Implementation](optillm/entropy_decoding.py)
- [Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation](https://arxiv.org/abs/2409.12941) - [Evaluation script](scripts/eval_frames_benchmark.py)
- [Writing in the Margins: Better Inference Pattern for Long Context Retrieval](https://www.arxiv.org/abs/2408.14906) - [Inspired the implementation of the memory plugin](optillm/plugins/memory_plugin.py)
- [Chain-of-Thought Reasoning Without Prompting](https://arxiv.org/abs/2402.10200) - [Implementation](optillm/cot_decoding.py)
- [Re-Reading Improves Reasoning in Large Language Models](https://arxiv.org/abs/2309.06275) - [Implementation](optillm/reread.py)
- [In-Context Principle Learning from Mistakes](https://arxiv.org/abs/2402.05403) - [Implementation](optillm/leap.py)
- [Planning In Natural Language Improves LLM Search For Code Generation](https://arxiv.org/abs/2409.03733) - [Implementation](optillm/plansearch.py)
- [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) - [Implementation](optillm/self_consistency.py)
- [Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers](https://arxiv.org/abs/2408.06195) - [Implementation](optillm/rstar.py)
- [Mixture-of-Agents Enhances Large Language Model Capabilities](https://arxiv.org/abs/2406.04692) - [Inspired the implementation of moa](optillm/moa.py)
- [Prover-Verifier Games improve legibility of LLM outputs](https://arxiv.org/abs/2407.13692) - [Implementation](optillm/pvg.py)
- [Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning](https://arxiv.org/abs/2405.00451) - [Inspired the implementation of mcts](optillm/mcts.py)
- [Unsupervised Evaluation of Code LLMs with Round-Trip Correctness](https://arxiv.org/abs/2402.08699) - [Inspired the implementation of rto](optillm/rto.py)
- [Patched MOA: optimizing inference for diverse software development tasks](https://arxiv.org/abs/2407.18521) - [Implementation](optillm/moa.py)
- [Patched RTC: evaluating LLMs for diverse software development tasks](https://arxiv.org/abs/2407.16557) - [Implementation](optillm/rto.py)
|
## Citation
|
|