# Evolving Better Prompts with OpenEvolve 🧠✨

This example shows how to use **OpenEvolve** to automatically optimize prompts for **Large Language Models (LLMs)**. Whether you're working on classification, summarization, generation, or code tasks, OpenEvolve helps you find high-performing prompts using **evolutionary search**. This example uses synthetic data for a sentiment-analysis task, but you can adapt it to your own datasets and tasks.

---

## 🎯 What Is Prompt Optimization?

Prompt engineering is key to getting reliable outputs from LLMs, but finding the right prompt manually can be slow and inconsistent.

OpenEvolve automates this by:

* Generating and evolving prompt variations
* Testing them against your task and metrics
* Selecting the best prompts across generations

You start with a simple prompt and let OpenEvolve evolve it into something smarter and more effective.
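
Conceptually, the loop looks like this (a minimal sketch; `mutate_prompt` and `score_prompt` are illustrative stubs, not OpenEvolve's actual API):

```python
import random

def mutate_prompt(prompt: str) -> str:
    """Stub: in OpenEvolve, an LLM rewrites the prompt to create a variant."""
    return prompt

def score_prompt(prompt: str) -> float:
    """Stub: in OpenEvolve, your evaluator scores the prompt on real data."""
    return random.random()

def evolve(initial_prompt: str, generations: int = 15,
           population_size: int = 40, elite_ratio: float = 0.25) -> str:
    population = [initial_prompt]
    for _ in range(generations):
        # Generate: fill the population with variations of the survivors
        while len(population) < population_size:
            population.append(mutate_prompt(random.choice(population)))
        # Test and select: keep only the top-scoring prompts
        population.sort(key=score_prompt, reverse=True)
        del population[max(1, int(elite_ratio * population_size)):]
    return population[0]

best_prompt = evolve("Analyze the sentiment of: {input_text}")
```

In the real system, the mutation step is performed by the LLMs configured in `config.yaml`, and scoring is delegated to your `evaluator.py`.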

---

## 🚀 Getting Started

### 1. Install Dependencies

```bash
cd examples/llm_prompt_optimazation
pip install -r requirements.txt
```

### 2. Add Your Models

1. Update your `config.yaml`:

```yaml
llm:
  primary_model: "llm_name"
  api_base: "llm_server_url"
  api_key: "your_api_key_here"
```

2. Update the task model settings in `evaluator.py`:

```python
TASK_MODEL_NAME = "task_llm_name"
TASK_MODEL_URL = "task_llm_server_url"
TASK_MODEL_API_KEY = "your_api_key_here"
SAMPLE_SIZE = 25  # Number of samples to use for evaluation
MAX_RETRIES = 3   # Number of retries for LLM calls
```

### 3. Run OpenEvolve

```bash
sh run.sh
```

---

## 🔧 How to Adapt This Template

### 1. Replace the Dataset

Edit `data.json` to match your use case:

```json
[
  {
    "id": 1,
    "input": "Your input here",
    "expected_output": "Target output"
  }
]
```

### 2. Customize the Evaluator

In `evaluator.py`, define how to evaluate a prompt (a minimal sketch follows this list):

* Load your data
* Call the LLM using the prompt
* Measure output quality (accuracy, score, etc.)

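With an OpenAI-compatible endpoint, this could look roughly like the following (a sketch assuming the `openai` Python package and the constants from the Getting Started section; the shipped `evaluator.py` may structure this differently, and retry logic via `MAX_RETRIES` is omitted for brevity):

```python
import json

from openai import OpenAI  # assumes an OpenAI-compatible server

TASK_MODEL_NAME = "task_llm_name"
client = OpenAI(base_url="task_llm_server_url", api_key="your_api_key_here")

def evaluate(prompt_template: str, sample_size: int = 25) -> float:
    """Return exact-match accuracy of a prompt over the first samples."""
    with open("data.json") as f:
        samples = json.load(f)[:sample_size]
    correct = 0
    for sample in samples:
        # Fill the {input_text} placeholder in the evolved prompt
        prompt = prompt_template.format(input_text=sample["input"])
        response = client.chat.completions.create(
            model=TASK_MODEL_NAME,
            messages=[{"role": "user", "content": prompt}],
        )
        output = response.choices[0].message.content.strip()
        correct += output == sample["expected_output"]
    return correct / len(samples)  # higher is better
```

Exact-match accuracy suits classification; for the sentiment-scoring task in this example you would instead compare numeric scores, e.g. with a tolerance or an error-based metric.
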
### 3. Write Your Initial Prompt

Create a basic starting prompt in `initial_prompt.txt`:

```
# EVOLVE-BLOCK-START
Your task prompt using {input_text} as a placeholder.
# EVOLVE-BLOCK-END
```

This is the part OpenEvolve will improve over time.
It also helps to name your task in the header of `initial_prompt.txt` so the model understands the context.
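
At evaluation time, the `{input_text}` placeholder is filled with each sample's input, for example via Python's `str.format` (a hypothetical illustration; the actual substitution in `evaluator.py` may differ):

```python
# Hypothetical illustration of filling the placeholder
with open("initial_prompt.txt") as f:
    template = f.read()

prompt = template.format(input_text="The movie was fantastic!")
print(prompt)
```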

---

## ⚙️ Key Config Options (`config.yaml`)

```yaml
llm:
  primary_model: "gpt-4o"      # or your preferred model
  secondary_model: "gpt-3.5"   # optional for diversity
  temperature: 0.9
  max_tokens: 2048

database:
  population_size: 40
  max_iterations: 15
  elite_selection_ratio: 0.25

evaluator:
  timeout: 45
  parallel_evaluations: 3
  use_llm_feedback: true
```

---

## 📈 Example Output

OpenEvolve evolves prompts like this:

**Initial Prompt:**

```
Please analyze the sentiment of the following sentence and provide a sentiment score:

"{input_text}"

Rate the sentiment on a scale from 0.0 to 10.0.

Score:
```

**Evolved Prompt:**

```
Please analyze the sentiment of the following sentence and provide a sentiment score using the following guidelines:
- 0.0-2.9: Strongly negative sentiment (e.g., expresses anger, sadness, or despair)
- 3.0-6.9: Neutral or mixed sentiment (e.g., factual statements, ambiguous content)
- 7.0-10.0: Strongly positive sentiment (e.g., expresses joy, satisfaction, or hope)

"{input_text}"

Rate the sentiment on a scale from 0.0 to 10.0:
- 0.0-2.9: Strongly negative (e.g., "This product is terrible")
- 3.0-6.9: Neutral/mixed (e.g., "The sky is blue today")
- 7.0-10.0: Strongly positive (e.g., "This is amazing!")

Provide only the numeric score (e.g., "8.5") without any additional text:

Score:
```

**Result**: Improved accuracy and output consistency.

---

## 🔍 Where to Use This

OpenEvolve can be adapted to many tasks:

* **Text Classification**: Spam detection, intent recognition
* **Content Generation**: Social media posts, product descriptions
* **Question Answering & Summarization**
* **Code Tasks**: Review, generation, completion
* **Structured Output**: JSON, table filling, data extraction

---

## ✅ Best Practices

* Start with a basic but relevant prompt
* Use good-quality data and clear evaluation metrics
* Run multiple evolutions for better results
* Validate on held-out data before deployment (see the sketch below)

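For the last point, a minimal split might look like this (hypothetical; the file names `data_evolve.json` and `data_holdout.json` are illustrative):

```python
import json
import random

# Split data.json into an evolution set and a held-out validation set.
with open("data.json") as f:
    data = json.load(f)

random.seed(0)  # reproducible split
random.shuffle(data)
split = int(0.8 * len(data))

with open("data_evolve.json", "w") as f:
    json.dump(data[:split], f, indent=2)
with open("data_holdout.json", "w") as f:
    json.dump(data[split:], f, indent=2)

# Evolve against data_evolve.json, then score the winning prompt
# on data_holdout.json before deploying it.
```
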
---

**Ready to discover better prompts?**
Use this template to evolve prompts for any LLM task, automatically.