
Commit 6024935: Update README.md

1 parent 1ed8867


examples/web_scraper_optillm/README.md
105 additions, 56 deletions
@@ -64,7 +64,7 @@ python optillm.py --port 8000
 
 optillm will now be running on `http://localhost:8000` with its built-in local inference server.
 
-**Note for Non-Mac Users**: This example uses `Qwen/Qwen3-0.6B-MLX-bf16` which is optimized for Apple Silicon (M1/M2/M3 chips). If you're not using a Mac, you should:
+**Note for Non-Mac Users**: This example uses `Qwen/Qwen3-1.7B-MLX-bf16` which is optimized for Apple Silicon (M1/M2/M3 chips). If you're not using a Mac, you should:
 
 1. **For NVIDIA GPUs**: Use a CUDA-compatible model like:
    - `Qwen/Qwen2.5-32B-Instruct` (best quality, high VRAM)
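Once the server is up, a quick request against optillm's OpenAI-compatible endpoint confirms that your chosen model loads and responds. The following is a minimal sketch using the `openai` Python client; the `/v1` base path and the `optillm` API key follow the setup and troubleshooting notes in this README, and the model name is simply the example default, so substitute whichever model you picked for your hardware:

```python
# Minimal sanity check against the local optillm server.
# Assumes the standard OpenAI-compatible /v1 path on port 8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="optillm")

response = client.chat.completions.create(
    model="readurls-Qwen/Qwen3-1.7B-MLX-bf16",  # swap for your chosen model on non-Mac hardware
    messages=[{"role": "user", "content": "Summarize https://docs.python.org/3/library/json.html"}],
)
print(response.choices[0].message.content)
```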
@@ -81,9 +81,9 @@ optillm will now be running on `http://localhost:8000` with its built-in local inference server.
 ```yaml
 models:
   - name: "readurls-your-chosen-model"
-    weight: 0.6
+    weight: 0.9
   - name: "moa&readurls-your-chosen-model"
-    weight: 0.4
+    weight: 0.1
 ```
 
 ### 2. Install Web Scraping Dependencies
@@ -105,8 +105,8 @@ python openevolve-run.py examples/web_scraper_optillm/initial_program.py \
 ```
 
 The configuration demonstrates both optillm capabilities:
-- **Primary model (90%)**: `readurls-Qwen/Qwen3-0.6B-MLX-bf16` - fetches URLs mentioned in prompts
-- **Secondary model (10%)**: `moa&readurls-Qwen/Qwen3-0.6B-MLX-bf16` - uses Mixture of Agents for improved accuracy
+- **Primary model (90%)**: `readurls-Qwen/Qwen3-1.7B-MLX-bf16` - fetches URLs mentioned in prompts
+- **Secondary model (10%)**: `moa&readurls-Qwen/Qwen3-1.7B-MLX-bf16` - uses Mixture of Agents for improved accuracy
 
 ## How It Works
 
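To make the primary/secondary split concrete, the sketch below assumes the `weight` values from `config.yaml` are treated as sampling probabilities when a model is chosen for each generation request, which is how the 90%/10% description above reads. The helper is illustrative only, not code from openevolve:

```python
# Illustrative only: choose a model per request in proportion to its configured weight.
import random

MODELS = [
    ("readurls-Qwen/Qwen3-1.7B-MLX-bf16", 0.9),      # primary: fetches URLs found in prompts
    ("moa&readurls-Qwen/Qwen3-1.7B-MLX-bf16", 0.1),  # secondary: Mixture of Agents on top of readurls
]

def pick_model() -> str:
    names, weights = zip(*MODELS)
    return random.choices(names, weights=weights, k=1)[0]

# Roughly 90% of calls should land on the readurls-only model.
print(pick_model())
```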
@@ -141,44 +141,54 @@ This is particularly valuable for complex parsing logic where multiple approaches
 - Generating multiple parsing strategies (via MoA)
 - Learning from evaluation feedback
 
-## Example Evolution Trajectory
+## Actual Evolution Results
 
-**Generation 1** (Basic scraper):
-```python
-# Simple text extraction
-soup = BeautifulSoup(html, 'html.parser')
-text = soup.get_text()
-```
+Based on our evolution run, here's what we achieved:
+
+### Performance Metrics
+- **Initial Score**: 0.6864 (72.2% accuracy, 32.5% completeness)
+- **Final Score**: 0.7458 (83.3% accuracy, 37.5% completeness)
+- **Improvement**: +8.6% overall performance (+11.1% accuracy)
+- **Time to Best**: Found optimal solution by iteration 3 (within 10 minutes)
+
+### Key Evolution Improvements
 
-**Generation 10** (With readurls context):
+**Initial Program** (Basic approach):
 ```python
-# Targets specific documentation structures
-functions = soup.find_all('dl', class_='function')
-for func in functions:
-    name = func.find('dt').get('id')
-    desc = func.find('dd').text
+# Simple code block parsing
+code_blocks = soup.find_all('code')
+for block in code_blocks:
+    text = block.get_text(strip=True)
+    if '(' in text and ')' in text:
+        # Extract function info
 ```
 
-**Generation 50** (With MoA refinement):
+**Evolved Program** (Sophisticated multi-strategy parsing):
 ```python
-# Robust parsing with error handling
-def extract_function_docs(soup):
-    # Multiple strategies for different doc formats
-    strategies = [
-        lambda: soup.select('dl.function dt'),
-        lambda: soup.select('.sig-name'),
-        lambda: soup.find_all('code', class_='descname')
-    ]
-
-    for strategy in strategies:
-        try:
-            results = strategy()
-            if results:
-                return parse_results(results)
-        except:
-            continue
+# 1. Code blocks
+code_blocks = soup.find_all('code')
+# 2. Headers (h3)
+h3_blocks = soup.find_all('h3')
+# 3. Documentation signatures
+dt_blocks = soup.find_all('dt', class_='sig')
+# 4. Table-based documentation (NEW!)
+table_blocks = soup.find_all('table')
+for block in table_blocks:
+    rows = block.find_all('tr')
+    for row in rows:
+        cells = row.find_all('td')
+        if len(cells) >= 2:
+            signature = cells[0].get_text(strip=True)
+            description = cells[1].get_text(strip=True)
+            # Extract structured function data
 ```
 
+### What optillm Contributed
+
+1. **Early Discovery**: Found best solution by iteration 3, suggesting enhanced reasoning helped quickly identify effective parsing strategies
+2. **Table Parsing Innovation**: The evolved program added sophisticated table parsing logic that wasn't in the initial version
+3. **Robust Architecture**: Multiple fallback strategies ensure the scraper works across different documentation formats
+
 ## Monitoring Progress
 
 Watch the evolution progress and see how optillm enhances the process:
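The table-based strategy that evolution discovered can also be exercised on its own. The snippet below is a self-contained sketch with an invented HTML fragment; it mirrors the evolved parsing loop shown earlier rather than code taken from the repository:

```python
# Standalone sketch of table-based extraction, similar in spirit to the evolved strategy.
# The HTML fragment is invented for illustration.
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td>json.dumps(obj)</td><td>Serialize obj to a JSON string.</td></tr>
  <tr><td>json.loads(s)</td><td>Parse a JSON string into Python objects.</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
functions = []
for table in soup.find_all("table"):
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) >= 2:
            functions.append({
                "signature": cells[0].get_text(strip=True),
                "description": cells[1].get_text(strip=True),
            })

print(functions)  # e.g. [{'signature': 'json.dumps(obj)', 'description': '...'}, ...]
```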
@@ -194,46 +204,85 @@ Watch the evolution progress and see how optillm enhances the process:
 tail -f examples/web_scraper_optillm/openevolve_output/evolution.log
 ```
 
-## Results
+## Results Analysis
+
+After 100 iterations of evolution, here's what we achieved:
+
+### Quantitative Results
+- **Accuracy**: 72.2% → 83.3% (+11.1% improvement)
+- **Completeness**: 32.5% → 37.5% (+5% improvement)
+- **Robustness**: 100% (maintained; no parsing errors)
+- **Combined Score**: 0.6864 → 0.7458 (+8.6% improvement)
 
-After evolution, you should see:
-1. **Improved Accuracy**: The scraper correctly handles various documentation formats
-2. **Better Error Handling**: Robust parsing that doesn't break on edge cases
-3. **Optimized Performance**: Efficient extraction strategies
+### Qualitative Improvements
+1. **Multi-Strategy Parsing**: Added table-based extraction for broader documentation format support
+2. **Robust Function Detection**: Improved pattern matching for function signatures
+3. **Better Parameter Extraction**: Enhanced parameter parsing from various HTML structures
+4. **Error Resilience**: Maintained 100% robustness with no parsing failures
 
-Compare the checkpoints to see the evolution:
+### Evolution Pattern
+- **Early Success**: Best solution found by iteration 3 (within 10 minutes)
+- **Plateau Effect**: The algorithm maintained the optimal score from iterations 3-90
+- **Island Migration**: MAP-Elites explored alternatives, but the local optimum was strong
+
+Compare the evolution:
 ```bash
-# Initial vs evolved program
-diff examples/web_scraper_optillm/openevolve_output/checkpoints/checkpoint_10/best_program.py \
-     examples/web_scraper_optillm/openevolve_output/checkpoints/checkpoint_100/best_program.py
+# View the final evolved program
+cat examples/web_scraper_optillm/openevolve_output/best/best_program.py
+
+# Compare initial vs final
+diff examples/web_scraper_optillm/initial_program.py \
+     examples/web_scraper_optillm/openevolve_output/best/best_program.py
 ```
 
-## Key Insights
+## Key Insights from This Run
+
+1. **optillm Enhanced Early Discovery**: The best solution was found by iteration 3, suggesting optillm's test-time compute (MoA) and documentation access (readurls) helped quickly identify effective parsing strategies.
+
+2. **Smaller Models Can Excel**: The 1.7B Qwen model with optillm achieved significant improvements (+8.6%), showing that test-time compute can make smaller models highly effective.
 
-1. **Documentation Access Matters**: The readurls plugin significantly improves the LLM's ability to generate correct parsing code by providing actual HTML structure
+3. **Local Optimization Works**: Fast inference times (under 100 ms after the initial request) show that local models with optillm provide both efficiency and quality.
 
-2. **Test-Time Compute Works**: MoA's multiple generation and critique approach produces more robust solutions than single-shot generation
+4. **Pattern: Quick Discovery, Then Plateau**: Evolution found a strong local optimum quickly, suggesting the current test cases were well served by the table-parsing innovation.
 
-3. **Powerful Local Models**: Large models like Qwen-32B with 4-bit quantization provide excellent results while being memory efficient when enhanced with optillm techniques
+5. **optillm Plugin Value**: The evolved program's sophisticated multi-strategy approach (especially table parsing) likely benefited from optillm's enhanced reasoning capabilities.
 
-## Customization
+## Available optillm Plugins and Techniques
 
-You can experiment with different optillm features by modifying `config.yaml`:
+optillm offers many plugins and optimization techniques. Here are the most useful for code evolution:
 
-1. **Different Plugins**: Try the `executecode` plugin for runtime validation
-2. **Other Techniques**: Experiment with `cot_reflection`, `rstar`, or `bon`
-3. **Model Combinations**: Adjust weights or try different technique combinations
+### Core Plugins
+- **`readurls`**: Automatically fetches web content when URLs are detected in prompts
+- **`executecode`**: Runs code and includes output in the response (great for validation)
+
+### Optimization Techniques
+- **`moa`** (Mixture of Agents): Generates multiple responses, critiques them, and synthesizes the best
+- **`cot_reflection`**: Uses chain-of-thought reasoning with self-reflection
+- **`rstar`**: Advanced reasoning technique for complex problems
+- **`bon`** (Best of N): Generates N responses and selects the best one
+- **`z3_solver`**: Uses the Z3 theorem prover for logical reasoning
+- **`rto`** (Round Trip Optimization): Optimizes responses through iterative refinement
+
+### Combining Techniques
+You can chain multiple techniques using `&`:
 
-Example custom configuration:
 ```yaml
 llm:
   models:
-    - name: "cot_reflection&readurls-Qwen/Qwen3-0.6B-MLX-bf16"
+    # Use chain-of-thought + readurls for primary model
+    - name: "cot_reflection&readurls-Qwen/Qwen3-1.7B-MLX-bf16"
      weight: 0.7
-    - name: "moa&executecode-Qwen/Qwen3-0.6B-MLX-bf16"
+    # Use MoA + code execution for secondary validation
+    - name: "moa&executecode-Qwen/Qwen3-1.7B-MLX-bf16"
      weight: 0.3
 ```
 
+### Recommended Combinations for Code Evolution
+1. **For Documentation-Heavy Tasks**: `cot_reflection&readurls`
+2. **For Complex Logic**: `moa&executecode`
+3. **For Mathematical Problems**: `cot_reflection&z3_solver`
+4. **For Validation-Critical Code**: `bon&executecode`
+
 ## Troubleshooting
 
 1. **optillm not responding**: Ensure it's running on port 8000 with `OPTILLM_API_KEY=optillm`
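As a closing illustration of the technique-chaining syntax above, a chained name such as `cot_reflection&readurls-...` is passed verbatim as the `model` field of an ordinary chat-completion request. The raw-HTTP sketch below assumes the same local endpoint and `optillm` API key used throughout this example:

```python
# A chained optillm technique is just a model name in an OpenAI-compatible request.
# Endpoint path and API key are assumed to match the setup earlier in this README.
import requests

payload = {
    "model": "cot_reflection&readurls-Qwen/Qwen3-1.7B-MLX-bf16",
    "messages": [{"role": "user", "content": "Explain what soup.find_all('dt', class_='sig') matches."}],
}
headers = {"Authorization": "Bearer optillm"}

r = requests.post("http://localhost:8000/v1/chat/completions", json=payload, headers=headers, timeout=120)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```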
