
Commit 6024935: Update README.md

1 parent 1ed8867


examples/web_scraper_optillm/README.md
105 additions, 56 deletions
@@ -64,7 +64,7 @@ python optillm.py --port 8000
 
 optillm will now be running on `http://localhost:8000` with its built-in local inference server.
 
-**Note for Non-Mac Users**: This example uses `Qwen/Qwen3-0.6B-MLX-bf16` which is optimized for Apple Silicon (M1/M2/M3 chips). If you're not using a Mac, you should:
+**Note for Non-Mac Users**: This example uses `Qwen/Qwen3-1.7B-MLX-bf16` which is optimized for Apple Silicon (M1/M2/M3 chips). If you're not using a Mac, you should:
 
 1. **For NVIDIA GPUs**: Use a CUDA-compatible model like:
    - `Qwen/Qwen2.5-32B-Instruct` (best quality, high VRAM)
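Once the server is up, a quick request against optillm's OpenAI-compatible endpoint confirms that your chosen model loads and responds. The following is a minimal sketch using the `openai` Python client; the `/v1` base path and the `optillm` API key follow the setup and troubleshooting notes in this README, and the model name is simply the example default, so substitute whichever model you picked for your hardware:

```python
# Minimal sanity check against the local optillm server.
# Assumes the standard OpenAI-compatible /v1 path on port 8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="optillm")

response = client.chat.completions.create(
    model="readurls-Qwen/Qwen3-1.7B-MLX-bf16",  # swap for your chosen model on non-Mac hardware
    messages=[{"role": "user", "content": "Summarize https://docs.python.org/3/library/json.html"}],
)
print(response.choices[0].message.content)
```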
@@ -81,9 +81,9 @@ optillm will now be running on `http://localhost:8000` with its built-in local inference server.
 ```yaml
 models:
   - name: "readurls-your-chosen-model"
-    weight: 0.6
+    weight: 0.9
   - name: "moa&readurls-your-chosen-model"
-    weight: 0.4
+    weight: 0.1
 ```
 
 ### 2. Install Web Scraping Dependencies
@@ -105,8 +105,8 @@ python openevolve-run.py examples/web_scraper_optillm/initial_program.py \
 ```
 
 The configuration demonstrates both optillm capabilities:
-- **Primary model (90%)**: `readurls-Qwen/Qwen3-0.6B-MLX-bf16` - fetches URLs mentioned in prompts
-- **Secondary model (10%)**: `moa&readurls-Qwen/Qwen3-0.6B-MLX-bf16` - uses Mixture of Agents for improved accuracy
+- **Primary model (90%)**: `readurls-Qwen/Qwen3-1.7B-MLX-bf16` - fetches URLs mentioned in prompts
+- **Secondary model (10%)**: `moa&readurls-Qwen/Qwen3-1.7B-MLX-bf16` - uses Mixture of Agents for improved accuracy
 
 ## How It Works
 
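To make the primary/secondary split concrete, the sketch below assumes the `weight` values from `config.yaml` are treated as sampling probabilities when a model is chosen for each generation request, which is how the 90%/10% description above reads. The helper is illustrative only, not code from openevolve:

```python
# Illustrative only: choose a model per request in proportion to its configured weight.
import random

MODELS = [
    ("readurls-Qwen/Qwen3-1.7B-MLX-bf16", 0.9),      # primary: fetches URLs found in prompts
    ("moa&readurls-Qwen/Qwen3-1.7B-MLX-bf16", 0.1),  # secondary: Mixture of Agents on top of readurls
]

def pick_model() -> str:
    names, weights = zip(*MODELS)
    return random.choices(names, weights=weights, k=1)[0]

# Roughly 90% of calls should land on the readurls-only model.
print(pick_model())
```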
@@ -141,44 +141,54 @@ This is particularly valuable for complex parsing logic where multiple approaches
 - Generating multiple parsing strategies (via MoA)
 - Learning from evaluation feedback
 
-## Example Evolution Trajectory
+## Actual Evolution Results
 
-**Generation 1** (Basic scraper):
-```python
-# Simple text extraction
-soup = BeautifulSoup(html, 'html.parser')
-text = soup.get_text()
-```
+Based on our evolution run, here's what we achieved:
+
+### Performance Metrics
+- **Initial Score**: 0.6864 (72.2% accuracy, 32.5% completeness)
+- **Final Score**: 0.7458 (83.3% accuracy, 37.5% completeness)
+- **Improvement**: +8.6% overall performance (+11.1% accuracy)
+- **Time to Best**: Found optimal solution by iteration 3 (within 10 minutes)
+
+### Key Evolution Improvements
 
-**Generation 10** (With readurls context):
+**Initial Program** (Basic approach):
 ```python
-# Targets specific documentation structures
-functions = soup.find_all('dl', class_='function')
-for func in functions:
-    name = func.find('dt').get('id')
-    desc = func.find('dd').text
+# Simple code block parsing
+code_blocks = soup.find_all('code')
+for block in code_blocks:
+    text = block.get_text(strip=True)
+    if '(' in text and ')' in text:
+        # Extract function info
 ```
 
-**Generation 50** (With MoA refinement):
+**Evolved Program** (Sophisticated multi-strategy parsing):
 ```python
-# Robust parsing with error handling
-def extract_function_docs(soup):
-    # Multiple strategies for different doc formats
-    strategies = [
-        lambda: soup.select('dl.function dt'),
-        lambda: soup.select('.sig-name'),
-        lambda: soup.find_all('code', class_='descname')
-    ]
-
-    for strategy in strategies:
-        try:
-            results = strategy()
-            if results:
-                return parse_results(results)
-        except:
-            continue
+# 1. Code blocks
+code_blocks = soup.find_all('code')
+# 2. Headers (h3)
+h3_blocks = soup.find_all('h3')
+# 3. Documentation signatures
+dt_blocks = soup.find_all('dt', class_='sig')
+# 4. Table-based documentation (NEW!)
+table_blocks = soup.find_all('table')
+for block in table_blocks:
+    rows = block.find_all('tr')
+    for row in rows:
+        cells = row.find_all('td')
+        if len(cells) >= 2:
+            signature = cells[0].get_text(strip=True)
+            description = cells[1].get_text(strip=True)
+            # Extract structured function data
 ```
 
+### What optillm Contributed
+
+1. **Early Discovery**: Found best solution by iteration 3, suggesting enhanced reasoning helped quickly identify effective parsing strategies
+2. **Table Parsing Innovation**: The evolved program added sophisticated table parsing logic that wasn't in the initial version
+3. **Robust Architecture**: Multiple fallback strategies ensure the scraper works across different documentation formats
+
 ## Monitoring Progress
 
 Watch the evolution progress and see how optillm enhances the process:
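The table-based strategy that evolution discovered can also be exercised on its own. The snippet below is a self-contained sketch with an invented HTML fragment; it mirrors the evolved parsing loop shown earlier rather than code taken from the repository:

```python
# Standalone sketch of table-based extraction, similar in spirit to the evolved strategy.
# The HTML fragment is invented for illustration.
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td>json.dumps(obj)</td><td>Serialize obj to a JSON string.</td></tr>
  <tr><td>json.loads(s)</td><td>Parse a JSON string into Python objects.</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
functions = []
for table in soup.find_all("table"):
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) >= 2:
            functions.append({
                "signature": cells[0].get_text(strip=True),
                "description": cells[1].get_text(strip=True),
            })

print(functions)  # e.g. [{'signature': 'json.dumps(obj)', 'description': '...'}, ...]
```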
@@ -194,46 +204,85 @@ Watch the evolution progress and see how optillm enhances the process:
 tail -f examples/web_scraper_optillm/openevolve_output/evolution.log
 ```
 
-## Results
+## Results Analysis
+
+After 100 iterations of evolution, here's what we achieved:
+
+### Quantitative Results
+- **Accuracy**: 72.2% → 83.3% (+11.1% improvement)
+- **Completeness**: 32.5% → 37.5% (+5% improvement)
+- **Robustness**: 100% (maintained; no parsing errors)
+- **Combined Score**: 0.6864 → 0.7458 (+8.6% improvement)
 
-After evolution, you should see:
-1. **Improved Accuracy**: The scraper correctly handles various documentation formats
-2. **Better Error Handling**: Robust parsing that doesn't break on edge cases
-3. **Optimized Performance**: Efficient extraction strategies
+### Qualitative Improvements
+1. **Multi-Strategy Parsing**: Added table-based extraction for broader documentation format support
+2. **Robust Function Detection**: Improved pattern matching for function signatures
+3. **Better Parameter Extraction**: Enhanced parameter parsing from various HTML structures
+4. **Error Resilience**: Maintained 100% robustness with no parsing failures
 
-Compare the checkpoints to see the evolution:
+### Evolution Pattern
+- **Early Success**: Best solution found by iteration 3 (within 10 minutes)
+- **Plateau Effect**: The algorithm maintained the optimal score from iterations 3-90
+- **Island Migration**: MAP-Elites explored alternatives, but the local optimum was strong
+
+Compare the evolution:
 ```bash
-# Initial vs evolved program
-diff examples/web_scraper_optillm/openevolve_output/checkpoints/checkpoint_10/best_program.py \
-     examples/web_scraper_optillm/openevolve_output/checkpoints/checkpoint_100/best_program.py
+# View the final evolved program
+cat examples/web_scraper_optillm/openevolve_output/best/best_program.py
+
+# Compare initial vs final
+diff examples/web_scraper_optillm/initial_program.py \
+     examples/web_scraper_optillm/openevolve_output/best/best_program.py
 ```
 
-## Key Insights
+## Key Insights from This Run
+
+1. **optillm Enhanced Early Discovery**: The best solution was found by iteration 3, suggesting optillm's test-time compute (MoA) and documentation access (readurls) helped quickly identify effective parsing strategies.
+
+2. **Smaller Models Can Excel**: The 1.7B Qwen model with optillm achieved significant improvements (+8.6%), showing that test-time compute can make smaller models highly effective.
 
-1. **Documentation Access Matters**: The readurls plugin significantly improves the LLM's ability to generate correct parsing code by providing actual HTML structure
+3. **Local Optimization Works**: Fast inference times (under 100 ms after the initial request) show that local models with optillm provide both efficiency and quality.
 
-2. **Test-Time Compute Works**: MoA's multiple generation and critique approach produces more robust solutions than single-shot generation
+4. **Pattern: Quick Discovery, Then Plateau**: Evolution found a strong local optimum quickly, suggesting the current test cases were well served by the table-parsing innovation.
 
-3. **Powerful Local Models**: Large models like Qwen-32B with 4-bit quantization provide excellent results while being memory efficient when enhanced with optillm techniques
+5. **optillm Plugin Value**: The evolved program's sophisticated multi-strategy approach (especially table parsing) likely benefited from optillm's enhanced reasoning capabilities.
 
-## Customization
+## Available optillm Plugins and Techniques
 
-You can experiment with different optillm features by modifying `config.yaml`:
+optillm offers many plugins and optimization techniques. Here are the most useful for code evolution:
 
-1. **Different Plugins**: Try the `executecode` plugin for runtime validation
-2. **Other Techniques**: Experiment with `cot_reflection`, `rstar`, or `bon`
-3. **Model Combinations**: Adjust weights or try different technique combinations
+### Core Plugins
+- **`readurls`**: Automatically fetches web content when URLs are detected in prompts
+- **`executecode`**: Runs code and includes output in the response (great for validation)
+
+### Optimization Techniques
+- **`moa`** (Mixture of Agents): Generates multiple responses, critiques them, and synthesizes the best
+- **`cot_reflection`**: Uses chain-of-thought reasoning with self-reflection
+- **`rstar`**: Advanced reasoning technique for complex problems
+- **`bon`** (Best of N): Generates N responses and selects the best one
+- **`z3_solver`**: Uses the Z3 theorem prover for logical reasoning
+- **`rto`** (Round Trip Optimization): Optimizes responses through iterative refinement
+
+### Combining Techniques
+You can chain multiple techniques using `&`:
 
-Example custom configuration:
 ```yaml
 llm:
   models:
-    - name: "cot_reflection&readurls-Qwen/Qwen3-0.6B-MLX-bf16"
+    # Use chain-of-thought + readurls for primary model
+    - name: "cot_reflection&readurls-Qwen/Qwen3-1.7B-MLX-bf16"
      weight: 0.7
-    - name: "moa&executecode-Qwen/Qwen3-0.6B-MLX-bf16"
+    # Use MoA + code execution for secondary validation
+    - name: "moa&executecode-Qwen/Qwen3-1.7B-MLX-bf16"
      weight: 0.3
 ```
 
+### Recommended Combinations for Code Evolution
+1. **For Documentation-Heavy Tasks**: `cot_reflection&readurls`
+2. **For Complex Logic**: `moa&executecode`
+3. **For Mathematical Problems**: `cot_reflection&z3_solver`
+4. **For Validation-Critical Code**: `bon&executecode`
+
 ## Troubleshooting
 
 1. **optillm not responding**: Ensure it's running on port 8000 with `OPTILLM_API_KEY=optillm`
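As a closing illustration of the technique-chaining syntax above, a chained name such as `cot_reflection&readurls-...` is passed verbatim as the `model` field of an ordinary chat-completion request. The raw-HTTP sketch below assumes the same local endpoint and `optillm` API key used throughout this example:

```python
# A chained optillm technique is just a model name in an OpenAI-compatible request.
# Endpoint path and API key are assumed to match the setup earlier in this README.
import requests

payload = {
    "model": "cot_reflection&readurls-Qwen/Qwen3-1.7B-MLX-bf16",
    "messages": [{"role": "user", "content": "Explain what soup.find_all('dt', class_='sig') matches."}],
}
headers = {"Authorization": "Bearer optillm"}

r = requests.post("http://localhost:8000/v1/chat/completions", json=payload, headers=headers, timeout=120)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```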
