Commit 747c9ea

1 parent 1247bf2 commit 747c9ea

5 files changed: +1137 -507 lines changed

examples/mlx_metal_kernel_opt/README.md

Lines changed: 53 additions & 0 deletions
@@ -278,6 +278,59 @@ cd examples/mlx_metal_kernel_opt
 python run_benchmarks.py --mode compare # Compare standard vs optimized
 ```
 
+## 🧪 **NEW: Simple Testing Tools**
+
+### **Quick Performance Testing**
+
+We've added simple tools to easily test your optimized attention kernel:
+
+#### **1. Verify Setup**
+```bash
+python verify_setup.py # Check dependencies and files
+```
+
+#### **2. Quick Demo**
+```bash
+python quick_demo.py # Run demo with multiple test prompts
+```
+
+#### **3. Custom Testing**
+```bash
+# Test with default best_program.py
+python test_optimized_attention.py
+
+# Test with custom program
+python test_optimized_attention.py path/to/your/best_program.py
+
+# Test with custom prompt
+python test_optimized_attention.py --prompt "Write a Python function:" --max-tokens 200
+```
+
+#### **4. Cleanup**
+```bash
+python cleanup.py # Move temporary files to temp/ directory
+```
+
+### **What These Tools Do:**
+
+- **🔧 test_optimized_attention.py**: Monkey patches mlx-lm with your optimized attention and runs side-by-side performance comparison
+- **🚀 quick_demo.py**: Automated demo with multiple test prompts showing performance improvements
+- **🔍 verify_setup.py**: Checks dependencies, files, and setup before running tests
+- **🧹 cleanup.py**: Organizes temporary files created during testing
+
+### **Expected Output:**
+
+```
+🚀 PERFORMANCE COMPARISON:
+Speed Improvement: +9.8%
+Memory Change: -0.04 GB
+Time Improvement: +9.6%
+
+🎯 SIGNIFICANT IMPROVEMENT achieved!
+```
+
+See `TESTING_GUIDE.md` for detailed usage instructions.
+
 ## 📈 **Expected Evolution Trajectory**
 
 ### **Generation 1-10: Broadcasting Optimizations**
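
The README addition describes `test_optimized_attention.py` as monkey patching mlx-lm with the optimized attention and comparing the two side by side. The script itself is not part of this hunk, so the following is only a minimal sketch of that general pattern. It uses a stand-in module object instead of mlx-lm, and every name in it (`fake_models`, `baseline_attention`, `optimized_attention`, `patched`, `percent_improvement`) is hypothetical. The percentage formula is likewise an assumption about how a figure such as "Speed Improvement" could be derived, not the script's actual calculation.

```python
import contextlib
import types

# Stand-in for a library module. This is NOT the real mlx-lm API.
fake_models = types.SimpleNamespace()


def baseline_attention(values):
    """Hypothetical reference implementation."""
    return [v * 2 for v in values]


def optimized_attention(values):
    """Hypothetical replacement that must return the same results."""
    return [v + v for v in values]


fake_models.attention = baseline_attention


@contextlib.contextmanager
def patched(module, name, replacement):
    """Temporarily replace module.<name>, restoring the original on exit."""
    original = getattr(module, name)
    setattr(module, name, replacement)
    try:
        yield original
    finally:
        setattr(module, name, original)


def percent_improvement(baseline, optimized):
    """Relative change in percent for a higher-is-better metric (assumed formula)."""
    return (optimized - baseline) / baseline * 100.0


data = [1, 2, 3]
baseline_out = fake_models.attention(data)

with patched(fake_models, "attention", optimized_attention):
    active_inside = fake_models.attention is optimized_attention
    optimized_out = fake_models.attention(data)

# After the block exits, the original function is back in place.
restored = fake_models.attention is baseline_attention
```

Restoring the original in a `finally` clause keeps the baseline run and any later runs unaffected even if the patched code raises, which matters when both variants are timed in the same process.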

examples/mlx_metal_kernel_opt/config.yaml

Lines changed: 11 additions & 16 deletions
@@ -1,23 +1,18 @@
-# Qwen3-0.6B Custom GQA Attention Optimization Configuration
-# Target: Evolve custom GQA implementation using MLX primitives
-# Baseline: 70.3 tokens/sec average decode speed
-# Goal: 80+ tokens/sec through custom kernel evolution
-
-max_iterations: 30
-checkpoint_interval: 5
+max_iterations: 50
+checkpoint_interval: 10
 log_level: "INFO"
 
 # LLM configuration - proven models for kernel optimization
 llm:
   primary_model: "gemini-2.5-flash-preview-05-20"
-  primary_model_weight: 0.7
+  primary_model_weight: 0.6
   secondary_model: "gemini-2.5-pro-preview-06-05"
-  secondary_model_weight: 0.3
+  secondary_model_weight: 0.4
   api_base: "https://generativelanguage.googleapis.com/v1beta/openai/"
-  temperature: 0.7
-  top_p: 0.9
+  temperature: 0.8
+  top_p: 0.95
   max_tokens: 32000
-  timeout: 300
+  timeout: 600
 
 # Focused prompt for custom GQA kernel evolution
 prompt:
@@ -144,16 +139,16 @@ prompt:
 # Database configuration
 database:
   db_path: "./openevolve_output/qwen3_custom_gqa"
-  population_size: 25
-  archive_size: 12
-  num_islands: 2
+  population_size: 50
+  archive_size: 20
+  num_islands: 4
   elite_selection_ratio: 0.25
   exploitation_ratio: 0.7
   exploration_ratio: 0.3
 
 # Evaluator configuration
 evaluator:
-  timeout: 300 # 5 minutes per evaluation
+  timeout: 600 # 10 minutes per evaluation
   parallel_evaluations: 1
 
 # Evolution settings
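
Several of the values changed in this config are related to one another: the two model weights, the exploitation and exploration ratios, and the timeouts. The sketch below copies the new values from the `+` side of the diff into a plain dictionary and runs a few consistency checks. The checks are assumptions made for illustration: the diff does not say whether the framework requires weights or ratios to sum to 1 or normalizes them itself, and the timeout is taken to be in seconds because the previous value was annotated `300 # 5 minutes`. `check_config` is a hypothetical helper, not part of the project.

```python
# New values copied from the "+" side of the config.yaml diff.
config = {
    "llm": {
        "primary_model_weight": 0.6,
        "secondary_model_weight": 0.4,
        "timeout": 600,
    },
    "database": {
        "population_size": 50,
        "archive_size": 20,
        "num_islands": 4,
        "exploitation_ratio": 0.7,
        "exploration_ratio": 0.3,
    },
    "evaluator": {"timeout": 600},
}


def check_config(cfg, tol=1e-9):
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []

    llm = cfg["llm"]
    weight_sum = llm["primary_model_weight"] + llm["secondary_model_weight"]
    if abs(weight_sum - 1.0) > tol:
        problems.append(f"model weights sum to {weight_sum}, expected 1.0")

    db = cfg["database"]
    ratio_sum = db["exploitation_ratio"] + db["exploration_ratio"]
    if abs(ratio_sum - 1.0) > tol:
        problems.append(f"exploit/explore ratios sum to {ratio_sum}, expected 1.0")

    # Assumed constraint: the archive should not exceed the population.
    if db["archive_size"] > db["population_size"]:
        problems.append("archive_size is larger than population_size")

    return problems


# Assuming seconds, 600 corresponds to 10 minutes per evaluation.
evaluator_timeout_minutes = config["evaluator"]["timeout"] / 60

print(check_config(config))
print(evaluator_timeout_minutes)
```

A check like this is cheap to run before a long evolution job, where a mistyped weight or ratio might otherwise only surface after many iterations.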
