
Commit 77c2f26

Refine AlgoTune configs and optimization hints

Updated LLM model weights to include 'google/gemini-2.5-pro' with adjusted weights across all tasks. Reduced verbosity and streamlined optimization hints in config prompts for clarity. Adjusted parallel_evaluations to 4 for most tasks (except polynomial_real, set to 1 to avoid JAX conflicts) and increased evaluator timeout for polynomial_real. Updated initial_program.py files to clarify and reorganize optimization opportunities.
1 parent 2af2b34 commit 77c2f26

File tree

11 files changed: +55 -340 lines changed


examples/algotune/affine_transform_2d/config.yaml

Lines changed: 5 additions & 61 deletions
@@ -14,7 +14,9 @@ llm:
   api_base: "https://openrouter.ai/api/v1"
   models:
     - name: "google/gemini-2.5-flash"
-      weight: 1.0
+      weight: 0.8
+    - name: "google/gemini-2.5-pro"
+      weight: 0.2
 
   temperature: 0.4 # Optimal (better than 0.2, 0.6, 0.8)
   max_tokens: 128000 # Increased from 16000 for much richer context
@@ -68,16 +70,10 @@ prompt:
 
     Focus on improving the solve method to correctly handle the input format and produce valid solutions efficiently. Your solution will be compared against the reference AlgoTune baseline implementation to measure speedup and correctness.
 
-
-
-
-
     PERFORMANCE OPTIMIZATION OPPORTUNITIES:
     You have access to high-performance libraries that can provide significant speedups:
 
    • **JAX** - JIT compilation for numerical computations
-      Key insight: Functions should be defined outside classes for JIT compatibility
-      For jnp.roots(), consider using strip_zeros=False in JIT contexts
 
    • **Numba** - Alternative JIT compilation, often simpler to use
 
@@ -86,59 +82,7 @@ prompt:
 
    • **Vectorization** - Look for opportunities to replace loops with array operations
 
-    EXPLORATION STRATEGY:
-    1. Profile to identify bottlenecks first
-    2. Consider multiple optimization approaches for the same problem
-    3. Try both library-specific optimizations and algorithmic improvements
-    4. Test different numerical libraries to find the best fit
-
-
-    PROBLEM-SPECIFIC OPTIMIZATION HINTS:
-    2D affine transformations - PROVEN OPTIMIZATIONS (2.3x speedup achieved):
-
-    **INTERPOLATION ORDER REDUCTION** (Most Effective - 30-40% speedup):
-    • Use order=1 (linear) instead of order=3 (cubic) for scipy.ndimage.affine_transform
-    • Linear interpolation is often sufficient for most transformations
-    • Code: scipy.ndimage.affine_transform(image, matrix, order=1, mode="constant")
-    • The accuracy loss is minimal for most image transformations
-
-    **PRECISION OPTIMIZATION** (20-30% speedup):
-    • Convert images to float32 instead of float64
-    • Code: image_float32 = image.astype(np.float32)
-    • This leverages faster SIMD operations and reduces memory bandwidth
-    • Combine with order=1 for maximum benefit
-
-    **APPLE SILICON M4 OPTIMIZATIONS** (5-10% additional speedup):
-    • Use C-contiguous arrays for image processing
-    • Code: image = np.ascontiguousarray(image.astype(np.float32))
-    • Detect with: platform.processor() == 'arm' and platform.system() == 'Darwin'
-    • Apple's Accelerate framework optimizes spline interpolation for these layouts
-
-    **COMPLETE OPTIMIZED EXAMPLE**:
-    ```python
-    import platform
-    IS_APPLE_SILICON = (platform.processor() == 'arm' and platform.system() == 'Darwin')
-
-    # Convert to float32 for speed
-    image_float32 = image.astype(np.float32)
-    matrix_float32 = matrix.astype(np.float32)
-
-    if IS_APPLE_SILICON:
-        image_float32 = np.ascontiguousarray(image_float32)
-        matrix_float32 = np.ascontiguousarray(matrix_float32)
-
-    # Use order=1 (linear) instead of order=3 (cubic)
-    transformed = scipy.ndimage.affine_transform(
-        image_float32, matrix_float32, order=1, mode="constant"
-    )
-    ```
-
-    **AVOID**:
-    • Complex JIT compilation (JAX/Numba) - overhead exceeds benefits for this task
-    • OpenCV - adds dependency without consistent performance gain
-    • Order=3 (cubic) interpolation unless accuracy is critical
-
-  num_top_programs: 10 # Increased from 3-5 for richer learning context
+  num_top_programs: 5 # Increased from 3-5 for richer learning context
   num_diverse_programs: 5 # Increased from 2 for more diverse exploration
   include_artifacts: true # +20.7% improvement
 
@@ -170,7 +114,7 @@ evaluator:
   cascade_thresholds: [0.5, 0.8]
 
   # Parallel evaluations
-  parallel_evaluations: 1
+  parallel_evaluations: 4
 
 # AlgoTune task-specific configuration
 algotune:

examples/algotune/affine_transform_2d/initial_program.py

Lines changed: 5 additions & 4 deletions
@@ -39,14 +39,15 @@
 
 OPTIMIZATION OPPORTUNITIES:
 Consider these algorithmic improvements for significant performance gains:
+- Lower-order interpolation: Try order=0 (nearest) or order=1 (linear) vs default order=3 (cubic)
+  Linear interpolation (order=1) often provides best speed/quality balance with major speedups
+- Precision optimization: float32 often sufficient vs float64, especially with lower interpolation orders
 - Separable transforms: Check if the transformation can be decomposed into separate x and y operations
 - Cache-friendly memory access patterns: Process data in blocks to improve cache utilization
-- Pre-computed interpolation coefficients: For repeated similar transformations
-- Direct coordinate mapping: Avoid intermediate coordinate calculations for simple transforms
 - JIT compilation: Use JAX or Numba for numerical operations that are Python-bottlenecked
-- Batch processing: Process multiple images or regions simultaneously for amortized overhead
-- Alternative interpolation methods: Lower-order interpolation for speed vs quality tradeoffs
+- Direct coordinate mapping: Avoid intermediate coordinate calculations for simple transforms
 - Hardware optimizations: Leverage SIMD instructions through vectorized operations
+- Batch processing: Process multiple images or regions simultaneously for amortized overhead
 
 This is the initial implementation that will be evolved by OpenEvolve.
 The solve method will be improved through evolution.
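
Taken together, the interpolation-order and precision items above amount to a few lines in the solve path. A minimal sketch, assuming the task passes a problem dict with "image" and "matrix" entries and expects a dict result (both assumptions for illustration, not confirmed by this diff):

```python
import numpy as np
import scipy.ndimage


def solve(problem):
    # float32 halves memory traffic vs float64; C-contiguous layout per the hints
    image = np.ascontiguousarray(problem["image"], dtype=np.float32)
    matrix = np.asarray(problem["matrix"], dtype=np.float32)
    # order=1 (linear) skips the cubic spline prefilter that dominates order=3
    transformed = scipy.ndimage.affine_transform(
        image, matrix, order=1, mode="constant"
    )
    return {"transformed_image": transformed.tolist()}
```

order=1 trades a small amount of smoothness for a large cut in spline work, which is why the earlier config hints reported it as the single most effective change.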

examples/algotune/convolve2d_full_fill/config.yaml

Lines changed: 5 additions & 23 deletions
@@ -14,7 +14,9 @@ llm:
   api_base: "https://openrouter.ai/api/v1"
   models:
     - name: "google/gemini-2.5-flash"
-      weight: 1.0
+      weight: 0.8
+    - name: "google/gemini-2.5-pro"
+      weight: 0.2
 
   temperature: 0.4 # Optimal (better than 0.2, 0.6, 0.8)
   max_tokens: 128000 # Increased from 16000 for much richer context
@@ -70,17 +72,11 @@ prompt:
     The output is a 2D array representing the convolution result.
 
     Focus on improving the solve method to correctly handle the input format and produce valid solutions efficiently. Your solution will be compared against the reference AlgoTune baseline implementation to measure speedup and correctness.
-
-
-
-
 
     PERFORMANCE OPTIMIZATION OPPORTUNITIES:
     You have access to high-performance libraries that can provide significant speedups:
 
    • **JAX** - JIT compilation for numerical computations
-      Key insight: Functions should be defined outside classes for JIT compatibility
-      For jnp.roots(), consider using strip_zeros=False in JIT contexts
 
    • **Numba** - Alternative JIT compilation, often simpler to use
 
@@ -89,21 +85,7 @@ prompt:
 
    • **Vectorization** - Look for opportunities to replace loops with array operations
 
-    EXPLORATION STRATEGY:
-    1. Profile to identify bottlenecks first
-    2. Consider multiple optimization approaches for the same problem
-    3. Try both library-specific optimizations and algorithmic improvements
-    4. Test different numerical libraries to find the best fit
-
-
-    PROBLEM-SPECIFIC OPTIMIZATION HINTS:
-    This task involves 2D convolution in 'full' mode - consider:
-    • FFT-based convolution algorithms (O(n log n) vs O(n²))
-    • scipy.signal functions may have optimized implementations
-    • JAX also has FFT operations if JIT compilation benefits outweigh library optimizations
-    • Memory layout and padding strategies can impact performance
-
-  num_top_programs: 10 # Increased from 3-5 for richer learning context
+  num_top_programs: 5 # Increased from 3-5 for richer learning context
   num_diverse_programs: 5 # Increased from 2 for more diverse exploration
   include_artifacts: true # +20.7% improvement
 
@@ -135,7 +117,7 @@ evaluator:
   cascade_thresholds: [0.5, 0.8]
 
   # Parallel evaluations
-  parallel_evaluations: 1
+  parallel_evaluations: 4
 
 # AlgoTune task-specific configuration
 algotune:

examples/algotune/convolve2d_full_fill/initial_program.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@
 
 OPTIMIZATION OPPORTUNITIES:
 Consider these algorithmic improvements for massive performance gains:
-- FFT-based convolution: Use scipy.signal.fftconvolve for O(N²log N) complexity vs O(N⁴) direct convolution
+- Alternative convolution algorithms: Consider different approaches with varying computational complexity
 - Overlap-add/overlap-save methods: For extremely large inputs that don't fit in memory
 - Separable kernels: If the kernel can be decomposed into 1D convolutions (rank-1 factorization)
 - Winograd convolution: For small kernels (3x3, 5x5) with fewer multiplications

examples/algotune/eigenvectors_complex/config.yaml

Lines changed: 5 additions & 62 deletions
@@ -14,7 +14,9 @@ llm:
   api_base: "https://openrouter.ai/api/v1"
   models:
     - name: "google/gemini-2.5-flash"
-      weight: 1.0
+      weight: 0.8
+    - name: "google/gemini-2.5-pro"
+      weight: 0.2
 
   temperature: 0.4 # Optimal (better than 0.2, 0.6, 0.8)
   max_tokens: 128000 # Increased from 16000 for much richer context
@@ -76,17 +78,11 @@ prompt:
     - eigenvectors is an array of n eigenvectors, each of length n, representing the eigenvector corresponding to the eigenvalue at the same index.
 
     Focus on improving the solve method to correctly handle the input format and produce valid solutions efficiently. Your solution will be compared against the reference AlgoTune baseline implementation to measure speedup and correctness.
-
-
-
-
 
     PERFORMANCE OPTIMIZATION OPPORTUNITIES:
     You have access to high-performance libraries that can provide significant speedups:
 
    • **JAX** - JIT compilation for numerical computations
-      Key insight: Functions should be defined outside classes for JIT compatibility
-      For jnp.roots(), consider using strip_zeros=False in JIT contexts
 
    • **Numba** - Alternative JIT compilation, often simpler to use
 
@@ -95,60 +91,7 @@ prompt:
 
    • **Vectorization** - Look for opportunities to replace loops with array operations
 
-    EXPLORATION STRATEGY:
-    1. Profile to identify bottlenecks first
-    2. Consider multiple optimization approaches for the same problem
-    3. Try both library-specific optimizations and algorithmic improvements
-    4. Test different numerical libraries to find the best fit
-
-
-    PROBLEM-SPECIFIC OPTIMIZATION HINTS:
-    Computing eigenvectors of complex matrices - PROVEN OPTIMIZATIONS (1.4x speedup achieved):
-
-    **KEY INSIGHT**: The input matrix is REAL (not complex), but the original algorithm treats it as complex.
-    Post-processing (sorting/normalization) can be heavily optimized.
-
-    **VECTORIZED POST-PROCESSING** (Most Effective - 35% speedup):
-    • Use numpy.argsort instead of Python's sort for eigenvalue ordering
-    • Vectorize normalization using broadcasting instead of loops
-    • Use advanced indexing to avoid memory copies
-
-    **OPTIMIZED IMPLEMENTATION**:
-    ```python
-    # Use numpy.linalg.eig (faster than scipy for small/medium matrices)
-    eigenvalues, eigenvectors = np.linalg.eig(A)
-
-    # VECTORIZED SORTING: Use numpy.lexsort (much faster than Python sort)
-    sort_indices = np.lexsort((-eigenvalues.imag, -eigenvalues.real))
-    sorted_eigenvectors = eigenvectors[:, sort_indices]  # No copying
-
-    # VECTORIZED NORMALIZATION: All columns at once
-    norms = np.linalg.norm(sorted_eigenvectors, axis=0)
-    valid_mask = norms > 1e-12
-    sorted_eigenvectors[:, valid_mask] /= norms[valid_mask]
-
-    # EFFICIENT CONVERSION: Use .T.tolist() instead of Python loops
-    return sorted_eigenvectors.T.tolist()
-    ```
-
-    **MEMORY LAYOUT OPTIMIZATION** (5-10% additional on M4):
-    • Use C-contiguous arrays for numpy.linalg.eig
-    • Code: A = np.ascontiguousarray(A.astype(np.float64))
-    • Detect Apple Silicon: platform.processor() == 'arm' and platform.system() == 'Darwin'
-
-    **KEY OPTIMIZATIONS**:
-    • Replace Python loops with numpy vectorized operations
-    • Eliminate list() and zip() operations in sorting
-    • Use advanced indexing instead of creating copies
-    • Stay in numpy throughout, convert to list only at the end
-
-    **AVOID**:
-    • Python sorting with lambda functions - extremely slow
-    • eigenvectors.T - creates unnecessary matrix copy
-    • Loop-based normalization - vectorize instead
-    • scipy.linalg.eig for small matrices - has more overhead than numpy
-
-  num_top_programs: 10 # Increased from 3-5 for richer learning context
+  num_top_programs: 5 # Increased from 3-5 for richer learning context
   num_diverse_programs: 5 # Increased from 2 for more diverse exploration
   include_artifacts: true # +20.7% improvement
 
@@ -180,7 +123,7 @@ evaluator:
   cascade_thresholds: [0.5, 0.8]
 
   # Parallel evaluations
-  parallel_evaluations: 1
+  parallel_evaluations: 4
 
 # AlgoTune task-specific configuration
 algotune:
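
The deleted hint block above contained a complete vectorized post-processing recipe built on real numpy APIs (np.linalg.eig, np.lexsort). A condensed, self-contained version for reference; the function name and the list-of-lists return format are assumptions carried over from the hint text:

```python
import numpy as np


def solve_eigenvectors(A):
    # np.linalg.eig returns complex arrays even for a real input matrix
    eigenvalues, eigenvectors = np.linalg.eig(np.asarray(A))
    # lexsort treats the last key as primary: descending real, then imaginary
    order = np.lexsort((-eigenvalues.imag, -eigenvalues.real))
    vecs = eigenvectors[:, order]
    # Normalize every column in one broadcast step instead of a Python loop
    norms = np.linalg.norm(vecs, axis=0)
    vecs = vecs / np.where(norms > 1e-12, norms, 1.0)
    # Convert to Python lists only at the very end
    return vecs.T.tolist()
```

The whole pipeline stays in numpy until the final conversion, which is the point the removed hints kept emphasizing.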

examples/algotune/fft_cmplx_scipy_fftpack/config.yaml

Lines changed: 6 additions & 64 deletions
@@ -14,7 +14,9 @@ llm:
   api_base: "https://openrouter.ai/api/v1"
   models:
     - name: "google/gemini-2.5-flash"
-      weight: 1.0
+      weight: 0.8
+    - name: "google/gemini-2.5-pro"
+      weight: 0.2
 
   temperature: 0.4 # Optimal (better than 0.2, 0.6, 0.8)
   max_tokens: 128000 # Increased from 16000 for much richer context
@@ -66,22 +68,16 @@ prompt:
 
     This task requires computing the N-dimensional Fast Fourier Transform (FFT) of a complex-valued matrix.
     The FFT is a mathematical technique that converts data from the spatial (or time) domain into the frequency domain, revealing both the magnitude and phase of the frequency components.
-    The input is a square matrix of size n×n, where each element is a complex number containing both real and imaginary parts.
+    The input is a square matrix of size nxn, where each element is a complex number containing both real and imaginary parts.
     The output is a square matrix of the same size, where each entry is a complex number representing a specific frequency component of the input data, including its amplitude and phase.
     This transformation is crucial in analyzing signals and data with inherent complex properties.
 
     Focus on improving the solve method to correctly handle the input format and produce valid solutions efficiently. Your solution will be compared against the reference AlgoTune baseline implementation to measure speedup and correctness.
-
-
-
-
 
     PERFORMANCE OPTIMIZATION OPPORTUNITIES:
     You have access to high-performance libraries that can provide significant speedups:
 
    • **JAX** - JIT compilation for numerical computations
-      Key insight: Functions should be defined outside classes for JIT compatibility
-      For jnp.roots(), consider using strip_zeros=False in JIT contexts
 
    • **Numba** - Alternative JIT compilation, often simpler to use
 
@@ -90,61 +86,7 @@ prompt:
 
    • **Vectorization** - Look for opportunities to replace loops with array operations
 
-    EXPLORATION STRATEGY:
-    1. Profile to identify bottlenecks first
-    2. Consider multiple optimization approaches for the same problem
-    3. Try both library-specific optimizations and algorithmic improvements
-    4. Test different numerical libraries to find the best fit
-
-
-    PROBLEM-SPECIFIC OPTIMIZATION HINTS:
-    Complex 2D FFT operations - PROVEN OPTIMIZATIONS (1.2x speedup achieved):
-
-    **COMPLEX PRECISION REDUCTION** (Most Effective - 10-20% speedup):
-    • Use complex64 instead of complex128 for FFT computation
-    • Code: problem_64 = problem_array.astype(np.complex64)
-    • Then: result = scipy.fftpack.fftn(problem_64)
-    • Convert back to complex128 after computation for compatibility
-    • This reduces memory bandwidth and leverages faster SIMD operations
-
-    **MEMORY LAYOUT OPTIMIZATION FOR M4** (5-10% additional speedup):
-    • Use Fortran-ordered arrays for optimal FFTPACK performance
-    • Code: problem_opt = np.asfortranarray(problem.astype(np.complex64))
-    • Detect Apple Silicon: platform.processor() == 'arm' and platform.system() == 'Darwin'
-    • FFTPACK internally uses Fortran routines that benefit from this layout
-
-    **COMPLETE OPTIMIZED EXAMPLE**:
-    ```python
-    import platform
-    import scipy.fftpack as fftpack
-
-    IS_APPLE_SILICON = (platform.processor() == 'arm' and platform.system() == 'Darwin')
-
-    # Convert to complex64 for speed
-    problem_64 = np.array(problem, dtype=np.complex64)
-
-    if IS_APPLE_SILICON:
-        # Fortran layout for optimal FFTPACK performance
-        problem_64 = np.asfortranarray(problem_64)
-
-    # Perform FFT with reduced precision
-    result_64 = fftpack.fftn(problem_64)
-
-    # Convert back to complex128 for precision/compatibility
-    result = result_64.astype(np.complex128)
-    ```
-
-    **IMPORTANT NOTES**:
-    • scipy.fftpack.fftn is already highly optimized - focus on precision/layout
-    • numpy.fft.fftn is typically slower than scipy.fftpack for this task
-    • The tolerance in is_solution allows for complex64 precision (1e-5)
-
-    **AVOID**:
-    • JAX/Numba JIT - process overhead exceeds FFT benefits
-    • numpy.fft instead of scipy.fftpack - consistently slower
-    • Complex128 throughout - unnecessary precision for most FFT applications
-
-  num_top_programs: 10 # Increased from 3-5 for richer learning context
+  num_top_programs: 5 # Increased from 3-5 for richer learning context
   num_diverse_programs: 5 # Increased from 2 for more diverse exploration
   include_artifacts: true # +20.7% improvement
 
@@ -176,7 +118,7 @@ evaluator:
   cascade_thresholds: [0.5, 0.8]
 
   # Parallel evaluations
-  parallel_evaluations: 1
+  parallel_evaluations: 4
 
 # AlgoTune task-specific configuration
 algotune:
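
The deleted FFT hints centered on one concrete trick: run scipy.fftpack.fftn in complex64 and widen afterwards. A compact sketch of just that path, with the bare-array problem input assumed for illustration:

```python
import numpy as np
import scipy.fftpack as fftpack


def solve_fft(problem):
    # complex64 halves memory bandwidth; Fortran order suits FFTPACK's routines
    problem_64 = np.asfortranarray(np.asarray(problem, dtype=np.complex64))
    result_64 = fftpack.fftn(problem_64)
    # Widen back so downstream comparisons see the full-precision dtype
    return result_64.astype(np.complex128)
```

Per the removed notes, the task's tolerance (1e-5) is loose enough that the complex64 round trip stays within the accepted error.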
