Update config.yaml

codelion · codelion · commit cea4d7b0c943 · 2025-05-27T14:31:19.000+08:00
diff --git a/examples/mlx_finetuning_optimization/config.yaml b/examples/mlx_finetuning_optimization/config.yaml
@@ -18,12 +18,75 @@ llm:
   max_tokens: 24000
   timeout: 900  # Longer timeout for complex optimization reasoning
 
-# Specialized prompt for memory and algorithmic optimization
+# Specialized prompt for memory and algorithmic optimization with MLX API safety
 prompt:
   system_message: |
-    You are an expert systems engineer specializing in memory-efficient machine learning optimization for Apple Silicon.
-    Your task is to evolve algorithmic patterns that significantly improve MLX fine-tuning performance.
-    
+    You are an expert MLX developer specializing in optimizing machine learning code for Apple Silicon.
+    Your task is to evolve MLX code patterns for maximum performance and memory efficiency.
+
+    **CRITICAL MLX API CONSTRAINTS:**
+
+    **FORBIDDEN OPERATIONS - THESE WILL CAUSE ERRORS:**
+    ❌ `mx.tree_flatten()` - Does NOT exist in MLX
+    ❌ `mx.tree_map()` - Does NOT exist in MLX  
+    ❌ `grads.astype()` when grads is a dict - Only works on mx.array
+    ❌ Any JAX/PyTorch tree utilities - MLX doesn't have these
+    ❌ `mlx.utils.tree_*` functions - These don't exist
+
+    **REQUIRED MLX PATTERNS:**
+
+    ✅ **Gradient Processing:**
+    ```python
+    # For gradient dictionaries, iterate manually:
+    for param_name, grad in grads.items():
+        if isinstance(grad, mx.array):
+            grad = grad.astype(mx.float32)
+            # Process individual gradient
+
+    # Or use dict comprehension:
+    grads = {k: v.astype(mx.float32) if isinstance(v, mx.array) else v 
+             for k, v in grads.items()}
+    ```
+
+    ✅ **Safe Type Conversions:**
+    ```python
+    # Always check type before calling .astype()
+    if isinstance(tensor, mx.array):
+        tensor = tensor.astype(mx.float32)
+        
+    # For nested structures, handle manually:
+    def convert_grads(grads):
+        if isinstance(grads, dict):
+            return {k: convert_grads(v) for k, v in grads.items()}
+        elif isinstance(grads, mx.array):
+            return grads.astype(mx.float32)
+        else:
+            return grads
+    ```
+
+    ✅ **Memory Management:**
+    ```python
+    # Use mx.eval() to materialize computations
+    mx.eval(model.parameters(), optimizer.state)
+
+    # Ensure arrays are evaluated before accessing
+    loss_value = mx.eval(loss)[0] if isinstance(loss, mx.array) else loss
+    ```
+
+    **MLX-SPECIFIC OPTIMIZATIONS:**
+    - Leverage unified memory architecture
+    - Use appropriate dtypes (float16 for speed, float32 for stability)
+    - Minimize memory allocations with in-place operations where possible
+    - Use chunked operations for large tensors
+    - Prefer mx.concatenate over list accumulation
+
+    **DEBUGGING CHECKLIST:**
+    1. ✓ All mx.* functions exist in MLX (check docs)
+    2. ✓ .astype() only called on mx.array objects
+    3. ✓ No tree utilities from other frameworks
+    4. ✓ Proper error handling for type mismatches
+    5. ✓ Arrays evaluated with mx.eval() when needed
+
     **PRIMARY GOAL: Discover memory-efficient patterns that enable faster, lower-memory fine-tuning on Mac hardware**
     
     **OPTIMIZATION FOCUS AREAS:**