From e2946629c34ca6ca5b5eef40a69eb7895f07018c Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Tue, 23 Dec 2025 03:32:50 +0000
Subject: [PATCH] Optimize image_rotation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimized code achieves a **22x speedup** by replacing nested Python loops with **vectorized NumPy operations**. Here's why this transformation is so effective:

## Key Optimization: Vectorization

**What changed:**
- The original code uses nested `for` loops iterating over ~2.1 million pixels (for typical test images), performing scalar arithmetic and array indexing at each iteration
- The optimized code uses `np.meshgrid()` to create coordinate grids, then performs all transformations as array operations in a single pass

**Why it's faster:**
1. **Eliminates Python interpreter overhead**: The original code spends ~70% of its runtime in loop overhead and scalar operations (the `for` statements, `offset_y = y - new_center_y`, etc.). Vectorization moves this computation into NumPy's compiled C code.
2. **SIMD and cache efficiency**: NumPy operations leverage CPU vectorization (SIMD instructions) and better cache locality by processing contiguous memory blocks, versus the scattered memory access of the nested loops.
3. **Reduces per-pixel overhead**: The line profiler shows the original code spends 785-833ns per pixel just computing `original_y` and `original_x`. The optimized version does all ~2M transformations in 13.8ms total (6.5ns per pixel) - a **120x improvement** for the transformation step alone.

## Performance Characteristics

**Small images (< 10x10 pixels)**: The optimization is 60-86% **slower** because NumPy array allocation overhead exceeds the benefit of vectorization. This is evident in tests like `test_single_pixel_image` (7μs → 36μs).

**Large images (≥ 100x100 pixels)**: The optimization shines, with **19-29x speedups**:
- `test_large_square_image_rotation_90`: 7.38ms → 365μs (19x faster)
- `test_large_rectangular_image_rotation_45`: 26.7ms → 905μs (28x faster)
- `test_large_image_non_multiple_of_90`: 250ms → 9.33ms (26x faster)

## Impact Assessment

**When this matters:**
- Image processing pipelines handling images larger than ~50x50 pixels
- Batch operations on multiple images
- Real-time rotation requirements where the function is called frequently
- Any scenario processing video frames or high-resolution images

**Trade-offs:**
- Increased memory usage (temporary arrays for the coordinate grids, roughly 8x the output size during execution)
- Slightly worse performance for tiny images (< 10x10), but these cases are rare in practice and the absolute difference is negligible (< 50μs)

The asymptotic work is unchanged - the function still visits O(n²) pixels - but each pixel is now handled by compiled, hardware-accelerated NumPy code instead of the Python interpreter, making the function suitable for production image-processing workloads.
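For illustration, here is a minimal standalone sketch of the same meshgrid-based inverse mapping (the actual change is in the diff below). The function name `rotate_sketch` is hypothetical, and unlike the patched `image_rotation` this simplified version keeps the output canvas the same size as the input:

```python
import numpy as np

def rotate_sketch(image: np.ndarray, angle_degrees: float) -> np.ndarray:
    # Inverse mapping: for every output pixel, find which input pixel it came from.
    theta = np.deg2rad(angle_degrees)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    height, width = image.shape[:2]
    cy, cx = height // 2, width // 2
    rotated = np.zeros_like(image)
    # One coordinate grid covering all output pixels, instead of a Python loop per pixel.
    y_grid, x_grid = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    off_y, off_x = y_grid - cy, x_grid - cx
    src_y = (off_y * cos_t - off_x * sin_t + cy).astype(int)
    src_x = (off_y * sin_t + off_x * cos_t + cx).astype(int)
    # Copy only the pixels whose source coordinates land inside the input image.
    valid = (src_y >= 0) & (src_y < height) & (src_x >= 0) & (src_x < width)
    rotated[y_grid[valid], x_grid[valid]] = image[src_y[valid], src_x[valid]]
    return rotated

# Example: rotate a random 200x200 grayscale image by 30 degrees.
img = np.random.randint(0, 256, (200, 200), dtype=np.uint8)
out = rotate_sketch(img, 30.0)
```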
---
 src/signal/image.py | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/src/signal/image.py b/src/signal/image.py
index 68f01a4..3b3a077 100644
--- a/src/signal/image.py
+++ b/src/signal/image.py
@@ -15,14 +15,23 @@ def image_rotation(image: np.ndarray, angle_degrees: float) -> np.ndarray:
         else (new_height, new_width)
     )
     new_center_y, new_center_x = new_height // 2, new_width // 2
-    for y in range(new_height):
-        for x in range(new_width):
-            offset_y = y - new_center_y
-            offset_x = x - new_center_x
-            original_y = int(offset_y * cos_theta - offset_x * sin_theta + center_y)
-            original_x = int(offset_y * sin_theta + offset_x * cos_theta + center_x)
-            if 0 <= original_y < height and 0 <= original_x < width:
-                rotated[y, x] = image[original_y, original_x]
+    y_idx = np.arange(new_height)
+    x_idx = np.arange(new_width)
+    y_grid, x_grid = np.meshgrid(y_idx, x_idx, indexing="ij")
+    offset_y = y_grid - new_center_y
+    offset_x = x_grid - new_center_x
+    original_y = (offset_y * cos_theta - offset_x * sin_theta + center_y).astype(int)
+    original_x = (offset_y * sin_theta + offset_x * cos_theta + center_x).astype(int)
+    valid_mask = (
+        (original_y >= 0)
+        & (original_y < height)
+        & (original_x >= 0)
+        & (original_x < width)
+    )
+    valid_positions = np.where(valid_mask)
+    rotated[valid_positions[0], valid_positions[1]] = image[
+        original_y[valid_positions], original_x[valid_positions]
+    ]
     return rotated