From e2946629c34ca6ca5b5eef40a69eb7895f07018c Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Tue, 23 Dec 2025 03:32:50 +0000
Subject: [PATCH] Optimize image_rotation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimized code achieves a **22x speedup** by replacing nested Python loops with **vectorized NumPy operations**. Here's why this transformation is so effective:

## Key Optimization: Vectorization

**What changed:**
- The original code uses nested `for` loops iterating over ~2.1 million pixels (for typical test images), performing scalar arithmetic and array indexing at each iteration
- The optimized code uses `np.meshgrid()` to create coordinate grids, then performs all transformations as array operations in a single pass

**Why it's faster:**
1. **Eliminates Python interpreter overhead**: The original code spends ~70% of its runtime in loop overhead and scalar operations (the `for` statements, `offset_y = y - new_center_y`, etc.). Vectorization moves this computation into NumPy's compiled C code.
2. **SIMD and cache efficiency**: NumPy operations leverage CPU vectorization (SIMD instructions) and better cache locality by processing contiguous memory blocks, versus the scattered memory access of the nested loops.
3. **Reduces per-pixel overhead**: The line profiler shows the original code spends 785-833ns per pixel just computing `original_y` and `original_x`. The optimized version does all ~2M transformations in 13.8ms total (6.5ns per pixel) - a **120x improvement** for the transformation step alone.

## Performance Characteristics

**Small images (< 10x10 pixels)**: The optimization is 60-86% **slower** because NumPy array allocation overhead exceeds the benefit of vectorization. This is evident in tests like `test_single_pixel_image` (7μs → 36μs).

**Large images (≥ 100x100 pixels)**: The optimization shines, with **19-29x speedups**:
- `test_large_square_image_rotation_90`: 7.38ms → 365μs (19x faster)
- `test_large_rectangular_image_rotation_45`: 26.7ms → 905μs (28x faster)
- `test_large_image_non_multiple_of_90`: 250ms → 9.33ms (26x faster)

## Impact Assessment

**When this matters:**
- Image processing pipelines handling images larger than ~50x50 pixels
- Batch operations on multiple images
- Real-time rotation requirements where the function is called frequently
- Any scenario processing video frames or high-resolution images

**Trade-offs:**
- Increased memory usage (temporary arrays for the coordinate grids, roughly 8x the output size during execution)
- Slightly worse performance for tiny images (< 10x10), but these cases are rare in practice and the absolute difference is negligible (< 50μs)

The asymptotic work is unchanged - the function still visits O(n²) pixels - but each pixel is now handled by compiled, hardware-accelerated NumPy code instead of the Python interpreter, making the function suitable for production image-processing workloads.
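For illustration, here is a minimal standalone sketch of the same meshgrid-based inverse mapping (the actual change is in the diff below). The function name `rotate_sketch` is hypothetical, and unlike the patched `image_rotation` this simplified version keeps the output canvas the same size as the input:

```python
import numpy as np

def rotate_sketch(image: np.ndarray, angle_degrees: float) -> np.ndarray:
    # Inverse mapping: for every output pixel, find which input pixel it came from.
    theta = np.deg2rad(angle_degrees)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    height, width = image.shape[:2]
    cy, cx = height // 2, width // 2
    rotated = np.zeros_like(image)
    # One coordinate grid covering all output pixels, instead of a Python loop per pixel.
    y_grid, x_grid = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    off_y, off_x = y_grid - cy, x_grid - cx
    src_y = (off_y * cos_t - off_x * sin_t + cy).astype(int)
    src_x = (off_y * sin_t + off_x * cos_t + cx).astype(int)
    # Copy only the pixels whose source coordinates land inside the input image.
    valid = (src_y >= 0) & (src_y < height) & (src_x >= 0) & (src_x < width)
    rotated[y_grid[valid], x_grid[valid]] = image[src_y[valid], src_x[valid]]
    return rotated

# Example: rotate a random 200x200 grayscale image by 30 degrees.
img = np.random.randint(0, 256, (200, 200), dtype=np.uint8)
out = rotate_sketch(img, 30.0)
```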
---
 src/signal/image.py | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/src/signal/image.py b/src/signal/image.py
index 68f01a4..3b3a077 100644
--- a/src/signal/image.py
+++ b/src/signal/image.py
@@ -15,14 +15,23 @@ def image_rotation(image: np.ndarray, angle_degrees: float) -> np.ndarray:
         else (new_height, new_width)
     )
     new_center_y, new_center_x = new_height // 2, new_width // 2
-    for y in range(new_height):
-        for x in range(new_width):
-            offset_y = y - new_center_y
-            offset_x = x - new_center_x
-            original_y = int(offset_y * cos_theta - offset_x * sin_theta + center_y)
-            original_x = int(offset_y * sin_theta + offset_x * cos_theta + center_x)
-            if 0 <= original_y < height and 0 <= original_x < width:
-                rotated[y, x] = image[original_y, original_x]
+    y_idx = np.arange(new_height)
+    x_idx = np.arange(new_width)
+    y_grid, x_grid = np.meshgrid(y_idx, x_idx, indexing="ij")
+    offset_y = y_grid - new_center_y
+    offset_x = x_grid - new_center_x
+    original_y = (offset_y * cos_theta - offset_x * sin_theta + center_y).astype(int)
+    original_x = (offset_y * sin_theta + offset_x * cos_theta + center_x).astype(int)
+    valid_mask = (
+        (original_y >= 0)
+        & (original_y < height)
+        & (original_x >= 0)
+        & (original_x < width)
+    )
+    valid_positions = np.where(valid_mask)
+    rotated[valid_positions[0], valid_positions[1]] = image[
+        original_y[valid_positions], original_x[valid_positions]
+    ]
     return rotated