⚡️ Speed up function manual_convolution_1d by 4,370%
#204
📄 4,370% (43.70x) speedup for manual_convolution_1d in src/signal/filters.py
⏱️ Runtime: 16.3 milliseconds → 364 microseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves a 44x speedup by replacing nested Python loops with NumPy's vectorized operations, specifically using stride tricks and matrix multiplication.
Key Optimizations
1. Eliminated nested loops (59.6% of original runtime)
The original implementation used two nested for loops that performed ~54,000 individual element multiplications and additions for typical test cases. Each iteration involved Python interpreter overhead for indexing and arithmetic operations; the pattern is reconstructed in the sketch below.
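For reference, the loop-based pattern described above looks roughly like the following. This is only an illustrative reconstruction, since the original source of manual_convolution_1d is not shown in this description, and the helper name manual_convolution_1d_loops is made up for the sketch.

```python
import numpy as np

def manual_convolution_1d_loops(signal, kernel):
    # Illustrative reconstruction of the loop-based version; not the exact original code.
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    result_len = len(signal) - len(kernel) + 1
    result = []
    # Outer loop: one output element per valid window position.
    # (range() over a non-positive result_len simply yields nothing.)
    for i in range(result_len):
        acc = 0.0
        # Inner loop: element-wise multiply-accumulate across the kernel.
        for j in range(len(kernel)):
            acc += signal[i + j] * kernel[j]
        result.append(acc)
    return np.array(result)
```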
2. Vectorized computation via stride tricks
The optimization uses np.lib.stride_tricks.as_strided() to create a 2D "sliding window" view of the signal without copying data. This transforms the convolution problem into a single matrix-vector multiplication:
- a (result_len, kernel_len)-shaped view where each row contains the signal values for one convolution step
- a single np.dot(strided, kernel) operation that replaces all loop iterations
3. Memory efficiency
The stride trick creates a memory view rather than copying data, avoiding additional allocations while maintaining the same memory footprint as the original. A sketch of the vectorized version follows.
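Below is a minimal sketch of the vectorized approach described above, assuming a 1-D contiguous float signal; the name manual_convolution_1d_strided is hypothetical and the exact code in src/signal/filters.py may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def manual_convolution_1d_strided(signal, kernel):
    # Hypothetical sketch of the optimized version; the real implementation may differ.
    signal = np.ascontiguousarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    result_len = signal.shape[0] - kernel.shape[0] + 1
    # Guard discussed below: a kernel longer than the signal yields an empty result.
    if result_len <= 0:
        return np.array([])
    step = signal.strides[0]
    # Zero-copy (result_len, kernel_len) view: row i is signal[i : i + len(kernel)].
    windows = as_strided(signal, shape=(result_len, kernel.shape[0]), strides=(step, step))
    # A single matrix-vector product replaces both Python loops.
    return np.dot(windows, kernel)
```

On NumPy 1.20 and newer, np.lib.stride_tricks.sliding_window_view builds the same window view with shape checking, which is a safer alternative to raw as_strided.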
Performance Characteristics
Based on the annotated tests:
The optimization includes a guard condition, if result_len <= 0, to preserve behavior for edge cases where the kernel is longer than the signal.
When This Matters
This optimization is most impactful when:
The trade-off is acceptable: small inputs see modest slowdown (still sub-10μs), while realistic workloads see dramatic speedups from milliseconds to microseconds.
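As an informal sanity check, separate from the generated regression tests below, the two sketches above can be compared directly; absolute timings depend on the machine and input sizes.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(10_000)
kernel = rng.standard_normal(51)

# Both sketches should agree to floating-point tolerance.
assert np.allclose(manual_convolution_1d_loops(signal, kernel),
                   manual_convolution_1d_strided(signal, kernel))

# Rough timing comparison; absolute numbers will vary by machine.
t_loops = timeit.timeit(lambda: manual_convolution_1d_loops(signal, kernel), number=10)
t_strided = timeit.timeit(lambda: manual_convolution_1d_strided(signal, kernel), number=10)
print(f"loops: {t_loops:.4f} s   strided: {t_strided:.4f} s")
```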
✅ Correctness verification report:
🌀 Generated Regression Tests
To edit these changes, run
git checkout codeflash/optimize-manual_convolution_1d-mjhz4eze
and push.