This document summarizes the performance optimizations made to the OpenPIV Python library to improve execution speed and reduce memory usage.
Before:
index_list = [(i, v[0], v[1]) for i, v in enumerate(peaks)]
return np.array(index_list), np.array(peaks_max)After:
n = peaks.shape[0]
index_list = np.column_stack((np.arange(n), peaks))
return index_list, peaks_maxImpact: Eliminates Python list comprehension and array conversion overhead. Fully vectorized using NumPy operations.
Before:
window = window.astype(np.float32) # Always convertsAfter:
if window.dtype != np.float32:
window = window.astype(np.float32)
else:
window = window.copy() # Still need a copy to avoid modifying inputImpact: Avoids unnecessary dtype conversion when input is already float32, reducing memory allocation and copy operations.
Before:
iini = x - width
ifin = x + width + 1
jini = y - width
jfin = y + width + 1
iini[iini < 0] = 0 # border checking
ifin[ifin > corr.shape[1]] = corr.shape[1]
jini[jini < 0] = 0
jfin[jfin > corr.shape[2]] = corr.shape[2]After:
iini = np.maximum(x - width, 0)
ifin = np.minimum(x + width + 1, corr.shape[1])
jini = np.maximum(y - width, 0)
jfin = np.minimum(y + width + 1, corr.shape[2])Impact: Uses vectorized NumPy maximum/minimum operations instead of array indexing, reducing operations and improving clarity.
Before:
tmpu = np.ma.copy(u).filled(np.nan)
tmpv = np.ma.copy(v).filled(np.nan)After:
if np.ma.is_masked(u):
tmpu = np.where(u.mask, np.nan, u.data)
tmpv = np.where(v.mask, np.nan, v.data)
else:
tmpu = u
tmpv = vImpact: Eliminates unnecessary array copies and uses direct np.where operation. For non-masked arrays, avoids any copying.
Before:
if np.ma.is_masked(u):
masked_u = np.where(~u.mask, u.data, np.nan)
masked_v = np.where(~v.mask, v.data, np.nan)After:
if np.ma.is_masked(u):
masked_u = np.where(u.mask, np.nan, u.data)
masked_v = np.where(v.mask, np.nan, v.data)Impact: Simplified logic by inverting condition, slightly more readable and efficient (avoids NOT operation).
Same optimization as local_median_val() - Consistent pattern across validation functions.
Before:
if not isinstance(u, np.ma.MaskedArray):
u = np.ma.masked_array(u, mask=np.ma.nomask)
# store grid_mask for reinforcement
grid_mask = u.mask.copy()After:
# Only create masked array if needed
if isinstance(u, np.ma.MaskedArray):
grid_mask = u.mask.copy()
else:
u = np.ma.masked_array(u, mask=np.ma.nomask)
grid_mask = np.ma.nomaskImpact: Avoids creating masked arrays when input is already a regular array, reducing memory allocation and copy operations.
The following performance tests have been added to verify the improvements:
- find_all_first_peaks_performance: < 10ms for 100 windows
- normalize_intensity_performance: < 50ms for 50 64x64 windows
- global_std_performance: < 10ms for 100x100 arrays
- replace_outliers_performance: < 100ms for 50x50 arrays with 3 iterations
- vectorized_sig2noise_ratio_performance: < 50ms for 200 windows
All performance tests consistently pass, ensuring the optimizations maintain correctness while improving speed.
- Avoid Unnecessary Copies: Check if data is already in the required format before copying
- Use Vectorized Operations: Replace Python loops and list comprehensions with NumPy operations
- Minimize Type Conversions: Only convert dtypes when necessary
- Direct Array Access: Use np.where and direct indexing instead of masked array copy operations
- Conditional Array Creation: Only create complex data structures when needed
All existing tests continue to pass:
- 198 tests passed
- 12 tests skipped
- Total test suite runtime: ~8.5 seconds
New performance tests added:
- 5 performance validation tests
- Runtime: ~0.4 seconds
These optimizations particularly benefit:
- Large PIV analysis jobs with many interrogation windows
- Iterative refinement algorithms that call these functions repeatedly
- Processing of high-resolution image pairs
- Batch processing workflows
The improvements are most significant when:
- Processing hundreds or thousands of interrogation windows
- Using masked arrays for complex geometries
- Running validation and filtering on large velocity fields
- Using extended search area PIV with normalized correlation
All changes maintain full backward compatibility:
- Function signatures unchanged
- Return types unchanged
- Numerical results unchanged (verified by test suite)
- Only internal implementation optimized
Additional areas that could be optimized in future work:
- correlation_to_displacement() (pyprocess.py, lines 1110-1122): Nested loops for processing correlations could be vectorized
- sig2noise_ratio() (pyprocess.py, lines 517-589): Already has vectorized version but could be made default
- lib.replace_nans(): Complex nested loop algorithm, difficult to vectorize but potential for Numba/Cython optimization
- Consider using Numba JIT compilation for hot paths
- Investigate GPU acceleration for FFT operations
- NumPy best practices: https://numpy.org/doc/stable/user/basics.performance.html
- Masked array documentation: https://numpy.org/doc/stable/reference/maskedarray.html