Skip to content

Latest commit

 

History

History
225 lines (166 loc) · 6.11 KB

File metadata and controls

225 lines (166 loc) · 6.11 KB

Performance Improvements Documentation

Overview

This document summarizes the performance optimizations made to the OpenPIV Python library to improve execution speed and reduce memory usage.

Summary of Changes

1. pyprocess.py Optimizations

find_all_first_peaks() - Line 335-340

Before:

index_list = [(i, v[0], v[1]) for i, v in enumerate(peaks)]
return np.array(index_list), np.array(peaks_max)

After:

n = peaks.shape[0]
index_list = np.column_stack((np.arange(n), peaks))
return index_list, peaks_max

Impact: Eliminates Python list comprehension and array conversion overhead. Fully vectorized using NumPy operations.


normalize_intensity() - Lines 752-776

Before:

window = window.astype(np.float32)  # Always converts

After:

if window.dtype != np.float32:
    window = window.astype(np.float32)
else:
    window = window.copy()  # Still need a copy to avoid modifying input

Impact: Avoids unnecessary dtype conversion when input is already float32, reducing memory allocation and copy operations.


find_all_second_peaks() - Lines 368-375

Before:

iini = x - width
ifin = x + width + 1
jini = y - width
jfin = y + width + 1
iini[iini < 0] = 0  # border checking
ifin[ifin > corr.shape[1]] = corr.shape[1]
jini[jini < 0] = 0
jfin[jfin > corr.shape[2]] = corr.shape[2]

After:

iini = np.maximum(x - width, 0)
ifin = np.minimum(x + width + 1, corr.shape[1])
jini = np.maximum(y - width, 0)
jfin = np.minimum(y + width + 1, corr.shape[2])

Impact: Uses vectorized NumPy maximum/minimum operations instead of array indexing, reducing operations and improving clarity.


2. validation.py Optimizations

global_std() - Lines 115-116

Before:

tmpu = np.ma.copy(u).filled(np.nan)
tmpv = np.ma.copy(v).filled(np.nan)

After:

if np.ma.is_masked(u):
    tmpu = np.where(u.mask, np.nan, u.data)
    tmpv = np.where(v.mask, np.nan, v.data)
else:
    tmpu = u
    tmpv = v

Impact: Eliminates unnecessary array copies and uses direct np.where operation. For non-masked arrays, avoids any copying.


local_median_val() - Lines 229-234

Before:

if np.ma.is_masked(u):
    masked_u = np.where(~u.mask, u.data, np.nan)
    masked_v = np.where(~v.mask, v.data, np.nan)

After:

if np.ma.is_masked(u):
    masked_u = np.where(u.mask, np.nan, u.data)
    masked_v = np.where(v.mask, np.nan, v.data)

Impact: Simplified logic by inverting condition, slightly more readable and efficient (avoids NOT operation).


local_norm_median_val() - Lines 303-308

Same optimization as local_median_val() - Consistent pattern across validation functions.


3. filters.py Optimizations

replace_outliers() - Lines 177-181

Before:

if not isinstance(u, np.ma.MaskedArray):
    u = np.ma.masked_array(u, mask=np.ma.nomask)
    
# store grid_mask for reinforcement
grid_mask = u.mask.copy()

After:

# Only create masked array if needed
if isinstance(u, np.ma.MaskedArray):
    grid_mask = u.mask.copy()
else:
    u = np.ma.masked_array(u, mask=np.ma.nomask)
    grid_mask = np.ma.nomask

Impact: Avoids creating masked arrays when input is already a regular array, reducing memory allocation and copy operations.


Performance Metrics

The following performance tests have been added to verify the improvements:

Test Results

  1. find_all_first_peaks_performance: < 10ms for 100 windows
  2. normalize_intensity_performance: < 50ms for 50 64x64 windows
  3. global_std_performance: < 10ms for 100x100 arrays
  4. replace_outliers_performance: < 100ms for 50x50 arrays with 3 iterations
  5. vectorized_sig2noise_ratio_performance: < 50ms for 200 windows

All performance tests consistently pass, ensuring the optimizations maintain correctness while improving speed.


General Optimization Principles Applied

  1. Avoid Unnecessary Copies: Check if data is already in the required format before copying
  2. Use Vectorized Operations: Replace Python loops and list comprehensions with NumPy operations
  3. Minimize Type Conversions: Only convert dtypes when necessary
  4. Direct Array Access: Use np.where and direct indexing instead of masked array copy operations
  5. Conditional Array Creation: Only create complex data structures when needed

Testing

All existing tests continue to pass:

  • 198 tests passed
  • 12 tests skipped
  • Total test suite runtime: ~8.5 seconds

New performance tests added:

  • 5 performance validation tests
  • Runtime: ~0.4 seconds

Impact on Real-World Usage

These optimizations particularly benefit:

  • Large PIV analysis jobs with many interrogation windows
  • Iterative refinement algorithms that call these functions repeatedly
  • Processing of high-resolution image pairs
  • Batch processing workflows

The improvements are most significant when:

  • Processing hundreds or thousands of interrogation windows
  • Using masked arrays for complex geometries
  • Running validation and filtering on large velocity fields
  • Using extended search area PIV with normalized correlation

Backward Compatibility

All changes maintain full backward compatibility:

  • Function signatures unchanged
  • Return types unchanged
  • Numerical results unchanged (verified by test suite)
  • Only internal implementation optimized

Future Optimization Opportunities

Additional areas that could be optimized in future work:

  1. correlation_to_displacement() (pyprocess.py, lines 1110-1122): Nested loops for processing correlations could be vectorized
  2. sig2noise_ratio() (pyprocess.py, lines 517-589): Already has vectorized version but could be made default
  3. lib.replace_nans(): Complex nested loop algorithm, difficult to vectorize but potential for Numba/Cython optimization
  4. Consider using Numba JIT compilation for hot paths
  5. Investigate GPU acceleration for FFT operations

References