
feat: implement WASM SIMD optimizations for PNG filter operations #1

Draft: superstructor wants to merge 1 commit into wasm from claude/wasm-simd-png-filters-01PrUW2WZkTcp2V4JGtamKLy

This commit adds WASM SIMD support for PNG filter operations (Up, Sub, and
Average; Paeth remains scalar), targeting per-filter speedups of 3-5x and an
overall decoding speedup of at least 3x for typical PNG images.

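## Performance Targets
Speedups are relative to the scalar implementation.
- Sub filter: ≥4x speedup for RGBA images (bpp=4)
- Up filter: ≥5x speedup (simplest filter: no dependency between bytes of the same row)
- Average filter: ≥3x speedup
- Overall PNG decode: ≥3x speedup for typical images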
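Only the unfiltering stage is vectorized. Assuming zlib inflate and the rest of the decoder run at unchanged speed, Amdahl's law relates the per-filter targets to the overall one. With f the fraction of decode time spent unfiltering and s the speedup of that stage:

```math
S_{\text{overall}} = \frac{1}{(1 - f) + f/s},
\qquad
S_{\text{overall}} \ge 3,\; s = 5
\;\Longrightarrow\;
(1 - f) + \frac{f}{5} \le \frac{1}{3}
\;\Longrightarrow\;
f \ge \frac{5}{6} \approx 0.83
```

Even an infinitely fast filter stage (s → ∞) needs f ≥ 2/3, so the ≥3x overall target assumes images whose decode time is dominated by unfiltering; the benchmark suite is the natural place to check that assumption.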

## Changes

### New Files
- **wasm/png_wasm_filter_init.c**: Integration layer that registers the SIMD
  filter functions through libpng's PNG_FILTER_OPTIMIZATIONS dispatch hook
- **bench/png-simd-bench.ts**: Benchmark suite that measures SIMD performance
  across image sizes and filter types
- **SIMD_IMPLEMENTATION.md**: Documentation of the SIMD implementation,
  covering technical details, usage instructions, and performance notes
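How such a hook plugs in, as a self-contained sketch. In libpng 1.6, png_init_filter_functions (pngrutil.c) first installs the scalar row filters in the png_struct's read_filter table and then, when PNG_FILTER_OPTIMIZATIONS is defined, calls the function that macro names with the png_struct pointer and the pixel size in bytes; the ARM NEON and Intel SSE2 paths use the same mechanism. The sketch below mimics that with hypothetical types (mock_png_struct, row_filter_fn) and hypothetical function names; it is not the code in png_wasm_filter_init.c, and the row functions are scalar placeholders so it compiles on any target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: libpng's real table is png_struct::read_filter and
   its entries take (png_row_infop, png_bytep, png_const_bytep). */
typedef void (*row_filter_fn)(size_t rowbytes, uint8_t *row, const uint8_t *prev);

enum { FILTER_SUB, FILTER_UP, FILTER_AVG, FILTER_PAETH, FILTER_COUNT };

typedef struct {
    row_filter_fn read_filter[FILTER_COUNT];
} mock_png_struct;

/* Hypothetical placeholders for the *_wasm_simd variants (scalar here). */
static void up_fast(size_t n, uint8_t *row, const uint8_t *prev) {
    for (size_t i = 0; i < n; i++) row[i] = (uint8_t)(row[i] + prev[i]);
}
static void sub3_fast(size_t n, uint8_t *row, const uint8_t *prev) {
    (void)prev; /* Sub only looks at the current row */
    for (size_t i = 3; i < n; i++) row[i] = (uint8_t)(row[i] + row[i - 3]);
}
static void sub4_fast(size_t n, uint8_t *row, const uint8_t *prev) {
    (void)prev;
    for (size_t i = 4; i < n; i++) row[i] = (uint8_t)(row[i] + row[i - 4]);
}

/* Shape of a PNG_FILTER_OPTIMIZATIONS hook: it runs after the scalar
   defaults are installed and overrides only the entries it can speed up. */
void wasm_filter_init_sketch(mock_png_struct *pp, unsigned int bpp) {
    pp->read_filter[FILTER_UP] = up_fast;  /* Up does not depend on bpp */
    switch (bpp) {
    case 3: pp->read_filter[FILTER_SUB] = sub3_fast; break;
    case 4: pp->read_filter[FILTER_SUB] = sub4_fast; break;
    default: break;  /* other pixel sizes keep the scalar default */
    }
}
```

Selecting per-bpp functions at init time keeps the per-row call free of branches on pixel size, which is why the filter functions come in bpp-specific variants such as sub{3,4,6,8}.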

### Modified Files
- **wasm/filter_wasm_simd.c**: Fixed correctness bugs in the existing SIMD implementations:
  - Sub filter: simplified so that each iteration loads the values already
    reconstructed by the previous iteration
  - Average filter: now uses floor division instead of rounding up
    (wasm_u8x16_avgr computes (a + b + 1) >> 1, but the PNG spec requires
    floor((a + b) / 2))
  - Added comments explaining how the dependency on previously reconstructed
    bytes is handled

- **meson.build**: Updated build configuration to enable SIMD compilation:
  - Added wasm_simd_sources file list
  - Conditional compilation controlled by the 'simd' build option
  - Added -DPNG_WASM_SIMD_OPT=1 and a PNG_FILTER_OPTIMIZATIONS macro definition
  - Added -msimd128 compiler and linker flags
  - Updated all three build targets (main, side, static library)

- **deno.json**: Updated build tasks to enable SIMD by default:
  - Added -Dsimd=true to build:main and build:side tasks
  - Added bench:simd task for running SIMD performance benchmarks

## Technical Implementation

### Up Filter (png_read_filter_row_up_wasm_simd)
Fully vectorized: processes 16 bytes per iteration, since each output byte
depends only on its filtered value and the byte above it in the previous row.
Expected 5-6x speedup over the scalar implementation.
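A portable sketch of the Up loop structure, using the GCC/Clang vector_size extension in place of the wasm_simd128.h intrinsics so it compiles on any target. In a WASM build, each 16-byte step would correspond to wasm_v128_load on both rows, wasm_i8x16_add, and wasm_v128_store. The function name is hypothetical and this is not the PR's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16 x uint8 lanes; lane-wise + wraps modulo 256, like wasm_i8x16_add. */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* Hypothetical name. Reconstructs an Up-filtered row in place:
   Raw(x) = Up(x) + Prior(x)  (mod 256). */
void unfilter_up_sketch(uint8_t *row, const uint8_t *prev, size_t rowbytes) {
    size_t i = 0;
    /* No byte depends on another byte of the same row, so 16 bytes
       can be reconstructed at once. */
    for (; i + 16 <= rowbytes; i += 16) {
        u8x16 r, p;
        memcpy(&r, row + i, sizeof r);   /* unaligned load */
        memcpy(&p, prev + i, sizeof p);
        r += p;
        memcpy(row + i, &r, sizeof r);   /* unaligned store */
    }
    /* Scalar tail for the last rowbytes % 16 bytes. */
    for (; i < rowbytes; i++)
        row[i] = (uint8_t)(row[i] + prev[i]);
}
```

The scalar tail keeps the vector loop free of bounds checks; rows shorter than 16 bytes go entirely through the tail.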

### Sub Filter (png_read_filter_row_sub{3,4,6,8}_wasm_simd)
Vectorized with dependency-chain handling. Each reconstructed byte depends on
the reconstructed byte bpp positions to its left, so each iteration loads the
values updated by the previous iteration to maintain correctness. Expected 4x
speedup for RGBA (bpp=4).
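If the loop processes 16 bytes per iteration, loading values written by the previous iteration only carries the dependency across vector boundaries: with bpp=4 a vector holds four pixels, and pixels 1-3 depend on reconstructed pixels in the same vector. One common way to resolve this inside the register is a log-step prefix sum (shift by one pixel and add, then by two pixels and add), followed by adding the last reconstructed pixel of the previous chunk. The sketch below shows that technique portably with the GCC/Clang vector_size extension; in WASM the byte shifts would be wasm_i8x16_shuffle against a zero vector. Names are hypothetical, and this illustrates the technique rather than the PR's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8x16 __attribute__((vector_size(16)));

static u8x16 load16(const uint8_t *p) { u8x16 v; memcpy(&v, p, 16); return v; }
static void store16(uint8_t *p, u8x16 v) { memcpy(p, &v, 16); }

/* Move every lane n positions toward higher indices, filling with zeros
   (lane j receives lane j - n). */
static u8x16 shift_lanes_up(u8x16 v, size_t n) {
    uint8_t buf[32] = {0};
    memcpy(buf + 16, &v, 16);
    return load16(buf + 16 - n);
}

/* Hypothetical name. Reconstructs a Sub-filtered row with bpp = 4 in place:
   Raw(x) = Sub(x) + Raw(x - 4)  (mod 256), with Raw(x - 4) = 0 for x < 4. */
void unfilter_sub4_sketch(uint8_t *row, size_t rowbytes) {
    uint8_t carry[16] = {0};  /* last reconstructed pixel, repeated 4 times */
    size_t i = 0;
    for (; i + 16 <= rowbytes; i += 16) {
        u8x16 v = load16(row + i);
        v += shift_lanes_up(v, 4);  /* pixel k now holds d[k] + d[k-1]        */
        v += shift_lanes_up(v, 8);  /* ... + d[k-2] + d[k-3]: in-vector sum   */
        v += load16(carry);         /* add the pixel left of this chunk       */
        store16(row + i, v);
        for (size_t k = 0; k < 16; k++)
            carry[k] = row[i + 12 + k % 4];  /* broadcast the new last pixel */
    }
    /* Scalar tail: row[i - 4] is already reconstructed at this point. */
    for (; i < rowbytes; i++)
        row[i] = (uint8_t)(row[i] + (i >= 4 ? row[i - 4] : 0));
}
```

The two shift-and-add steps replace three serial pixel additions per vector; the only remaining cross-iteration dependency is the single carried pixel.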

### Average Filter (png_read_filter_row_avg{3,4,6,8}_wasm_simd)
Vectorized with corrected floor division. wasm_u8x16_avgr rounds halves up,
so the result is corrected with an XOR trick: subtracting (a XOR b) AND 1,
which is 1 exactly when a + b is odd, yields the floor division required by
the PNG spec. Expected 3x speedup.
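A scalar sketch of that correction for a single lane, with the PNG Average row loop built on it. In a WASM SIMD build the same correction would be written as wasm_i8x16_sub(wasm_u8x16_avgr(a, b), wasm_v128_and(wasm_v128_xor(a, b), wasm_i8x16_splat(1))). Like Sub, Average reads the reconstructed byte bpp positions to the left, so this reference loop advances byte by byte. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-lane semantics of wasm_u8x16_avgr: rounds halves up. */
static uint8_t avgr_u8(uint8_t a, uint8_t b) {
    return (uint8_t)((a + b + 1) >> 1);
}

/* Floor average required by the PNG Average filter. The low bit of a ^ b is
   set exactly when a + b is odd, which is exactly when avgr rounded up. */
static uint8_t avg_floor_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(avgr_u8(a, b) - ((a ^ b) & 1));
}

/* Hypothetical name. Reconstructs an Average-filtered row in place:
   Raw(x) = Average(x) + floor((Raw(x - bpp) + Prior(x)) / 2)  (mod 256). */
void unfilter_avg_sketch(uint8_t *row, const uint8_t *prev, size_t rowbytes, size_t bpp) {
    for (size_t i = 0; i < rowbytes; i++) {
        uint8_t left = i >= bpp ? row[i - bpp] : 0;  /* already reconstructed */
        row[i] = (uint8_t)(row[i] + avg_floor_u8(left, prev[i]));
    }
}
```

The correction costs one XOR, one AND, and one subtract per vector, which is cheaper than widening to 16-bit lanes to compute the sum exactly.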

### Paeth Filter (png_read_filter_row_paeth{3,4,6,8}_wasm_simd)
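Currently uses the scalar implementation because the predictor chooses among
three neighbors per byte based on comparisons, which does not map directly onto
lane-wise arithmetic. Vectorizing it with SIMD min/max operations is a future
optimization opportunity.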
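For reference, a sketch of the Paeth predictor as the PNG spec defines it, next to a branchless reformulation that uses only comparisons and bitwise selection, which is the shape SIMD code needs (for example 16-bit lane comparisons and wasm_v128_bitselect, since |a + b - 2c| can reach 510 and does not fit in 8 bits). This compare-and-select form is a swapped-in illustration, not the min/max approach the PR mentions, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Paeth predictor as specified by PNG: a = left, b = above, c = upper-left. */
static uint8_t paeth_ref(uint8_t a, uint8_t b, uint8_t c) {
    int p = a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

/* Branchless variant: each condition becomes an all-ones / all-zeros mask,
   as a SIMD comparison would produce, and the result is selected bitwise. */
static uint8_t paeth_select(uint8_t a, uint8_t b, uint8_t c) {
    int pa = abs(b - c);          /* |p - a| with p = a + b - c */
    int pb = abs(a - c);          /* |p - b| */
    int pc = abs(a + b - 2 * c);  /* |p - c| */
    int use_a = -((pa <= pb) & (pa <= pc));
    int use_b = ~use_a & -(pb <= pc);
    int use_c = ~(use_a | use_b);
    return (uint8_t)((a & use_a) | (b & use_b) | (c & use_c));
}
```

Rewriting pa, pb, and pc without p avoids one intermediate, and the tie-breaking order (a, then b, then c) is preserved by building use_b and use_c from the earlier masks.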

## Building

Enable SIMD:
```bash
deno task build:wasm  # Builds with -Dsimd=true by default
```

Disable SIMD (scalar fallback):
```bash
meson setup build-main --cross-file=scripts/emscripten.cross -Dsimd=false
```

## Testing

Run standard tests:
```bash
deno task test
```

Run SIMD benchmarks:
```bash
deno task bench:simd
```

## Browser Compatibility

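WASM SIMD is supported in Chrome 91+, Edge 91+, Firefox 89+, and Safari 16.4+.
Older browsers fall back to the scalar implementation; since engines without
SIMD support reject a module compiled with -msimd128 at validation time, this
fallback relies on loading a separate build made with -Dsimd=false.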

Closes: Performance optimization for PNG decoding