feat: implement WASM SIMD optimizations for PNG filter operations #1
Draft
superstructor wants to merge 1 commit into `wasm` from
This commit adds WASM SIMD support for the PNG filter operations,
targeting a 4x overall decoding speedup for typical PNG images (with ≥3x as the minimum target).
## Performance Targets
- Sub filter: ≥4x speedup for RGBA images
- Up filter: ≥5x speedup (simplest filter)
- Average filter: ≥3x speedup
- Overall PNG decode: ≥3x speedup for typical images
## Changes
### New Files
- **wasm/png_wasm_filter_init.c**: Integration layer that hooks SIMD filter
functions into libpng's PNG_FILTER_OPTIMIZATIONS dispatch system
- **bench/png-simd-bench.ts**: Comprehensive benchmark suite for validating
SIMD performance across different image sizes and filter types
- **SIMD_IMPLEMENTATION.md**: Complete documentation of the SIMD implementation,
including technical details, usage instructions, and performance notes
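As a rough illustration of what an integration layer like `wasm/png_wasm_filter_init.c` does, the sketch below installs kernels into a per-filter function table keyed by `bpp`, similar in spirit to libpng's existing NEON and SSE2 hooks that plug in through `PNG_FILTER_OPTIMIZATIONS`. Every name here (`mock_png_struct`, `row_filter_fn`, `init_filter_functions_wasm_simd`, the `*_simd` stubs) is hypothetical: real libpng keeps the table in its private `png_struct` and uses a different row-filter signature, and Paeth is left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for libpng internals: real libpng keeps
   read_filter[] in its private png_struct, and each entry takes
   (png_row_infop, png_bytep row, png_const_bytep prev_row). */
typedef void (*row_filter_fn)(size_t rowbytes, unsigned char *row,
                              const unsigned char *prev_row);

/* Same ordering as libpng's PNG_FILTER_VALUE_* - 1 indices. */
enum { FILTER_SUB, FILTER_UP, FILTER_AVG, FILTER_PAETH, FILTER_COUNT };

typedef struct {
    row_filter_fn read_filter[FILTER_COUNT];
} mock_png_struct;

/* Hypothetical placeholder kernels; the real SIMD bodies are omitted. */
#define STUB_KERNEL(name)                                     \
    static void name(size_t rowbytes, unsigned char *row,     \
                     const unsigned char *prev_row)           \
    { (void)rowbytes; (void)row; (void)prev_row; }

STUB_KERNEL(up_simd)
STUB_KERNEL(sub3_simd) STUB_KERNEL(sub4_simd)
STUB_KERNEL(sub6_simd) STUB_KERNEL(sub8_simd)
STUB_KERNEL(avg3_simd) STUB_KERNEL(avg4_simd)
STUB_KERNEL(avg6_simd) STUB_KERNEL(avg8_simd)

/* PNG_FILTER_OPTIMIZATIONS-style hook: Up does not depend on bpp, while Sub and
   Average get a bpp-specialized kernel. Entries left untouched (other bpp
   values, and Paeth in this sketch) keep whatever generic filter was there. */
static void init_filter_functions_wasm_simd(mock_png_struct *pp, unsigned int bpp)
{
    assert(pp != NULL);
    pp->read_filter[FILTER_UP] = up_simd;
    switch (bpp) {
    case 3: pp->read_filter[FILTER_SUB] = sub3_simd; pp->read_filter[FILTER_AVG] = avg3_simd; break;
    case 4: pp->read_filter[FILTER_SUB] = sub4_simd; pp->read_filter[FILTER_AVG] = avg4_simd; break;
    case 6: pp->read_filter[FILTER_SUB] = sub6_simd; pp->read_filter[FILTER_AVG] = avg6_simd; break;
    case 8: pp->read_filter[FILTER_SUB] = sub8_simd; pp->read_filter[FILTER_AVG] = avg8_simd; break;
    default: break; /* e.g. bpp 1 or 2: keep the generic filters */
    }
}
```

In this mock a `NULL` entry stands for "keep the generic C filter", which is how rows with unsupported `bpp` values stay on the scalar path.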
### Modified Files
- **wasm/filter_wasm_simd.c**: Fixed critical bugs in the existing SIMD implementations:
  - Sub filter: simplified to reload the values reconstructed by the previous iteration
  - Average filter: fixed to use floor division instead of rounding up
    (`wasm_u8x16_avgr` rounds up, but the PNG spec requires floor division)
  - Added comments explaining how the dependency chain is handled
- **meson.build**: Updated the build configuration to enable SIMD compilation:
  - Added a `wasm_simd_sources` file list
  - Conditional compilation based on the `simd` option
  - Added `-DPNG_WASM_SIMD_OPT=1` and the `PNG_FILTER_OPTIMIZATIONS` macro
  - Added `-msimd128` compiler and linker flags
  - Updated all three build targets (main, side, static library)
- **deno.json**: Updated build tasks to enable SIMD by default:
  - Added `-Dsimd=true` to the `build:main` and `build:side` tasks
  - Added a `bench:simd` task for running the SIMD performance benchmarks
## Technical Implementation
### Up Filter (png_read_filter_row_up_wasm_simd)
Fully vectorized: processes 16 bytes per iteration, since each output byte depends only on the
corresponding bytes of the current and previous rows. Expected 5-6x speedup over the scalar implementation.
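A portable sketch of the Up filter's loop shape follows, using the PNG definition Raw(x) = Up(x) + Prior(x) mod 256. The 16-lane inner loop stands in for one vector load/add/store (for example `wasm_v128_load`, `wasm_i8x16_add`, `wasm_v128_store`); the `unfilter_up` name and the scalar tail are illustrative assumptions, not code taken from `wasm/filter_wasm_simd.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Raw(x) = Up(x) + Prior(x) mod 256, in place. No lane
   depends on another lane, so each 16-byte block is independent. */
static void unfilter_up(uint8_t *row, const uint8_t *prev, size_t rowbytes)
{
    assert(row != NULL && prev != NULL);
    size_t i = 0;
    for (; i + 16 <= rowbytes; i += 16)          /* one "v128" per block */
        for (size_t lane = 0; lane < 16; lane++)
            row[i + lane] = (uint8_t)(row[i + lane] + prev[i + lane]);
    for (; i < rowbytes; i++)                     /* scalar tail */
        row[i] = (uint8_t)(row[i] + prev[i]);
}
```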
### Sub Filter (png_read_filter_row_sub{3,4,6,8}_wasm_simd)
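Vectorized with handling for the intra-row dependency chain (each reconstructed byte depends on the
byte `bpp` positions to its left). Each iteration loads the values reconstructed by the previous
iteration to maintain correctness. Expected 4x speedup for RGBA (bpp=4).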
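To make the dependency chain concrete: Raw(x) = Sub(x) + Raw(x - bpp), so in a 16-byte block holding four RGBA pixels, each pixel depends on the one before it. The sketch below shows one common alternative to reloading per iteration: a log-step prefix sum (shift by 4 bytes and add, then shift by 8 bytes and add) plus a carry of the previous block's last pixel. This is a swapped-in technique for illustration, not necessarily how `png_read_filter_row_sub4_wasm_simd` works; the helper names are hypothetical, and plain arrays emulate 16-byte vectors so the code runs without `-msimd128`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: Raw(x) = Sub(x) + Raw(x - bpp) mod 256; the first bpp
   bytes have no left neighbour and are already final. */
static void unfilter_sub_ref(uint8_t *row, size_t rowbytes, size_t bpp)
{
    assert(bpp >= 1 && bpp <= 8);
    for (size_t i = bpp; i < rowbytes; i++)
        row[i] = (uint8_t)(row[i] + row[i - bpp]);
}

/* Emulates moving a 16-byte vector n lanes toward higher addresses and
   zero-filling the low lanes (on a real v128 this could be a constant byte
   shuffle against a zero vector). */
static void shift_lanes_up(const uint8_t in[16], uint8_t out[16], size_t n)
{
    for (size_t i = 0; i < 16; i++)
        out[i] = (i >= n) ? in[i - n] : 0;
}

/* bpp = 4 sketch: per 16-byte block, a two-step prefix sum over pixels, then
   add the last reconstructed pixel of the previous block. Assumes rowbytes is
   a multiple of 16; a real kernel would also need a scalar tail. */
static void unfilter_sub4_prefix(uint8_t *row, size_t rowbytes)
{
    uint8_t carry[4] = {0, 0, 0, 0};
    assert(rowbytes % 16 == 0);
    for (size_t off = 0; off < rowbytes; off += 16) {
        uint8_t v[16], t[16];
        memcpy(v, row + off, 16);
        shift_lanes_up(v, t, 4);   /* add the pixel one slot to the left */
        for (size_t i = 0; i < 16; i++) v[i] = (uint8_t)(v[i] + t[i]);
        shift_lanes_up(v, t, 8);   /* add partial sums two slots to the left */
        for (size_t i = 0; i < 16; i++) v[i] = (uint8_t)(v[i] + t[i]);
        for (size_t i = 0; i < 16; i++) v[i] = (uint8_t)(v[i] + carry[i % 4]);
        memcpy(row + off, v, 16);
        memcpy(carry, v + 12, 4);  /* last pixel of this block */
    }
}
```

The trade-off is two shuffles and adds per block instead of a serial per-pixel chain; whether that beats the reload approach on a given engine is something the `bench:simd` suite would need to confirm.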
### Average Filter (png_read_filter_row_avg{3,4,6,8}_wasm_simd)
Vectorized with corrected floor division. Uses an XOR trick to convert the round-up average computed
by `wasm_u8x16_avgr` into the floor division required by the PNG spec.
Expected 3x speedup.
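The floor-division fix rests on a simple identity: `wasm_u8x16_avgr` computes (a + b + 1) >> 1, which exceeds the floor average (a + b) >> 1 by exactly 1 when a + b is odd, that is, when the low bits of a and b differ. The per-lane sketch below assumes this is the XOR trick referred to here; the function names are hypothetical, and the vector form in the comment is an assumption about how it maps onto WASM intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-lane model of wasm_u8x16_avgr: the rounding-up average (a + b + 1) >> 1. */
static uint8_t avgr_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + b + 1) >> 1);
}

/* Floor average (a + b) >> 1 without widening: subtract 1 exactly when the
   low bits of a and b differ. A plausible vector form (assumption):
   wasm_i8x16_sub(wasm_u8x16_avgr(a, b),
                  wasm_v128_and(wasm_v128_xor(a, b), wasm_i8x16_splat(1))) */
static uint8_t avg_floor_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(avgr_lane(a, b) - ((a ^ b) & 1));
}

/* Scalar Average unfilter:
   Raw(x) = Avg(x) + floor((Raw(x - bpp) + Prior(x)) / 2),
   with Raw(x - bpp) treated as 0 for the first bpp bytes. */
static void unfilter_avg_ref(uint8_t *row, const uint8_t *prev,
                             size_t rowbytes, size_t bpp)
{
    assert(bpp >= 1 && bpp <= 8);
    for (size_t i = 0; i < rowbytes; i++) {
        uint8_t left = (i >= bpp) ? row[i - bpp] : 0;
        row[i] = (uint8_t)(row[i] + avg_floor_lane(left, prev[i]));
    }
}
```

Note that the identity only fixes the rounding; the Average filter still has the same left-to-right dependency on Raw(x - bpp) as Sub.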
### Paeth Filter (png_read_filter_row_paeth{3,4,6,8}_wasm_simd)
Currently uses a scalar implementation because of the complex predictor logic.
Vectorizing it with SIMD min/max operations is a future optimization opportunity.
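For the Paeth follow-up, the PNG spec's branchy predictor can be rewritten as a min plus compare/select steps, which is the shape a SIMD version would need (it uses selects in addition to min/max). The sketch below checks both forms per byte; it assumes the distances would be computed in 16-bit lanes (for example with `wasm_i16x8_abs`, `wasm_i16x8_min`, and `wasm_v128_bitselect`), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* PNG spec Paeth predictor (a = left, b = above, c = upper-left), branchy form. */
static uint8_t paeth_ref(uint8_t a, uint8_t b, uint8_t c)
{
    int p = a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

/* Same predictor as min + two selects. Uses pa = |b - c|, pb = |a - c|,
   pc = |a + b - 2c|, which fit in 16-bit lanes. */
static uint8_t paeth_select(uint8_t a, uint8_t b, uint8_t c)
{
    int pa = abs(b - c), pb = abs(a - c), pc = abs(a + b - 2 * c);
    uint8_t pred = (pb < pa) ? b : a;   /* prefer b only if strictly closer */
    int best = (pb < pa) ? pb : pa;     /* min(pa, pb) */
    return (pc < best) ? c : pred;      /* prefer c only if strictly closer */
}
```

Using strict `<` in both selects preserves the spec's tie-breaking order (a, then b, then c).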
## Building
Enable SIMD:
```bash
deno task build:wasm # Builds with -Dsimd=true by default
```
Disable SIMD (scalar fallback):
```bash
meson setup build-main --cross-file=scripts/emscripten.cross -Dsimd=false
```
## Testing
Run standard tests:
```bash
deno task test
```
Run SIMD benchmarks:
```bash
deno task bench:simd
```
## Browser Compatibility
WASM SIMD is supported in Chrome 91+, Edge 91+, Firefox 89+, and Safari 16.4+.
Older browsers automatically fall back to the scalar implementation.
Closes: Performance optimization for PNG decoding