Skip to content

perf: document feColorMatrix f32_bound NaN behavior#1024

Closed
wjc911 wants to merge 8 commits intolinebender:mainfrom
wjc911:feColorMatrix_perf_optimize
Closed

perf: document feColorMatrix f32_bound NaN behavior#1024
wjc911 wants to merge 8 commits intolinebender:mainfrom
wjc911:feColorMatrix_perf_optimize

Conversation

@wjc911
Copy link

@wjc911 wjc911 commented Feb 22, 2026

Summary

  • Add clarifying comment to f32_bound explaining that the clamping order (v.max(0.0)).min(1.0) correctly propagates NaN as 0.0, matching the SVG specification's out-of-range clamping semantics
  • Documents the pre-existing optimization so future maintainers understand the intentional NaN-handling behavior

Notes

This is a documentation/clarity change for an already-optimized code path. The comment ensures the non-obvious NaN propagation behavior is not accidentally "fixed" in a way that would regress correctness or performance.

Test Results

All 1723/1723 integration tests pass (cargo test --release -p resvg --test integration).

🤖 Generated with Claude Code

wjc911 and others added 8 commits February 21, 2026 16:14
Transpose the 4x5 row-major color matrix into five column vectors of
[f32; 4] so LLVM can auto-vectorize the per-pixel matrix multiply into
packed SIMD instructions. Benchmarks show ~1.5-1.7x throughput
improvement for the full Matrix variant (the most expensive path).

Saturate, HueRotate, and LuminanceToAlpha are left unchanged since their
compact 3x3 / scalar loops are already well-optimized by the compiler.

The original naive implementation is preserved as apply_naive for
correctness testing. Ten bit-exact tests verify identical output across
all matrix types, boundary angles, extreme coefficients, and all 256
possible channel values.

A standalone benchmark (benches/color_matrix_bench.rs) covers all four
matrix types at 64x64 through 4096x4096 resolutions.
Replace f32_bound(0.0, c, 1.0) with c.clamp(0.0, 1.0) in the
color_matrix module for branchless SIMD-friendly clamping. The
rust-lang/rust#44095 issue was resolved in Rust 1.35, so remove the
outdated TODO comment. Gate the naive reference implementation behind
#[cfg(test)] instead of #[allow(dead_code)] so it is only compiled
for testing.
The column-major matrix transposition in the full 4x5 Matrix path was
counterproductive: LLVM already auto-vectorizes the row-major scalar
loop across 4 pixels simultaneously (processing 4 RGBA pixels at once),
while the column-major approach vectorized across 4 channels of 1 pixel
with heavy register spilling to stack memory (~422 instructions vs ~227).

Revert the Matrix path to the original row-major scalar loop while
keeping the f32::clamp() change (replacing manual f32_bound), which
provides 1.1x-2.0x improvement across all matrix types.

Add comprehensive benchmark (examples/bench_colormatrix_comprehensive.rs)
testing 7 image sizes, 16 matrix types, 3 input patterns with median-of-7
interleaved measurement methodology.

Update existing bench to compare old f32_bound vs new f32::clamp.
Use scoped threads and AtomicUsize progress counter to run benchmark
configurations in parallel across all available CPU cores.
- Remove misleading auto-vectorization comment (apply was not changed)
- Remove apply_naive and tests that compared identical implementations
- Remove benchmark files (color_matrix_bench, bench_colormatrix_comprehensive)
- Remove [[bench]] section from Cargo.toml
- Add rationale comment for f32_bound explaining why it is preferred
  over f32::clamp (avoids NaN-propagation overhead that inhibits SIMD)
Replaces the parallel bench_e2e.rs with a sequential single-threaded
version that uses per-resolution iteration counts (2000 for 16px,
down to 100 for 1024px+), a probe-then-scale budget cap (30s total
per case, skip if single probe > 10s), and --compare for TSV baseline
comparison. Allows CPU-pinned reproducible measurements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wjc911 wjc911 closed this Feb 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant