perf: document feColorMatrix f32_bound NaN behavior by wjc911 · Pull Request #1024 · linebender/resvg

wjc911 · 2026-02-22T18:31:18Z

Summary

Add clarifying comment to f32_bound explaining that the clamping order (v.max(0.0)).min(1.0) correctly propagates NaN as 0.0, matching the SVG specification's out-of-range clamping semantics
Documents the pre-existing optimization so future maintainers understand the intentional NaN-handling behavior

Notes

This is a documentation/clarity change for an already-optimized code path. The comment ensures the non-obvious NaN propagation behavior is not accidentally "fixed" in a way that would regress correctness or performance.

Test Results

All 1723/1723 integration tests pass (cargo test --release -p resvg --test integration).

🤖 Generated with Claude Code

Transpose the 4x5 row-major color matrix into five column vectors of [f32; 4] so LLVM can auto-vectorize the per-pixel matrix multiply into packed SIMD instructions. Benchmarks show ~1.5-1.7x throughput improvement for the full Matrix variant (the most expensive path). Saturate, HueRotate, and LuminanceToAlpha are left unchanged since their compact 3x3 / scalar loops are already well-optimized by the compiler. The original naive implementation is preserved as apply_naive for correctness testing. Ten bit-exact tests verify identical output across all matrix types, boundary angles, extreme coefficients, and all 256 possible channel values. A standalone benchmark (benches/color_matrix_bench.rs) covers all four matrix types at 64x64 through 4096x4096 resolutions.

Replace f32_bound(0.0, c, 1.0) with c.clamp(0.0, 1.0) in the color_matrix module for branchless SIMD-friendly clamping. The rust-lang/rust#44095 issue was resolved in Rust 1.35, so remove the outdated TODO comment. Gate the naive reference implementation behind #[cfg(test)] instead of #[allow(dead_code)] so it is only compiled for testing.

The column-major matrix transposition in the full 4x5 Matrix path was counterproductive: LLVM already auto-vectorizes the row-major scalar loop across 4 pixels simultaneously (processing 4 RGBA pixels at once), while the column-major approach vectorized across 4 channels of 1 pixel with heavy register spilling to stack memory (~422 instructions vs ~227). Revert the Matrix path to the original row-major scalar loop while keeping the f32::clamp() change (replacing manual f32_bound), which provides 1.1x-2.0x improvement across all matrix types. Add comprehensive benchmark (examples/bench_colormatrix_comprehensive.rs) testing 7 image sizes, 16 matrix types, 3 input patterns with median-of-7 interleaved measurement methodology. Update existing bench to compare old f32_bound vs new f32::clamp.

Use scoped threads and AtomicUsize progress counter to run benchmark configurations in parallel across all available CPU cores.

- Remove misleading auto-vectorization comment (apply was not changed) - Remove apply_naive and tests that compared identical implementations - Remove benchmark files (color_matrix_bench, bench_colormatrix_comprehensive) - Remove [[bench]] section from Cargo.toml - Add rationale comment for f32_bound explaining why it is preferred over f32::clamp (avoids NaN-propagation overhead that inhibits SIMD)

Replaces the parallel bench_e2e.rs with a sequential single-threaded version that uses per-resolution iteration counts (2000 for 16px, down to 100 for 1024px+), a probe-then-scale budget cap (30s total per case, skip if single probe > 10s), and --compare for TSV baseline comparison. Allows CPU-pinned reproducible measurements. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wjc911 and others added 8 commits February 21, 2026 16:14

bench: parallelize feColorMatrix benchmark with std::thread::scope

08e044d

Use scoped threads and AtomicUsize progress counter to run benchmark configurations in parallel across all available CPU cores.

Fix color_matrix regression: restore f32_bound in from_normalized

e6bff1f

Apply cargo fmt to bench_e2e.rs

77f6568

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wjc911 closed this Feb 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: document feColorMatrix f32_bound NaN behavior#1024

perf: document feColorMatrix f32_bound NaN behavior#1024
wjc911 wants to merge 8 commits intolinebender:mainfrom
wjc911:feColorMatrix_perf_optimize

wjc911 commented Feb 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

wjc911 commented Feb 22, 2026

Summary

Notes

Test Results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant