
Use half-precision types in bloom downsample/upsample WGSL shaders #8508

Merged
mvaligursky merged 1 commit into main from mv-bloom-half-precision on Mar 6, 2026

Conversation

@mvaligursky (Contributor) commented Mar 6, 2026

Use half-precision (f16) types for intermediate computations in the bloom downsample and upsample WGSL shaders.

Changes:

  • Convert all texture sample intermediates and accumulation in the WGSL downsample shader (13-tap filter) from vec3f to half3, including BOXFILTER and PREMULTIPLY paths
  • Convert all texture sample intermediates and accumulation in the WGSL upsample shader (9-tap tent filter) from vec3f to half3
  • Restructure upsample accumulation in both WGSL and GLSL shaders from accumulate-then-divide to multiply-before-accumulate, preventing potential f16 overflow: the tent-filter weights sum to 16, so the unnormalized intermediate sum could reach 16x the sample value and overflow f16 (max 65504) for HDR values above 4094 (65504 / 16)
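The overflow argument can be checked numerically. The sketch below is a Python emulation, not the shader code: it rounds every intermediate to IEEE 754 binary16 using the standard library's `struct` half-precision format (`'e'`), treats overflow as infinity the way GPU f16 arithmetic does, and assumes a plain weighted loop over the 3x3 tent weights 1-2-1 / 2-4-2 / 1-2-1. The real shader may group the taps differently, which shifts rounding slightly but not the 16x peak. The function names are hypothetical.

```python
import math
import struct

# 3x3 tent filter weights (1-2-1 / 2-4-2 / 1-2-1); they sum to 16.
TENT_WEIGHTS = [1.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 1.0]
F16_MAX = 65504.0


def to_f16(x: float) -> float:
    """Round x to the nearest binary16 value; overflow becomes +/-inf like on a GPU."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def upsample_accumulate_then_divide(samples: list[float]) -> float:
    """Old scheme (emulated): sum weight * sample in f16, then divide the total by 16."""
    acc = 0.0
    for w, s in zip(TENT_WEIGHTS, samples):
        acc = to_f16(acc + to_f16(w * s))  # running sum can reach 16x a sample
    return to_f16(acc / 16.0)


def upsample_multiply_before_accumulate(samples: list[float]) -> float:
    """New scheme (emulated): pre-divide each weight by 16 (0.0625, 0.125, 0.25), then sum."""
    acc = 0.0
    for w, s in zip(TENT_WEIGHTS, samples):
        acc = to_f16(acc + to_f16(to_f16(w / 16.0) * s))  # running sum stays near the sample range
    return acc


if __name__ == "__main__":
    hdr = [5000.0] * 9  # above the 65504 / 16 = 4094 threshold
    print(upsample_accumulate_then_divide(hdr))      # inf
    print(upsample_multiply_before_accumulate(hdr))  # finite, close to 5000
```

Because scaling by a power of two is exact in binary floating point, in this emulation the pre-weighted version returns bit-identical results to the old one whenever no intermediate overflows or drops into the subnormal range; the outcome only differs where the old sum would have overflowed.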

Performance:

  • Reduces register pressure (~39 f32 registers to ~20 f16 registers in the downsample shader), improving GPU occupancy
  • Enables up to 2x ALU throughput on GPUs with native f16 support (AMD RDNA, Apple, Qualcomm, ARM Mali)
  • Falls back to f32 automatically on devices without shader-f16 support via existing half type aliases
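The fallback works because shader code is written against `half*` aliases that resolve to either f16 or f32. The engine's actual alias definitions are not part of this PR; the Python sketch below is a hypothetical illustration that builds a WGSL preamble string. The WGSL `enable f16;` directive, `alias` declarations, and the WebGPU `shader-f16` feature are real; the function name `half_alias_preamble`, the exact alias set, and the sample shader body are assumptions.

```python
def half_alias_preamble(supports_shader_f16: bool) -> str:
    """Return hypothetical WGSL alias declarations for half-precision types.

    With the WebGPU 'shader-f16' feature enabled on the device, half* map to
    real f16 vectors; without it they map to f32, so one shader source
    compiles on both kinds of device.
    """
    if supports_shader_f16:
        decls = [
            "enable f16;",  # directives must precede all declarations in the module
            "alias half = f16;",
            "alias half2 = vec2<f16>;",
            "alias half3 = vec3<f16>;",
            "alias half4 = vec4<f16>;",
        ]
    else:
        decls = [
            "alias half = f32;",
            "alias half2 = vec2f;",
            "alias half3 = vec3f;",
            "alias half4 = vec4f;",
        ]
    return "\n".join(decls) + "\n"


# Hypothetical fragment helper written once against the aliases.
BODY = """
fn blend(a: half3, b: half3) -> vec3f {
    let sum = a * half(0.25) + b * half(0.75);
    return vec3f(sum);  // cast back to vec3f for output
}
"""

if __name__ == "__main__":
    print(half_alias_preamble(True) + BODY)
```

The final `vec3f(...)` cast is a conversion when `half3` is `vec3<f16>` and a no-op constructor when it is already `vec3f`, which is what lets the same body serve both paths.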

Converts intermediate computations from f32 to half (f16) types in the
bloom shader chain, reducing register pressure and enabling 2x ALU
throughput on GPUs with native f16 support.

Restructures upsample accumulation from accumulate-then-divide to
multiply-before-accumulate to prevent f16 overflow.

Made-with: Cursor
mvaligursky self-assigned this Mar 6, 2026
mvaligursky requested review from a team and Copilot March 6, 2026 14:16
mvaligursky added the performance and area: graphics labels Mar 6, 2026

Copilot AI left a comment

Pull request overview

Updates the bloom render-pass downsample/upsample shader chunks to use half-precision (half*) for intermediate computations in WGSL, and adjusts the upsample accumulation math to reduce risk of intermediate overflow while preserving the filter result.

Changes:

  • WGSL bloom downsample: convert sampled/intermediate color values and accumulation from vec3f to half3 (including BOXFILTER + PREMULTIPLY paths).
  • WGSL bloom upsample: convert sampled/intermediate color values and accumulation from vec3f to half3.
  • WGSL + GLSL bloom upsample: change from “accumulate then divide” to pre-weighted accumulation (equivalent normalization, lower peak intermediate magnitude).
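The "lower peak intermediate magnitude" point also applies to the downsample: if its weights are applied per sample and sum to 1, the running sum cannot meaningfully exceed the largest sample, which would be consistent with the PR restructuring only the upsample. The sketch below checks this under an assumption not stated in the PR: that the 13-tap filter uses the widely published layout (center 1/8, four outer corners 1/32 each, four outer edges 1/16 each, four inner taps 1/8 each). f16 rounding is emulated with Python's `struct` `'e'` format, and all names are hypothetical.

```python
import math
import struct
from fractions import Fraction

# Assumed 13-tap downsample weights (common published layout), applied per sample.
WEIGHTS_13TAP = (
    [Fraction(1, 8)]          # center tap
    + [Fraction(1, 32)] * 4   # outer corner taps
    + [Fraction(1, 16)] * 4   # outer edge taps
    + [Fraction(1, 8)] * 4    # inner box taps
)


def to_f16(x: float) -> float:
    """Round x to the nearest binary16 value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def downsample_pre_weighted(samples: list[float]) -> tuple[float, float]:
    """Accumulate weight * sample in emulated f16; return (result, peak running sum)."""
    acc, peak = 0.0, 0.0
    for w, s in zip(WEIGHTS_13TAP, samples):
        acc = to_f16(acc + to_f16(float(w) * s))
        peak = max(peak, abs(acc))
    return acc, peak


if __name__ == "__main__":
    print(sum(WEIGHTS_13TAP))  # 1
    result, peak = downsample_pre_weighted([60000.0] * 13)
    print(result, peak)  # both finite, close to 60000
```

Even with every sample near the top of the f16 range, the pre-weighted sum stays finite; an unnormalized sum with integer weights would instead peak at a multiple of the sample value.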

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

  • src/scene/shader-lib/wgsl/chunks/render-pass/frag/upsample.js: Uses half3 intermediates and pre-weighted accumulation; casts back to vec3f for output.
  • src/scene/shader-lib/wgsl/chunks/render-pass/frag/downsample.js: Uses half3 intermediates/accumulation (incl. BOXFILTER/PREMULTIPLY) and casts back to vec3f for output.
  • src/scene/shader-lib/glsl/chunks/render-pass/frag/upsample.js: Switches to pre-weighted accumulation constants equivalent to the prior normalize-by-16 approach.


mvaligursky merged commit 962f9ef into main Mar 6, 2026
12 checks passed
mvaligursky deleted the mv-bloom-half-precision branch March 6, 2026 14:43

Labels

area: graphics (Graphics related issue), performance (Relating to load times or frame rate)

Projects

None yet

2 participants