Commit 2d4bf0c

Make hierarchical Z buffer generation properly conservative. (#22603)
The single-pass downsampling (SPD) shader is properly conservative only for depth buffers whose side lengths are powers of two. This is because it assumes that, for any texel in mip level N+1, all texels in mip level N that contribute to that texel are contained within at most a 2×2 square, which is only true for textures whose side lengths are powers of two. (For textures whose side lengths aren't powers of two, proper conservative downsampling may require sampling up to a 3×3 square.)

This PR solves the problem in a conservative way, by conceptually rounding up the side lengths of the depth buffer to the *next* power of two and scaling the depth buffer appropriately before performing downsampling. This ensures that the SPD shader only sees textures with side lengths that are powers of two at every step of the operation.

Note "conceptually"; in reality this patch doesn't actually generate such an intermediate scaled texture. Instead, it changes the `load_mip_0` function in the shader to return the value that *would* have been produced by sampling such a scaled depth buffer. This is more efficient than actually performing such a scaling operation.

The sampling operations in the mesh preprocessing occlusion culling code required no changes, as they simply use `textureDimensions` on the hierarchical Z buffer to determine its size. I did, however, have to change the meshlet code to use `textureDimensions` like the mesh preprocessing code does. The meshlet culling indeed seems less broken now (albeit still broken); the rabbits on the right side don't flicker anymore in my testing.

Note that this approach, while popular (e.g. in zeux's [Niagara]), is more conservative than a single-pass downsampler that properly handles 3×3 texel blocks would be. However, such a downsampler would be complex, and I figured it was better to make our occlusion culling correct, simple, and fast rather than possibly complex and slow.

This fix allows us to move occlusion culling out of experimental status. I opted not to do that in this PR in order to make it easier to review, but a follow-up PR should do that.

[Niagara]: zeux/niagara#15 (comment)
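The 2×2 versus 3×3 claim in the message can be checked with a small one-dimensional model. The sketch below is not part of the patch: the function names are hypothetical, and it assumes the usual mip chain in which each level has `max(n / 2, 1)` texels along an axis.

```rust
/// Number of source texels (along one axis) whose extent overlaps destination
/// texel `i` when a row of `src` texels is downsampled to `dst` texels.
/// Hypothetical helper for illustration only.
fn footprint_len(src: u32, dst: u32, i: u32) -> u32 {
    let (src, dst, i) = (src as u64, dst as u64, i as u64);
    // Destination texel `i` spans [i * src / dst, (i + 1) * src / dst) in
    // source texel units.
    let first = i * src / dst;
    let last = ((i + 1) * src - 1) / dst;
    (last - first + 1) as u32
}

/// Widest footprint seen anywhere in the mip chain that starts at `size`,
/// assuming each level has `max(n / 2, 1)` texels (an assumption of this
/// sketch, not something the patch states).
fn max_footprint_in_chain(mut size: u32) -> u32 {
    let mut widest = 1;
    while size > 1 {
        let next = (size / 2).max(1);
        for i in 0..next {
            widest = widest.max(footprint_len(size, next, i));
        }
        size = next;
    }
    widest
}
```

Under these assumptions a power-of-two side length never needs more than two source texels per axis, while a side length such as 1080 reaches three once the chain hits an odd level (135 to 67).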
1 parent 791a0e4 commit 2d4bf0c

3 files changed: +106 -17 lines changed

crates/bevy_core_pipeline/src/mip_generation/experimental/depth.rs

Lines changed: 33 additions & 7 deletions
@@ -1,8 +1,7 @@
 //! Generation of hierarchical Z buffers for occlusion culling.
 //!
-//! This is marked experimental because the shader is designed only for
-//! power-of-two texture sizes and is slightly incorrect for non-power-of-two
-//! depth buffer sizes.
+//! Currently, this module only supports generation of hierarchical Z buffers
+//! for occlusion culling.

 use core::array;

@@ -515,10 +514,11 @@ impl ViewDepthPyramid {
         texture_label: &'static str,
         texture_view_label: &'static str,
     ) -> ViewDepthPyramid {
-        // Calculate the size of the depth pyramid.
+        // Calculate the size of the depth pyramid. This is the size of the
+        // depth buffer rounded down to the previous power of two.
         let depth_pyramid_size = Extent3d {
-            width: size.x.div_ceil(2),
-            height: size.y.div_ceil(2),
+            width: previous_power_of_two(size.x),
+            height: previous_power_of_two(size.y),
             depth_or_array_layers: 1,
         };

@@ -616,6 +616,22 @@ impl ViewDepthPyramid {
         downsample_depth_first_pipeline: &ComputePipeline,
         downsample_depth_second_pipeline: &ComputePipeline,
     ) {
+        // We need to make sure that every mip level the single-pass
+        // downsampling (SPD) shader sees has lengths that are powers of two for
+        // correct conservative depth buffer downsampling. To do this, we
+        // maintain the fiction that we're downsampling a depth buffer scaled up
+        // so that it has side lengths rounded up to the next power of two. (If
+        // the depth buffer already has a side length that's a power of two,
+        // then we double it anyway; this ensures that we don't lose any
+        // precision in the top level of the depth pyramid.) The
+        // `downsample_depth` shader's `load_mip_0` function returns the value
+        // that sampling such a depth buffer would yield, without actually
+        // having to construct such a scaled depth buffer.
+        let virtual_view_size = uvec2(
+            (view_size.x + 1).next_power_of_two(),
+            (view_size.y + 1).next_power_of_two(),
+        );
+
         let command_encoder = render_context.command_encoder();
         let mut downsample_pass = command_encoder.begin_compute_pass(&ComputePassDescriptor {
             label: Some(label),
@@ -625,7 +641,11 @@ impl ViewDepthPyramid {
         // Pass the mip count as a push constant, for simplicity.
         downsample_pass.set_push_constants(0, &self.mip_count.to_le_bytes());
         downsample_pass.set_bind_group(0, downsample_depth_bind_group, &[]);
-        downsample_pass.dispatch_workgroups(view_size.x.div_ceil(64), view_size.y.div_ceil(64), 1);
+        downsample_pass.dispatch_workgroups(
+            virtual_view_size.x.div_ceil(64),
+            virtual_view_size.y.div_ceil(64),
+            1,
+        );

         if self.mip_count >= 7 {
             downsample_pass.set_pipeline(downsample_depth_second_pipeline);
@@ -712,3 +732,9 @@ pub(crate) fn prepare_downsample_depth_view_bind_groups(
         ));
     }
 }
+
+/// Returns the previous power of two of x, or, if x is exactly a power of two,
+/// returns x unchanged.
+fn previous_power_of_two(x: u32) -> u32 {
+    1 << (31 - x.leading_zeros())
+}
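
The pyramid size (rounded down) and the virtual view size (rounded up, doubling exact powers of two) are related by a factor of two, which is what lets the first real mip level line up with the virtual mip 0. A minimal sketch of that relationship follows; `virtual_side_length` and `workgroups` are hypothetical names introduced here, and only `previous_power_of_two` mirrors a function from the patch. Like the original, it assumes `x >= 1`.

```rust
/// Mirrors the helper added in this patch. Assumes `x >= 1`; for `x == 0`
/// the subtraction would underflow.
fn previous_power_of_two(x: u32) -> u32 {
    1 << (31 - x.leading_zeros())
}

/// Hypothetical name: side length of the virtual mip 0, i.e. the smallest
/// power of two strictly greater than `x`, computed as in the patch.
fn virtual_side_length(x: u32) -> u32 {
    (x + 1).next_power_of_two()
}

/// Hypothetical name: workgroup count along one axis, using the divisor of
/// 64 that appears in the `dispatch_workgroups` call.
fn workgroups(x: u32) -> u32 {
    virtual_side_length(x).div_ceil(64)
}
```

For a 1920×1080 view this gives a 2048×2048 virtual mip 0, a 1024×1024 depth pyramid, and a 32×32 workgroup dispatch.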

crates/bevy_core_pipeline/src/mip_generation/experimental/downsample_depth.wgsl

Lines changed: 72 additions & 7 deletions
@@ -29,6 +29,9 @@ var<push_constant> constants: Constants;

 /// Generates a hierarchical depth buffer.
 /// Based on FidelityFX SPD v2.1 https://github.com/GPUOpen-LibrariesAndSDKs/FidelityFX-SDK/blob/d7531ae47d8b36a5d4025663e731a47a38be882f/sdk/include/FidelityFX/gpu/spd/ffx_spd.h#L528
+///
+/// `mip_0` may be of any size, but `mip_1` and down must have side lengths that
+/// are powers of two.

 // TODO:
 // * Subgroup support
@@ -307,32 +310,94 @@ fn reduce_load_mip_6(tex: vec2u) -> f32 {
     ));
 }

+// Loads the top mip level at virtual position (x, y).
+//
+// This is the value that *would be* returned from sampling a scaled depth
+// buffer with side lengths rounded up to the next power of two, without
+// actually constructing such a depth buffer.
+//
+// See the comments in `ViewDepthPyramid::downsample_depth` for more
+// information.
 fn load_mip_0(x: u32, y: u32) -> f32 {
+    let actual_size = textureDimensions(mip_0).xy;
+    let virtual_size = vec2<u32>(
+        next_power_of_two(actual_size.x),
+        next_power_of_two(actual_size.y)
+    );
+    let virtual_uv = (vec2<f32>(f32(x), f32(y)) + 0.5) / vec2<f32>(virtual_size);
 #ifdef MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
-    let visibility = textureLoad(mip_0, vec2(x, y)).r;
-    return bitcast<f32>(u32(visibility >> 32u));
+    let virtual_st = virtual_uv * vec2<f32>(actual_size);
+    let visibility = load_mip_0_meshlet(virtual_st, 32u);
+    return reduce_4(visibility);
 #else // MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
 #ifdef MESHLET
-    let visibility = textureLoad(mip_0, vec2(x, y)).r;
-    return bitcast<f32>(visibility);
+    let virtual_st = virtual_uv * vec2<f32>(actual_size);
+    let visibility = load_mip_0_meshlet(virtual_st, 0u);
+    return reduce_4(visibility);
 #else // MESHLET
     // Downsample the top level.
 #ifdef MULTISAMPLE
     // The top level is multisampled, so we need to loop over all the samples
     // and reduce them to 1.
-    var result = textureLoad(mip_0, vec2(x, y), 0);
+    let virtual_st = virtual_uv * vec2<f32>(actual_size);
+    var result = load_mip_0_single_sample(virtual_st, 0);
     let sample_count = i32(textureNumSamples(mip_0));
     for (var sample = 1; sample < sample_count; sample += 1) {
-        result = min(result, textureLoad(mip_0, vec2(x, y), sample));
+        result = min(result, load_mip_0_single_sample(virtual_st, sample));
     }
     return result;
 #else // MULTISAMPLE
-    return textureLoad(mip_0, vec2(x, y), 0);
+    return reduce_4(textureGather(mip_0, samplr, virtual_uv));
 #endif // MULTISAMPLE
 #endif // MESHLET
 #endif // MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
 }

+#ifdef MESHLET
+// Loads a single 2×2 square of texels at the given position from the source
+// image and returns all four (like `textureGather` does).
+//
+// `st` should be in texels, not in the [0, 1] range like UVs. That is, `st` is
+// `uv * textureDimensions(mip_0).xy`.
+fn load_mip_0_meshlet(st: vec2<f32>, shift: u32) -> vec4<f32> {
+    let st0 = vec2<u32>(floor(st - 0.5));
+    let st1 = st0 + 1u;
+    return vec4<f32>(
+        bitcast<f32>(u32(textureLoad(mip_0, vec2<u32>(st0.x, st0.y)).r) >> shift),
+        bitcast<f32>(u32(textureLoad(mip_0, vec2<u32>(st0.x, st1.y)).r) >> shift),
+        bitcast<f32>(u32(textureLoad(mip_0, vec2<u32>(st1.x, st0.y)).r) >> shift),
+        bitcast<f32>(u32(textureLoad(mip_0, vec2<u32>(st1.x, st1.y)).r) >> shift)
+    );
+}
+#endif // MESHLET
+
+#ifdef MULTISAMPLE
+// Loads a single 2×2 square of texels at the given position from the source
+// image, reduces them, and returns the result.
+//
+// `st` should be in texels, not in the [0, 1] range like UVs. That is, `st` is
+// `uv * textureDimensions(mip_0).xy`.
+fn load_mip_0_single_sample(st: vec2<f32>, sample: i32) -> f32 {
+    let st0 = vec2<u32>(floor(st - 0.5));
+    let st1 = st0 + 1u;
+    let v = vec4<f32>(
+        textureLoad(mip_0, vec2<u32>(st0.x, st0.y), sample),
+        textureLoad(mip_0, vec2<u32>(st0.x, st1.y), sample),
+        textureLoad(mip_0, vec2<u32>(st1.x, st0.y), sample),
+        textureLoad(mip_0, vec2<u32>(st1.x, st1.y), sample)
+    );
+    return reduce_4(v);
+}
+#endif // MULTISAMPLE
+
 fn reduce_4(v: vec4f) -> f32 {
     return min(min(v.x, v.y), min(v.z, v.w));
 }
+
+// Returns the next power of two of x.
+//
+// If x is itself a power of two, this still returns the *next* power of two.
+// This is different from Rust's `next_power_of_two` function.
+fn next_power_of_two(x: u32) -> u32 {
+    return 1u << (32u - countLeadingZeros(x));
+}
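
To illustrate why reading a 2×2 block at the virtual UV is enough, here is a one-axis CPU model in Rust of the address arithmetic in `load_mip_0` and its helpers. It is a sketch under stated assumptions, not the shader itself: the function names are hypothetical, out-of-range reads are assumed to clamp to the last texel, and Rust's saturating float-to-`u32` cast is assumed to match the shader's `vec2<u32>(...)` conversion for negative inputs.

```rust
/// Smallest power of two strictly greater than `x`, as in the shader's
/// `next_power_of_two`. Assumes `1 <= x < 2^31`.
fn next_power_of_two_strict(x: u32) -> u32 {
    1u32 << (32 - x.leading_zeros())
}

/// The pair of source texels (along one axis) read for virtual texel `x`.
fn gathered_texels(x: u32, actual: u32) -> (u32, u32) {
    let virtual_size = next_power_of_two_strict(actual);
    let uv = (x as f32 + 0.5) / virtual_size as f32;
    let st = uv * actual as f32;
    // Rust's `as u32` saturates negative floats to 0 (assumed to match the
    // shader's conversion).
    let st0 = (st - 0.5).floor() as u32;
    // Assumption of this sketch: out-of-range reads clamp to the last texel.
    let st1 = (st0 + 1).min(actual - 1);
    (st0.min(actual - 1), st1)
}

/// First and last source texels whose extent overlaps the footprint of
/// virtual texel `x`, computed exactly with integers.
fn covered_texels(x: u32, actual: u32) -> (u32, u32) {
    let v = next_power_of_two_strict(actual) as u64;
    let (x, a) = (x as u64, actual as u64);
    ((x * a / v) as u32, (((x + 1) * a - 1) / v) as u32)
}
```

Because the virtual size is strictly larger than the actual size, each virtual texel's footprint is narrower than one source texel, so it can straddle at most two source texels per axis, and both fall inside the pair that is read.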

crates/bevy_pbr/src/meshlet/meshlet_cull_shared.wgsl

Lines changed: 1 addition & 3 deletions
@@ -143,9 +143,8 @@ fn sample_hzb_row(sx: vec4<u32>, sy: u32, mip: i32) -> f32 {
     return min(min(a, b), min(c, d));
 }

-// TODO: We should probably be using a POT HZB texture?
 fn occlusion_cull_screen_aabb(aabb: ScreenAabb, screen: vec2<f32>) -> bool {
-    let hzb_size = ceil(screen * 0.5);
+    let hzb_size = vec2<f32>(textureDimensions(depth_pyramid).xy);
     let aabb_min = aabb.min.xy * hzb_size;
     let aabb_max = aabb.max.xy * hzb_size;

@@ -157,7 +156,6 @@ fn occlusion_cull_screen_aabb(aabb: ScreenAabb, screen: vec2<f32>) -> bool {
     // note: add 1 before max because the unsigned overflow behavior is intentional
     // it wraps around firstLeadingBit(0) = ~0 to 0
     // TODO: we actually sample a 4x4 block, so ideally this would be `max(..., 3u) - 3u`.
-    // However, since our HZB is not a power of two, we need to be extra-conservative to not over-cull, so we go up a mip.
     var mip = max(firstLeadingBit(max_size) + 1u, 2u) - 2u;

     if any((max_texel >> vec2(mip)) > (min_texel >> vec2(mip)) + 3) {
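
The mip selection line relies on unsigned wraparound, which is easy to misread. A minimal Rust model of that one expression follows; the function names are hypothetical, and `max_size` is treated as an opaque input because the hunk does not show how it is computed.

```rust
/// Models WGSL `firstLeadingBit` for `u32`: the index of the most
/// significant set bit, or 0xFFFFFFFF when the input is zero.
fn first_leading_bit(x: u32) -> u32 {
    if x == 0 { u32::MAX } else { 31 - x.leading_zeros() }
}

/// Models `max(firstLeadingBit(max_size) + 1u, 2u) - 2u`. The `+ 1u` wraps
/// 0xFFFFFFFF around to 0, so a zero-sized extent selects mip 0.
fn select_mip(max_size: u32) -> u32 {
    first_leading_bit(max_size).wrapping_add(1).max(2) - 2
}
```

Adding 1 before taking the maximum is what makes the zero case safe: the wrapped value is 0, the `max` lifts it to 2, and the subtraction cannot underflow.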
