GPU Voxel Renderer + Streaming World (wgpu)
This project is a real-time voxel world renderer built around wgpu (WebGPU-native graphics API in Rust) and WGSL (WebGPU Shading Language). The core idea is:
- Stream a chunked voxel world around the camera.
- Ray-trace the world on the GPU using a sparse data structure.
- Layer atmosphere and lighting (fog, clouds, godrays).
- Present the final image with a lightweight fullscreen blit + tiny FPS HUD.
It’s structured like a minimal “engine loop” (winit) + a rendering backend (wgpu) + world/streaming systems.
Default (profiling off):
cargo run
Enable profiling:
cargo run -- --profile
Optional print cadence:
cargo run -- --profile --profile-every-ms 250
- GPU ray tracing of voxels (compute shader), with lighting and volumetrics.
- Sparse voxel octree (SVO) traversal with ropes for fast neighbor stepping.
- Chunk streaming driven by camera position and forward direction.
- Macro-occupancy acceleration (coarse empty-space skipping inside chunks).
- Toroidal clipmap heightfield for terrain/ground intersection & shading.
- Multi-pass GPU pipeline: primary → godray → composite → blit.
- GPU timestamp profiling when supported (TIMESTAMP_QUERY).
- Minimal FPS overlay rendered in the final blit shader.
The app runs a classic winit event loop:
- AboutToWait requests a redraw every iteration (continuous rendering).
- RedrawRequested performs the entire frame (update + render).
- Resize events reconfigure the swapchain and recreate internal textures.
Frame steps (in order):
- Integrate input → camera (camera.integrate_input)
- Streaming update: decide which chunks should be loaded/updated
- Clipmap update (CPU): compute per-level origins/offsets and height patches
- Write camera uniform (view/projection matrices, params, grid info)
- Write overlay uniform (packed digits + HUD layout)
- Apply chunk uploads (batched buffer writes)
- Acquire swapchain frame
- Encode GPU passes (compute + blit)
- Submit + poll + present
- Optional profiling readback
The renderer is compute-driven, writing into intermediate textures, then presenting with a simple render pass.
Primary pass:
- Writes:
  - color_tex (HDR color, RGBA32F / “full-float HDR”)
  - depth_tex (scene depth proxy, R32F)
- Uses:
  - Camera + scene buffers
  - Chunk grid + chunk metadata
  - SVO nodes + rope links
  - Macro occupancy + column info (extra acceleration data)
  - Clipmap params + clipmap height texture array
Godray pass:
- Uses a history texture (ping/pong) with temporal accumulation.
- Outputs a godray buffer (also HDR).
- Uses a sampler for filtered history reads.
Composite pass:
- Combines primary color + godrays into a final full-resolution output texture.
- Includes depth-aware upsampling and sharpening for godrays.
- Performs tonemapping (filmic curve), bloom-ish bright extraction, and grading.
Blit pass:
- Draws a fullscreen triangle.
- Samples the final output texture and draws into the swapchain.
- Renders a tiny FPS HUD (3×5 digit font) only inside a small screen rect.
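To illustrate how a 3×5 digit font and packed digits can fit in a few integers, here is a minimal CPU-side sketch in Rust. The project does this work in the blit shader; the glyph bitmaps, the bit order, the 4-bits-per-digit packing, and the names glyph_pixel and pack_digits are all assumptions made for illustration.

```rust
/// 3x5 glyphs for digits 0-9: five rows of 3 bits, top row in the highest bits.
/// The bitmaps and the bit order are illustrative assumptions.
const DIGIT_GLYPHS: [u16; 10] = [
    0b111_101_101_101_111, // 0
    0b010_110_010_010_111, // 1
    0b111_001_111_100_111, // 2
    0b111_001_111_001_111, // 3
    0b101_101_111_001_001, // 4
    0b111_100_111_001_111, // 5
    0b111_100_111_101_111, // 6
    0b111_001_001_001_001, // 7
    0b111_101_111_101_111, // 8
    0b111_101_111_001_111, // 9
];

/// Returns true if pixel (x, y) of the glyph is lit.
/// x runs 0..3 left to right, y runs 0..5 top to bottom.
fn glyph_pixel(digit: u32, x: u32, y: u32) -> bool {
    if digit > 9 || x > 2 || y > 4 {
        return false;
    }
    let bit = (4 - y) * 3 + (2 - x);
    (DIGIT_GLYPHS[digit as usize] >> bit) & 1 == 1
}

/// Packs the decimal digits of `fps` (clamped to 9999) into one u32,
/// 4 bits per digit, least significant digit in the lowest nibble.
fn pack_digits(fps: u32) -> u32 {
    let mut v = fps.min(9999);
    let mut packed = 0u32;
    for i in 0..4 {
        packed |= (v % 10) << (4 * i);
        v /= 10;
    }
    packed
}
```

With this packing, a shader-side lookup only needs a shift and a mask per digit, followed by one bit test per pixel inside the HUD rect.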
The world is streamed in chunks around the camera. A GPU “chunk grid” maps grid cells to resident chunk slots.
Each chunk on GPU has:
- ChunkMetaGpu: origin, node arena base/count, macro base, column-info base
- Chunk grid entry: maps local grid index → chunk slot or INVALID
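The grid lookup described above can be sketched on the CPU as follows. This is a hypothetical mirror of the GPU chunk grid, not the project's actual type: the ChunkGrid struct, its x-fastest linear layout, and the use of u32::MAX as the INVALID sentinel are assumptions.

```rust
/// Sentinel for a grid cell with no resident chunk (the value is an assumption).
const INVALID: u32 = u32::MAX;

/// Hypothetical CPU mirror of the GPU chunk grid.
struct ChunkGrid {
    /// Chunk coordinate of the grid's minimum corner.
    origin: [i32; 3],
    /// Grid size in chunks along x, y, z.
    dims: [u32; 3],
    /// One entry per cell: a chunk slot index or INVALID.
    slots: Vec<u32>,
}

impl ChunkGrid {
    fn new(origin: [i32; 3], dims: [u32; 3]) -> Self {
        let len = (dims[0] * dims[1] * dims[2]) as usize;
        Self { origin, dims, slots: vec![INVALID; len] }
    }

    /// Linear index of a chunk coordinate, or None if it lies outside the grid.
    fn cell_index(&self, chunk: [i32; 3]) -> Option<usize> {
        let mut local = [0u32; 3];
        for a in 0..3 {
            let d = chunk[a] - self.origin[a];
            if d < 0 || d as u32 >= self.dims[a] {
                return None;
            }
            local[a] = d as u32;
        }
        Some((local[0] + self.dims[0] * (local[1] + self.dims[1] * local[2])) as usize)
    }

    /// Marks a chunk as resident in `slot`; returns false if it is outside the grid.
    fn set_slot(&mut self, chunk: [i32; 3], slot: u32) -> bool {
        match self.cell_index(chunk) {
            Some(i) => {
                self.slots[i] = slot;
                true
            }
            None => false,
        }
    }

    /// Resident slot for a chunk, or None if outside the grid or not loaded.
    fn slot_for(&self, chunk: [i32; 3]) -> Option<u32> {
        let slot = self.slots[self.cell_index(chunk)?];
        (slot != INVALID).then_some(slot)
    }
}
```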
Voxels are stored in an SVO node arena (NodeGpu). Each node holds:
- child_base + child_mask for compact child addressing
- material
- key encoding the node’s level and coordinates (used to reconstruct bounds)
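A common way to implement compact child addressing with a base index and a mask is to count the set bits below the requested octant. The sketch below assumes that scheme, an 8-bit mask, and children stored contiguously in octant order; the NodeChildren struct and child_index function are hypothetical and may differ from the project's NodeGpu layout.

```rust
/// Hypothetical CPU-side view of the child-addressing fields of a node.
struct NodeChildren {
    /// Arena index of the first stored child.
    child_base: u32,
    /// Bit i is set when the child in octant i exists (8-bit width is an assumption).
    child_mask: u8,
}

/// Arena index of the child in `octant` (0..8), or None when that child is
/// missing, which a traversal would treat as implicit air.
fn child_index(node: &NodeChildren, octant: u32) -> Option<u32> {
    debug_assert!(octant < 8);
    let bit = 1u32 << octant;
    let mask = node.child_mask as u32;
    if mask & bit == 0 {
        return None;
    }
    // Stored children are packed in octant order, so the child's position
    // equals the number of existing children in lower octants.
    let rank = (mask & (bit - 1)).count_ones();
    Some(node.child_base + rank)
}
```

Only existing children occupy arena slots, which is what keeps the node arena sparse.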
Traversal uses:
- AABB (Axis-Aligned Bounding Box) slab tests for entry/exit intervals
- Ropes (NodeRopesGpu) to jump to neighboring nodes when exiting a leaf cube
- Sparse descent that can return:
  - a real leaf node (explicit)
  - an implicit “air leaf” from a missing child (virtual cube), with an anchor to continue from
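The slab test mentioned above computes the parametric interval over which a ray is inside a box. Here is a minimal sketch in Rust of the standard technique; the function name ray_aabb and the convention of clamping entry to t = 0 are assumptions, and the project's WGSL version may differ.

```rust
/// Ray vs. axis-aligned box using per-axis slabs.
/// Returns (t_enter, t_exit) with t_enter clamped to 0, or None on a miss.
fn ray_aabb(
    origin: [f32; 3],
    dir: [f32; 3],
    box_min: [f32; 3],
    box_max: [f32; 3],
) -> Option<(f32, f32)> {
    let mut t_enter = 0.0f32;
    let mut t_exit = f32::INFINITY;
    for a in 0..3 {
        if dir[a] == 0.0 {
            // Ray is parallel to this slab: it must already lie between the planes.
            if origin[a] < box_min[a] || origin[a] > box_max[a] {
                return None;
            }
            continue;
        }
        let inv = 1.0 / dir[a];
        let t0 = (box_min[a] - origin[a]) * inv;
        let t1 = (box_max[a] - origin[a]) * inv;
        let (near, far) = if t0 <= t1 { (t0, t1) } else { (t1, t0) };
        // Intersect this axis's interval with the running interval.
        t_enter = t_enter.max(near);
        t_exit = t_exit.min(far);
        if t_enter > t_exit {
            return None;
        }
    }
    Some((t_enter, t_exit))
}
```

The exit value t_exit is the point where a rope-based traversal would pick the exit face and follow the corresponding neighbor link.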
Inside each chunk, a coarse 8×8×8 macro grid is stored as bits (512 bits = 16 u32 words per chunk). This allows fast skipping of empty macro-cells using a 3D DDA (Digital Differential Analyzer) grid march before doing expensive leaf traversal.
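The bit layout and the grid march can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the project's shader code: the x-fastest bit order, the function names, and the choice to express the ray in macro-cell units with its origin inside the grid are all hypothetical.

```rust
/// Macro cells per chunk axis (8 x 8 x 8 = 512 cells).
const MACRO_DIM: u32 = 8;
/// 512 bits stored as 16 u32 words.
const MACRO_WORDS: usize = 16;

/// Word index and bit index for a macro cell (x-fastest order is an assumption).
fn macro_bit(x: u32, y: u32, z: u32) -> (usize, u32) {
    let i = x + MACRO_DIM * (y + MACRO_DIM * z);
    ((i / 32) as usize, i % 32)
}

fn macro_set(words: &mut [u32; MACRO_WORDS], x: u32, y: u32, z: u32) {
    let (w, b) = macro_bit(x, y, z);
    words[w] |= 1 << b;
}

fn macro_occupied(words: &[u32; MACRO_WORDS], x: u32, y: u32, z: u32) -> bool {
    let (w, b) = macro_bit(x, y, z);
    (words[w] >> b) & 1 == 1
}

/// 3D DDA over the macro grid. `origin` and `dir` are in macro-cell units and
/// `origin` is assumed to lie inside the grid. Returns the first occupied cell
/// along the ray, or None if the ray leaves the grid first.
fn first_occupied_cell(
    words: &[u32; MACRO_WORDS],
    origin: [f32; 3],
    dir: [f32; 3],
) -> Option<[u32; 3]> {
    let dim = MACRO_DIM as i32;
    let mut cell = [0i32; 3];
    let mut step = [0i32; 3];
    let mut t_max = [f32::INFINITY; 3];
    let mut t_delta = [f32::INFINITY; 3];
    for a in 0..3 {
        cell[a] = origin[a].floor() as i32;
        if dir[a] > 0.0 {
            step[a] = 1;
            t_delta[a] = 1.0 / dir[a];
            t_max[a] = ((cell[a] + 1) as f32 - origin[a]) / dir[a];
        } else if dir[a] < 0.0 {
            step[a] = -1;
            t_delta[a] = -1.0 / dir[a];
            t_max[a] = (cell[a] as f32 - origin[a]) / dir[a];
        }
    }
    loop {
        if cell.iter().any(|&c| c < 0 || c >= dim) {
            return None;
        }
        if macro_occupied(words, cell[0] as u32, cell[1] as u32, cell[2] as u32) {
            return Some([cell[0] as u32, cell[1] as u32, cell[2] as u32]);
        }
        // Step across the nearest cell boundary.
        let a = if t_max[0] <= t_max[1] && t_max[0] <= t_max[2] {
            0
        } else if t_max[1] <= t_max[2] {
            1
        } else {
            2
        };
        if t_max[a].is_infinite() {
            return None; // zero direction vector: the ray never advances
        }
        cell[a] += step[a];
        t_max[a] += t_delta[a];
    }
}
```

Each empty macro cell costs one bit test and one comparison-driven step, which is why this march runs before the more expensive leaf traversal.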
A compact 64×64 per-chunk “column-top map” packs (y, material) per (x,z) column into u16 entries. It’s used to cheaply decide where grass blades might be, without doing full voxel traversal everywhere.
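As an illustration of the packing, the sketch below stores the column top and material in one u16. The document does not state the bit split, so the 8/8 split, the x-fastest indexing, and the function names here are assumptions.

```rust
/// Columns per chunk axis in the column-top map.
const COLUMN_DIM: usize = 64;

/// Index of column (x, z) in the 64 x 64 map (x-fastest order is an assumption).
fn column_index(x: usize, z: usize) -> usize {
    x + COLUMN_DIM * z
}

/// Packs a column entry. Assumed split: low 8 bits hold the top y within the
/// chunk, high 8 bits hold the material id.
fn pack_column(y: u8, material: u8) -> u16 {
    (material as u16) << 8 | y as u16
}

/// Inverse of `pack_column`: returns (y, material).
fn unpack_column(entry: u16) -> (u8, u8) {
    ((entry & 0xFF) as u8, (entry >> 8) as u8)
}
```

At 64 × 64 entries of 2 bytes each, the map costs 8 KiB per chunk.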
A toroidal clipmap stores height patches in a 2D array texture:
- Format: R32Float
- Layout: texture_2d_array, one layer per clipmap level
- CPU updates decide which patches to upload each frame
The shader samples the clipmap using:
- Per-level origin in meters
- Per-level cell size
- Per-level toroidal offsets in texels
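The three per-level parameters above combine into a texel address roughly as follows. This is a CPU-side sketch of toroidal addressing in general; the ClipLevel struct, its field names, and the square resolution are assumptions rather than the project's actual uniform layout.

```rust
/// Hypothetical per-level clipmap parameters.
struct ClipLevel {
    /// World-space position (x, z) of the level's minimum corner, in meters.
    origin_m: [f32; 2],
    /// Size of one texel in meters.
    cell_size_m: f32,
    /// Toroidal offset in texels, applied before wrapping.
    offset_texels: [i32; 2],
    /// Texels per side of the (square) level.
    resolution: i32,
}

/// Texel that stores the height for world position (x_m, z_m), or None when
/// the position falls outside the area covered by this level.
fn clip_texel(level: &ClipLevel, x_m: f32, z_m: f32) -> Option<[u32; 2]> {
    let pos = [x_m, z_m];
    let mut texel = [0u32; 2];
    for a in 0..2 {
        let local = ((pos[a] - level.origin_m[a]) / level.cell_size_m).floor() as i32;
        if local < 0 || local >= level.resolution {
            return None;
        }
        // Wrap around the texture so that moving the origin only requires
        // rewriting the newly exposed rows and columns.
        texel[a] = (local + level.offset_texels[a]).rem_euclid(level.resolution) as u32;
    }
    Some(texel)
}
```

Because the origin and offset change together with the uploaded patches, a lookup like this is only valid when the uniforms and the texture contents describe the same frame.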
Clipmap texture uploads and clipmap uniform updates must be encoded before the compute pass they affect. The code ensures the clipmap patch uploads + uniform write occur in the correct order so uniforms (origin/offset) cannot get ahead of texture content.
Two layers of profiling exist:
- CPU-side frame profiling: measures camera update, streaming, encoding, submit, present, etc.
- GPU timestamps (if supported): measures primary, godray, composite, and blit pass times via TIMESTAMP_QUERY.
When GPU timestamps are enabled, the renderer resolves timestamps into a buffer and maps it for readback, converting timestamp ticks to milliseconds.
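The tick-to-millisecond conversion is simple arithmetic once the timestamp period is known. In wgpu the period comes from Queue::get_timestamp_period, which reports nanoseconds per tick. The sketch below takes that value as a plain parameter; the function names and the begin/end pair layout of the readback buffer are assumptions.

```rust
/// Converts a pair of GPU timestamps to milliseconds.
/// `period_ns` is the number of nanoseconds per timestamp tick.
fn ticks_to_ms(start_ticks: u64, end_ticks: u64, period_ns: f32) -> f64 {
    // saturating_sub guards against an end value that is not after the start.
    let ticks = end_ticks.saturating_sub(start_ticks);
    ticks as f64 * period_ns as f64 / 1.0e6
}

/// Converts a readback buffer laid out as [begin0, end0, begin1, end1, ...]
/// (an assumed layout) into one duration per pass, in milliseconds.
fn pass_times_ms(timestamps: &[u64], period_ns: f32) -> Vec<f64> {
    timestamps
        .chunks_exact(2)
        .map(|pair| ticks_to_ms(pair[0], pair[1], period_ns))
        .collect()
}
```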
- src/app/: Main loop orchestration, resize handling, per-frame sequencing.
- src/render/: GPU types, resources, shader bundling, renderer state (pipelines/buffers/textures/bind groups).
- src/shaders/: WGSL shader modules (common utilities + raytracing modules + clipmap + blit).
- src/streaming/: ChunkManager and upload budgeting (CPU→GPU streaming).
- src/world/: WorldGen / procedural world source.
- HDR color: Rgba32Float
- Depth proxy: R32Float
- Clipmap height: R32Float array texture
- GPU scene buffers: storage buffers (NodeGpu, ChunkMetaGpu, macro occupancy bits, rope links, column info)
- The renderer is intentionally compute-first: most “rendering” logic lives in WGSL compute entry points.
- The scene bind group is shared across passes where possible; specialized bind groups are used for ping-pong textures (godrays/composite).
- Chunk uploads are aggressively batched (merged adjacent regions, contiguous meta runs) to reduce queue.write_buffer calls.
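The idea of merging adjacent regions can be sketched as follows: sort the pending writes by destination offset and coalesce any write that starts exactly where the previous one ends. The Upload struct and merge_adjacent function are hypothetical, and the sketch assumes pending writes to one buffer never overlap.

```rust
/// Hypothetical pending write: bytes destined for `offset` in one GPU buffer.
#[derive(Debug, Clone, PartialEq)]
struct Upload {
    offset: u64,
    data: Vec<u8>,
}

/// Sorts uploads by offset and merges runs that are exactly contiguous,
/// so each merged entry can be issued as a single buffer write.
/// Assumes the input regions do not overlap.
fn merge_adjacent(mut uploads: Vec<Upload>) -> Vec<Upload> {
    uploads.sort_by_key(|u| u.offset);
    let mut merged: Vec<Upload> = Vec::new();
    for u in uploads {
        if let Some(prev) = merged.last_mut() {
            if prev.offset + prev.data.len() as u64 == u.offset {
                prev.data.extend_from_slice(&u.data);
                continue;
            }
        }
        merged.push(u);
    }
    merged
}
```

Each entry of the result would then map to one queue.write_buffer call at its offset.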
Common next steps:
- Add proper LOD selection for clipmap sampling (sampling can currently use a single fixed level).
- Add a material system and/or voxel editing.
- Add denoising/temporal filtering for primary pass output.
- Expose profiler output in an on-screen overlay.
