Commit bb7359f

backwards compat and some small fixes
1 parent efd5fdd commit bb7359f

8 files changed

+172
-46
lines changed

docs/src/gpu_raytracing_tutorial.md

Lines changed: 16 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# GPU Ray Tracing with Raycore
1+
# Ray Tracing on the GPU
22

33
In this tutorial, we'll take the ray tracer from the previous tutorial and port it to the GPU using **KernelAbstractions.jl** and a GPU backend of choice (CUDA.jl, AMDGPU.jl, OpenCL.jl, OneApi.jl, or Metal.jl). We'll explore three different kernel implementations, each with different optimization strategies, and benchmark their performance against each other.
44

@@ -13,8 +13,7 @@ using WGLMakie
1313
using KernelAbstractions
1414
using BenchmarkTools
1515
```
16-
17-
To run things on the GPU with KernelAbstractions, you need to chose the correct package for your GPU and set the array type we use from there on.
16+
To run things on the GPU with KernelAbstractions, you need to choose the correct package for your GPU and set the array type we use from there on.
1817

1918
```julia (editor=true, logging=false, output=true)
2019
#using CUDA; GArray = CuArray; # For NVIDIA GPUS
@@ -28,8 +27,7 @@ GArray = Array # For the tutorial to run on CI we just use the CPU
2827
**Ready for GPU!** We have:
2928

3029
* `Raycore` for fast ray-triangle intersections
31-
* `KernelAbstractions` for portable GPU kernels
32-
* `AMDGPU` for AMD GPU support
30+
* `KernelAbstractions` for portable GPU kernels (works with CUDA, AMD, Metal, oneAPI, and OpenCL)
3331
* `BenchmarkTools` for performance comparison
3432

3533
## Part 1: Scene Setup (Same as CPU Tutorial)
@@ -46,12 +44,12 @@ f
4644
```
4745
```julia (editor=true, logging=false, output=true)
4846
cam = cameracontrols(ax.scene)
49-
cam.eyeposition[] = [0, 1.0, -5]
47+
cam.eyeposition[] = [0, 1.0, -4]
5048
cam.lookat[] = [0, 0, 2]
5149
cam.upvector[] = [0.0, 1, 0.0]
5250
cam.fov[] = 45.0
5351
```
54-
## Part 5: GPU Kernel Version 1 - Basic Naive Approach
52+
## Part 2: GPU Kernel Version 1 - Basic Naive Approach
5553

5654
The simplest GPU kernel - one thread per pixel:
5755

@@ -70,7 +68,7 @@ import KernelAbstractions as KA
7068
x = ((idx - 1) % width) + 1
7169
y = ((idx - 1) ÷ width) + 1
7270
if x <= width && y <= height
73-
# Generate camera ray and do a calculate a simple light model
71+
# Generate camera ray and calculate a simple light model
7472
color = Vec3f(0)
7573
for i in 1:NSamples
7674
color = color .+ sample_light(bvh, ctx, width, height, camera_pos, focal_length, aspect, x, y, sky_color)
@@ -129,12 +127,12 @@ img_gpu = GArray(img);
129127
bvh_gpu = to_gpu(GArray, bvh);
130128
ctx_gpu = to_gpu(GArray, ctx);
131129
bench_kernel_v1 = @benchmark trace_gpu(raytrace_kernel_v1!, img_gpu, bvh_gpu, ctx_gpu)
132-
# bring back to GPU to display image
130+
# Bring back to CPU to display image
133131
Array(img_gpu)
134132
```
135133
**First GPU render!** This is the simplest approach - one thread per pixel with no optimization.
136134

137-
## Part 6: Optimized Kernel - Loop Unrolling
135+
## Part 3: Optimized Kernel - Loop Unrolling
138136

139137
Loop overhead is significant on GPUs! Manually unrolling the sampling loop eliminates this overhead:
140138

@@ -169,7 +167,7 @@ Array(img_gpu)
169167
* Better instruction-level parallelism
170168
* **1.39x faster than baseline!**
171169

172-
## Part 7: Tiled Kernel with Optimized Tile Size
170+
## Part 4: Tiled Kernel with Optimized Tile Size
173171

174172
The tile size dramatically affects performance. Let's use the optimal size discovered through benchmarking:
175173

@@ -223,10 +221,9 @@ Array(img_gpu)
223221
```
224222
**Tile size matters!** With `(32, 16)` tiles, this kernel is **1.22x faster** than baseline. With poor tile sizes like `(8, 8)`, it can be **2.5x slower**!
225223

226-
## Part 8: Wavefront Path Tracing
224+
## Part 5: Wavefront Path Tracing
227225

228-
The wavefront approach reorganizes ray tracing to minimize thread divergence by grouping similar work together. Instead of each thread handling an entire pixel's path, we separate the work into stages.
229-
Discussing the excat implementation is outside the scope of this tutorial, so we only include the finished renderer here:
226+
The wavefront approach reorganizes ray tracing to minimize thread divergence by grouping similar work together. Instead of each thread handling an entire pixel's path, we separate the work into stages. Discussing the exact implementation is outside the scope of this tutorial, so we only include the finished renderer here:
230227

231228
```julia (editor=true, logging=false, output=true)
232229
include("wavefront-renderer.jl")
@@ -254,7 +251,7 @@ Array(renderer_gpu.framebuffer)
254251
* Scales well with scene complexity
255252
* Enables advanced features like path tracing
256253

257-
## Part 9: Comprehensive Performance Benchmarks
254+
## Part 6: Comprehensive Performance Benchmarks
258255

259256
Now let's compare all kernels including the wavefront renderer:
260257

@@ -277,6 +274,7 @@ DOM.img(src=Asset(data"gpu-benchmarks.png"), width="700px")
277274
```
278275
### Next Steps
279276

280-
* Add **adaptive sampling** (more samples only where needed)
281-
* Explore **shared memory** optimizations for BVH traversal
282-
* Implement **streaming multisampling** across frames
277+
* Add **adaptive sampling** (more samples only where needed)
278+
* Explore **shared memory** optimizations for BVH traversal
279+
* Implement **streaming multisampling** across frames
280+

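The `raytrace_kernel_v1!` hunk above maps a linear thread index to pixel coordinates with `%` and `÷`. A minimal plain-Julia sketch of that mapping, run outside any GPU kernel, is shown here. `pixel_coords` is a hypothetical helper name used only for illustration; it is not part of the tutorial or of Raycore.

```julia
# Hypothetical helper (not from the tutorial): convert a 1-based linear
# index into 1-based (x, y) pixel coordinates for an image `width` pixels wide.
function pixel_coords(idx::Integer, width::Integer)
    x = ((idx - 1) % width) + 1   # column: position within the current row
    y = ((idx - 1) ÷ width) + 1   # row: number of full rows before idx, plus one
    return x, y
end

width, height = 4, 3
coords = [pixel_coords(idx, width) for idx in 1:(width * height)]

# Every pixel is visited exactly once, row by row.
@assert coords[1] == (1, 1)
@assert coords[4] == (4, 1)
@assert coords[5] == (1, 2)
@assert coords[end] == (width, height)
@assert allunique(coords)
```

With this mapping, any index beyond `width * height` yields `y > height`, which is presumably why the kernel keeps the `x <= width && y <= height` guard for launches whose size is rounded up past the image.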
docs/src/index.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -27,11 +27,11 @@ Build a complete ray tracer from scratch with shadows, materials, reflections, a
2727

2828
[Ray Tracing with Raycore](@ref)
2929

30-
### Ray on the GPU Tutorial
30+
### GPU Ray Tracing Tutorial
3131

32-
Take the previous ray tracer and run it on the GPU
32+
Port the ray tracer to the GPU with KernelAbstractions.jl. Learn about kernel optimization, loop unrolling, tiling, and wavefront rendering.
3333

34-
![Ray Tracing](.gpu_raytracing_tutorial-bbook/data/gpu-benchmarks.png)
34+
![GPU Ray Tracing](.gpu_raytracing_tutorial-bbook/data/gpu-benchmarks.png)
3535

3636
[GPU Ray Tracing with Raycore](@ref)
3737

docs/src/raytracing_tutorial_content.md

Lines changed: 6 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
11
# Ray Tracing in one Hour
22

3-
Analougus to the famous [Ray Tracing in one Weekend](https://raytracing.github.io/), this tutorial uses Raycore to do the hard work of performant ray triangle intersection and therefore get a high performing ray tracer in a much shorter time.
4-
We'll start with the absolute basics and progressively add features until we have a ray tracer that produces beautiful images with shadows, materials, and reflections.
3+
Analougus to the famous [Ray Tracing in one Weekend](https://raytracing.github.io/), this tutorial uses Raycore to do the hard work of performant ray triangle intersection and therefore get a high performing ray tracer in a much shorter time. We'll start with the absolute basics and progressively add features until we have a ray tracer that produces beautiful images with shadows, materials, and reflections.
54

65
## Setup
76

@@ -49,9 +48,10 @@ sphere2 = Tesselation(Sphere(Point3f(2, -1.5 + 0.6, 1), 0.6f0), 64)
4948
# Build our BVH acceleration structure
5049
scene_geometry = [cat_mesh, floor, back_wall, left_wall, sphere1, sphere2]
5150
bvh = Raycore.BVH(scene_geometry)
52-
plot(bvh; axis=(; show_axis=false))
51+
f, ax, pl = plot(bvh; axis=(; show_axis=false))
5352
```
5453
Set the camera to something better:
54+
5555
```julia (editor=true, logging=false, output=true)
5656
cam = cameracontrols(ax.scene)
5757
cam.eyeposition[] = [0, 1.0, -4]
@@ -61,7 +61,6 @@ cam.fov[] = 45.0
6161
update_cam!(ax.scene, cam)
6262
nothing
6363
```
64-
6564
## Part 2: Helper Functions - Building Blocks
6665

6766
Let's define reusable helper functions we'll use throughout:
@@ -86,12 +85,9 @@ end
8685
to_vec3f(c::RGB) = Vec3f(c.r, c.g, c.b)
8786
to_rgb(v::Vec3f) = RGB{Float32}(v...)
8887
```
89-
9088
## Part 3: The Simplest Ray Tracer - Depth Visualization
9189

92-
We're using one main function to shoot rays for each pixel.
93-
For simplicity, we already added multisampling and simple multi threading, to enjoy smoother images and faster rendering times throughout the tutorial.
94-
Read the GPU tutorial how to further improve the performance of this simple, not yet optimal kernel.
90+
We're using one main function to shoot rays for each pixel. For simplicity, we already added multisampling and simple multi threading, to enjoy smoother images and faster rendering times throughout the tutorial. Read the GPU tutorial how to further improve the performance of this simple, not yet optimal kernel.
9591

9692
```julia (editor=true, logging=false, output=true)
9793
function trace(f, bvh; width=700, height=300,
@@ -130,7 +126,6 @@ depth_kernel(bvh, ctx, tri, dist, bary, ray) = RGB(1.0f0 - min(dist / 10.0f0, 1.
130126
```julia (editor=true, logging=false, output=true)
131127
@time trace(depth_kernel, bvh, samples=16)
132128
```
133-
134129
**First render!** Depth visualization shows distance to surfaces. **Much faster with threading and smoother with multi-sampling!**
135130

136131
## Part 5: Lighting with Hard Shadows
@@ -202,7 +197,6 @@ end
202197

203198
trace(shadow_kernel, bvh, samples=4)
204199
```
205-
206200
**Hard shadows working!** Scene has realistic lighting with sharp shadow edges.
207201

208202
## Part 6: Soft Shadows
@@ -351,8 +345,7 @@ end
351345

352346
tone_mapping(img, a=0.38, y=1.0)
353347
```
354-
For performance type stability is a must!
355-
We can use JET to test if a function is completely type stable, which we also test in the Raycore tests for all functions.
348+
For performance type stability is a must! We can use JET to test if a function is completely type stable, which we also test in the Raycore tests for all functions.
356349

357350
```julia (editor=true, logging=false, output=true)
358351
using JET
@@ -404,3 +397,4 @@ We built a complete ray tracer with:
404397
* `shadow_samples=4` → soft shadows
405398

406399
This shows how a well-designed function can handle multiple use cases cleanly!
400+

docs/src/wavefront-renderer.jl

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -619,7 +619,7 @@ function WavefrontRenderer(
619619

620620
# Allocate work queues as SoA
621621
primary_ray_queue = similar_soa(img, PrimaryRayWork, num_rays)
622-
primary_hit_queue = similar_soa(img, PrimaryHitWork{eltype(bvh.original_triangles)}, num_rays)
622+
primary_hit_queue = similar_soa(img, PrimaryHitWork{eltype(bvh.primitives)}, num_rays)
623623
shadow_ray_queue = similar_soa(img, ShadowRayWork, num_shadow_rays)
624624
shadow_result_queue = similar_soa(img, ShadowResult, num_shadow_rays)
625625
reflection_ray_soa = similar_soa(img, ReflectionRayWork, num_rays)
@@ -758,7 +758,7 @@ function render!(renderer::WavefrontRenderer)
758758
refl_shade_kernel!(
759759
renderer.primary_hit_queue,
760760
renderer.reflection_hit_soa,
761-
renderer.bvh.original_triangles,
761+
renderer.bvh.primitives,
762762
renderer.ctx,
763763
renderer.sky_color,
764764
renderer.shading_queue,
@@ -805,7 +805,7 @@ function trace_wavefront_full(
805805

806806
# Allocate work queues as SoA
807807
primary_ray_queue = similar_soa(img, PrimaryRayWork, num_rays)
808-
primary_hit_queue = similar_soa(img, PrimaryHitWork{eltype(bvh.original_triangles)}, num_rays)
808+
primary_hit_queue = similar_soa(img, PrimaryHitWork{eltype(bvh.primitives)}, num_rays)
809809
shadow_ray_queue = similar_soa(img, ShadowRayWork, num_shadow_rays)
810810
shadow_result_queue = similar_soa(img, ShadowResult, num_shadow_rays)
811811
reflection_ray_soa = similar_soa(img, ReflectionRayWork, num_rays)
@@ -848,7 +848,7 @@ function trace_wavefront_full(
848848

849849
# Stage 8: Shade reflections and blend (using SoA)
850850
refl_shade_kernel! = shade_reflections_and_blend!(backend)
851-
refl_shade_kernel!(primary_hit_queue, reflection_hit_soa, bvh.original_triangles, ctx, sky_color, shading_queue, ndrange=num_rays)
851+
refl_shade_kernel!(primary_hit_queue, reflection_hit_soa, bvh.primitives, ctx, sky_color, shading_queue, ndrange=num_rays)
852852

853853
# Stage 9: Accumulate final image
854854
accum_kernel! = accumulate_final!(backend)

ext/RaycoreMakieExt.jl

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -186,7 +186,7 @@ Helper function to draw BVH geometry
186186
function draw_bvh!(plot, bvh::Raycore.BVH, colors, alpha)
187187
# Group primitives by their material_idx
188188
primitive_groups = Dict{UInt32, Vector{Raycore.Triangle}}()
189-
for prim in bvh.original_triangles
189+
for prim in bvh.primitives
190190
mat_idx = prim.material_idx
191191
if !haskey(primitive_groups, mat_idx)
192192
primitive_groups[mat_idx] = Raycore.Triangle[]
@@ -236,7 +236,7 @@ function Makie.convert_arguments(::Type{Makie.Mesh}, bvh::Raycore.BVH)
236236
faces = GeometryBasics.TriangleFace{Int}[]
237237
colors = Float32[]
238238
normals = Vec3f[]
239-
for (i, prim) in enumerate(bvh.original_triangles)
239+
for (i, prim) in enumerate(bvh.primitives)
240240
start_idx = length(vertices)
241241
for (v, n) in zip(prim.vertices, prim.normals)
242242
push!(vertices, v)

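The hunks above switch call sites from `bvh.original_triangles` to `bvh.primitives`. The files shown here do not reveal how, or whether, the old field name stays available, so the following is only a sketch of one common Julia pattern for keeping a renamed field readable under its old name. `ToyBVH` is a stand-in type invented for illustration; it is not Raycore's `BVH`.

```julia
# Stand-in type for illustration only; Raycore's actual BVH has more fields.
struct ToyBVH{T}
    primitives::Vector{T}
end

# Assumed compatibility shim: forward the old property name to the new field.
function Base.getproperty(bvh::ToyBVH, name::Symbol)
    if name === :original_triangles
        return getfield(bvh, :primitives)
    end
    return getfield(bvh, name)
end

# Advertise both names so tab completion and introspection still list the alias.
Base.propertynames(::ToyBVH) = (:primitives, :original_triangles)

bvh = ToyBVH([1, 2, 3])

# Old and new names refer to the very same array, so no data is copied.
@assert bvh.original_triangles === bvh.primitives
@assert :original_triangles in propertynames(bvh)
```

Because the alias returns the same array object, code written against the old name keeps working without extra allocations, while new code can use `primitives` directly.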
raycore-blogpost.md

Lines changed: 134 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,134 @@
1+
# Announcing Raycore.jl: High-Performance Ray Tracing for CPU and GPU
2+
3+
I'm excited to announce **Raycore.jl**, a high-performance ray-triangle intersection engine with BVH acceleration, designed for both CPU and GPU execution in Julia. Whether you're building a physically-based renderer, simulating light transport, or exploring acoustic propagation, Raycore provides the performance and flexibility you need.
4+
5+
## Why Write a New Ray Intersection Engine?
6+
7+
You might wonder: why build yet another ray tracer? The answer lies in Julia's unique strengths and the opportunities they create.
8+
9+
### Advantages of Julia
10+
11+
* **High-level language with performance close to C/C++** - Write readable code that runs fast
12+
* **Great GPU support** - Single codebase runs on CUDA, AMD, Metal, oneAPI, and OpenCL via KernelAbstractions.jl
13+
* **Multiple dispatch for different geometries, algorithms, and materials** - Extend the system cleanly without modifying core code
14+
* **Pluggable architecture for new features** - Add custom materials, sampling strategies, or acceleration structures
15+
* **One of the best languages to write out math** - The code looks like the equations you'd write on paper
16+
17+
### Honest Assessment: The Tradeoffs
18+
19+
Julia isn't perfect, and I want to be upfront about the challenges:
20+
21+
* **Long compile times for first use** - The first run of a function triggers JIT compilation
22+
* **GPU code still has some rough edges** - Complex kernels require careful attention to avoid allocations and GPU-unfriendly constructs
23+
24+
In practice, compile times aren't as bad as they might sound. You keep a Julia session running and only pay the compilation cost once. There's also ongoing work on precompilation that could reduce these times to near-zero in the future. For GPU code, better tooling for detecting and fixing issues is on the horizon, along with improved error messages when problematic LLVM code is generated.
25+
26+
### The Big Picture
27+
28+
The flexibility to write a high-performance ray tracer in a high-level language opens up exciting possibilities:
29+
30+
* **Use automatic differentiation** to optimize scene parameters or light placement
31+
* **Plug in new optimizations seamlessly** without fighting a type system or rewriting core algorithms
32+
* **Democratize working on high-performance ray tracing** - contributions don't require C++ expertise
33+
* **Rapid experimentation** - test new ideas without lengthy compile cycles
34+
35+
## What is Raycore.jl?
36+
37+
Raycore is a focused library that does one thing well: fast ray-triangle intersections with BVH acceleration. It provides the building blocks for ray tracing applications without imposing a particular rendering architecture.
38+
39+
**Core Features:**
40+
- Fast BVH construction and traversal
41+
- CPU and GPU support via KernelAbstractions.jl
42+
- Analysis tools: centroid calculation, illumination analysis, view factors for radiosity
43+
- Makie integration for visualization
44+
45+
**Performance:** On GPU, we've achieved significant speedups through kernel optimizations including loop unrolling, tiling, and wavefront rendering approaches that minimize thread divergence.
46+
47+
## Interactive Tutorials
48+
49+
The documentation includes several hands-on tutorials that build from basics to advanced GPU optimization:
50+
51+
### 1. BVH Hit Tests & Basics
52+
53+
Learn the fundamentals of ray-triangle intersection, BVH construction, and visualization.
54+
55+
![BVH Basics](docs/src/basics.png)
56+
57+
[Try the tutorial →](https://docs.raycore.jl)
58+
59+
### 2. Ray Tracing Tutorial
60+
61+
Build a complete ray tracer from scratch with shadows, materials, reflections, and tone mapping.
62+
63+
![Ray Tracing](docs/src/raytracing.png)
64+
65+
[Try the tutorial →](https://docs.raycore.jl)
66+
67+
### 3. Ray Tracing on the GPU
68+
69+
Port the ray tracer to GPU and learn optimization techniques: loop unrolling, tiling, and wavefront rendering. Includes comprehensive benchmarks comparing different approaches.
70+
71+
![GPU Benchmarks](docs/src/.gpu_raytracing_tutorial-bbook/data/gpu-benchmarks.png)
72+
73+
[Try the tutorial →](https://docs.raycore.jl)
74+
75+
### 4. View Factors & Analysis
76+
77+
Calculate view factors, illumination, and centroids for radiosity and thermal analysis applications.
78+
79+
![View Factors](docs/src/viewfactors.png)
80+
81+
[Try the tutorial →](https://docs.raycore.jl)
82+
83+
## What Can It Be Used For?
84+
85+
Ray tracing isn't just for rendering pretty pictures. Raycore enables a wide range of physics and engineering applications:
86+
87+
* **Physically-based rendering** - Photorealistic image synthesis with accurate light transport
88+
* **Light transport simulations** - Analyze lighting design, daylighting, and energy efficiency
89+
* **Acoustic simulations** - Model sound propagation in architectural spaces
90+
* **Neutron transport simulations** - Nuclear reactor analysis and radiation shielding
91+
* **Thermal radiosity** - Heat transfer analysis in complex geometries
92+
* **Any application that needs ray tracing** - The core is general-purpose
93+
94+
A high-performance implementation of CPU and GPU ray tracing in Julia can be a huge enabler for research and development in these fields, especially considering how easy it is to jump into the code and make changes dynamically. Need to add a new material model? Write a few methods. Want to try a different BVH construction algorithm? Implement the interface. The barrier to experimentation is low.
95+
96+
## Getting Started
97+
98+
Install Raycore.jl from the Julia package manager:
99+
100+
```julia
101+
using Pkg
102+
Pkg.add("Raycore")
103+
```
104+
105+
Then check out the [interactive tutorials](https://docs.raycore.jl) to start building your first ray tracer!
106+
107+
## Future Work
108+
109+
While Raycore is production-ready for many applications, there are exciting directions for future development:
110+
111+
* **Advanced acceleration structures** - Explore alternatives to BVH like kd-trees or octrees for specific use cases
112+
* **Importance sampling** - Better Monte Carlo integration strategies for path tracing
113+
* **Spectral rendering** - Move beyond RGB to full spectral wavelengths
114+
* **Bi-directional path tracing** - Handle difficult lighting scenarios more efficiently
115+
* **GPU memory optimizations** - Reduce memory footprint for larger scenes
116+
* **Improved precompilation** - Further reduce first-run latency
117+
* **Domain-specific extensions** - Purpose-built tools for acoustic, thermal, or neutron transport
118+
119+
Contributions are welcome! The codebase is designed to be approachable, and the Julia community is friendly and helpful.
120+
121+
## Acknowledgments
122+
123+
This project builds on the excellent work of the Julia GPU ecosystem, particularly KernelAbstractions.jl for portable GPU programming, and the Julia visualization stack including Makie.jl for the interactive tutorials.
124+
125+
Special thanks to everyone who provided feedback during development and helped shape Raycore into what it is today.
126+
127+
---
128+
129+
**Links:**
130+
- [Documentation & Tutorials](https://docs.raycore.jl)
131+
- [GitHub Repository](https://github.com/yourusername/Raycore.jl)
132+
- [Julia Discourse](https://discourse.julialang.org)
133+
134+
I'm excited to see what you build with Raycore.jl. Happy ray tracing!
