OptiX testrender overhaul (take two) #1897

tgrant-nv · 2024-10-30T18:17:44Z

Description

This PR is a continuation of #1829, updated to include the recently added triangle mesh support. It enables full path tracing support for the OptiX backend in testrender. We have tried to share code between the CPU and OptiX backends where practical. There is more sharing in this PR than there was in #1829, which should reduce the maintenance burden a bit.

ID-based dispatch

Virtual function calls aren't well supported in OptiX, so rather than using regular C++ polymorphism to invoke the sample(), eval(), and get_albedo() functions for each of the BSDF sub-types, we manually invoke the correct function based on the closure ID (which we have added as a member of the BSDF class).

#define BSDF_CAST(BSDF_TYPE, bsdf) reinterpret_cast<const BSDF_TYPE*>(bsdf)

OSL_HOSTDEVICE Color3
CompositeBSDF::get_albedo(const BSDF* bsdf, const Vec3& wo) const
{
    Color3 albedo(0);
    switch (bsdf->id) {
    case DIFFUSE_ID:
        albedo = BSDF_CAST(Diffuse<0>, bsdf)->get_albedo(wo);
        break;
    case TRANSPARENT_ID:
    case MX_TRANSPARENT_ID:
        albedo = BSDF_CAST(Transparent, bsdf)->get_albedo(wo);
        break;
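For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of ID-based dispatch. It is not the PR's code: the two BSDF types, their albedo values, and the free function `get_albedo()` are simplified stand-ins, and it uses `static_cast` on derived structs rather than the `BSDF_CAST` macro shown above.

```cpp
#include <cassert>

// Hypothetical closure IDs; the real IDs are defined elsewhere in testrender.
enum ClosureID { DIFFUSE_ID, TRANSPARENT_ID };

struct Color3 { float r, g, b; };

// The base struct stores the ID so callers can recover the concrete type
// without a vtable.
struct BSDF {
    explicit BSDF(ClosureID id) : id(id) {}
    ClosureID id;
};

struct Diffuse : BSDF {
    Diffuse() : BSDF(DIFFUSE_ID) {}
    Color3 get_albedo() const { return { 0.5f, 0.5f, 0.5f }; }
};

struct Transparent : BSDF {
    Transparent() : BSDF(TRANSPARENT_ID) {}
    Color3 get_albedo() const { return { 1.0f, 1.0f, 1.0f }; }
};

// Manual dispatch: switch on the stored ID, downcast, and call the
// non-virtual member of the concrete type.
inline Color3 get_albedo(const BSDF* bsdf)
{
    assert(bsdf != nullptr);
    switch (bsdf->id) {
    case DIFFUSE_ID:
        return static_cast<const Diffuse*>(bsdf)->get_albedo();
    case TRANSPARENT_ID:
        return static_cast<const Transparent*>(bsdf)->get_albedo();
    }
    return { 0.0f, 0.0f, 0.0f };  // unknown ID: contribute nothing
}
```

The cost of this approach is that every new BSDF type needs a new case in each dispatch function, which is the trade-off for avoiding virtual calls on the device.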

Iterative closure evaluation

Another key change is the non-recursive closure evaluation. We apply the same style of iterative tree traversal used in the previous OptiX version of process_closure() to the shared implementations of process_closure(), evaluate_layer_opacity(), process_medium_closure(), and process_background_closure().
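As a rough illustration of the iterative style, the sketch below walks a small weighted tree with an explicit fixed-size stack instead of recursion. The node layout (`ADD`, `MUL`, `LEAF`), the `sum_leaves()` name, and the depth bound are assumptions made for the example; the actual closure types and traversal in testrender differ.

```cpp
#include <cassert>

enum NodeType { ADD, MUL, LEAF };

struct Node {
    NodeType type;
    float weight;   // MUL: scale factor; LEAF: leaf value; unused for ADD
    const Node* a;  // ADD: first child; MUL: only child
    const Node* b;  // ADD: second child
};

// Sums the weight-scaled leaf values of the tree without recursion.
inline float sum_leaves(const Node* root)
{
    constexpr int kMaxDepth = 32;  // assumed bound on pending ADD branches
    const Node* node_stack[kMaxDepth];
    float weight_stack[kMaxDepth];
    int top = 0;

    float result    = 0.0f;
    float w         = 1.0f;
    const Node* cur = root;
    while (true) {
        if (cur == nullptr) {
            if (top == 0)
                break;  // nothing left to visit
            --top;      // resume a deferred ADD branch
            cur = node_stack[top];
            w   = weight_stack[top];
            continue;
        }
        switch (cur->type) {
        case ADD:
            // Defer the second child with the current weight, descend
            // into the first child.
            assert(top < kMaxDepth);
            node_stack[top]   = cur->b;
            weight_stack[top] = w;
            ++top;
            cur = cur->a;
            break;
        case MUL:
            w *= cur->weight;
            cur = cur->a;
            break;
        case LEAF:
            result += w * cur->weight;
            cur = nullptr;  // triggers a pop on the next iteration
            break;
        }
    }
    return result;
}
```

A fixed-size stack keeps the function usable in device code, where deep recursion is costly, at the price of a hard limit on tree depth.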

Background sampling

We've included support for background closures. This includes an OptiX implementation of the Background::prepare() function. We've broken that function into three phases, where phases 1 and 3 are parallelized across a warp and phase 2 is executed on a single thread. This offers a decent speedup over a single-threaded implementation without the complexity of a more sophisticated implementation.

    // from background.h
    
    template<typename F>
    OSL_HOSTDEVICE void prepare_cuda(int stride, int idx, F cb)
    {
        prepare_cuda_01(stride, idx, cb);
        if (idx == 0)
            prepare_cuda_02();
        prepare_cuda_03(stride, idx);
    }
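The three-phase split can be sketched on the host as follows. This is a simplified stand-in, not the contents of background.h: the `Table` struct, the phase bodies (fill values, running sum, normalize), and `prepare_simulated()` are all assumptions. The lanes are simulated with plain loops, so each phase finishes for every lane before the next one starts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Table {
    explicit Table(std::size_t n) : values(n, 0.0f), cdf(n, 0.0f) {}
    std::vector<float> values;
    std::vector<float> cdf;
    float total = 0.0f;
};

// Phase 1 (parallel): lane idx evaluates entries idx, idx+stride, ...
template<typename F>
void phase_01(Table& t, int stride, int idx, F cb)
{
    const int n = static_cast<int>(t.values.size());
    for (int i = idx; i < n; i += stride)
        t.values[i] = cb(i);
}

// Phase 2 (single lane): the running sum is inherently sequential.
inline void phase_02(Table& t)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < t.values.size(); ++i) {
        sum += t.values[i];
        t.cdf[i] = sum;
    }
    t.total = sum;  // stored separately so phase 3 never reads a cdf entry
}

// Phase 3 (parallel): lane idx normalizes its strided slice.
inline void phase_03(Table& t, int stride, int idx)
{
    assert(t.total > 0.0f);
    const int n = static_cast<int>(t.cdf.size());
    for (int i = idx; i < n; i += stride)
        t.cdf[i] /= t.total;
}

// Host-side simulation: run every lane of a phase before the next phase.
template<typename F>
void prepare_simulated(Table& t, int stride, F cb)
{
    assert(stride > 0);
    for (int idx = 0; idx < stride; ++idx)
        phase_01(t, stride, idx, cb);
    phase_02(t);
    for (int idx = 0; idx < stride; ++idx)
        phase_03(t, stride, idx);
}
```

Keeping the total in its own field means no lane in phase 3 depends on a value another lane is rewriting, which matters once the lanes actually run concurrently.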

Tests

I have enabled the render-* tests for OptiX mode. I've added alternative reference images, since the GPU output exceeds the difference threshold on many of the tests. But in most cases the difference between the CPU and GPU output is very small.

Checklist:

I have read the contribution guidelines.
I have updated the documentation, if applicable.
I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
My code follows the prevailing code style of this project. If I haven't
already run clang-format v17 before submitting, I definitely will look at
the CI test that runs clang-format and fix anything that it highlights as
being nonconforming.


fpsunflower · 2024-10-30T21:00:59Z

src/testrender/background.h

+    OSL_HOSTDEVICE void prepare_cuda(int stride, int idx, F cb)
+    {
+        prepare_cuda_01(stride, idx, cb);
+        if (idx == 0)


Maybe leave a comment here as well that this is running on a single warp? At first it wasn't clear to me how you can get away with no synchronization -- but it makes sense if there's only a single warp here.

fpsunflower · 2024-10-30T21:04:11Z

src/testrender/cuda/optix_raytracer.cu

+        trace_ray(handle, payload, V3_TO_F3(r.origin), V3_TO_F3(r.direction),
+                  tmin);
+        if (payload.hit_id == skipID1) {
+            tmin = payload.hit_t + 2.0f * epsilon;


Could nudge by bumping the integer representation instead. This would let you use a smaller epsilon.

That is a trick I was not aware of, and it appears to work. Nifty.
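For reference, a minimal host-side sketch of the integer-representation nudge. The helper name `nudge_up()` is hypothetical and the code actually committed in the PR is not shown in this thread. For a finite, non-negative float, incrementing the bit pattern yields the next representable value, so the step scales with the magnitude of t rather than being a fixed epsilon.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Returns the next representable float above t.
// Only valid for finite t with the sign bit clear.
inline float nudge_up(float t)
{
    assert(!std::signbit(t) && std::isfinite(t));
    std::uint32_t bits;
    std::memcpy(&bits, &t, sizeof(bits));  // well-defined type pun
    ++bits;                                // one ULP upward
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}
```

On the host this matches `std::nextafter(t, INFINITY)` for the inputs it accepts. In CUDA device code the same reinterpretation is usually written with the `__float_as_uint` and `__uint_as_float` intrinsics.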

fpsunflower · 2024-10-30T21:07:23Z

src/testrender/cuda/rend_lib.h

 #include <OSL/oslconfig.h>

-#include <Imath/half.h>
+#if defined(__has_include) && __has_include(<Imath/half.h>)


I think we dropped support for Imath 2.x in main, so this part shouldn't be necessary (probably just a leftover from a previous merge?)

fpsunflower · 2024-10-30T21:08:24Z

src/testrender/cuda/vec_math.h

@@ -0,0 +1,97 @@
+// Copyright Contributors to the Open Shading Language project.
+// SPDX-License-Identifier: BSD-3-Clause


I thought Imath now supported cuda out of the box. Is this still needed?

Only the casting macros are needed in the current iteration. So I'll move them to where they are used and remove this file.

fpsunflower · 2024-10-30T21:11:37Z

src/testrender/optixraytracer.cpp

+{
+    if (getBackgroundShaderID() >= 0) {
+        const int bg_res = std::max<int>(32, getBackgroundResolution());
+        CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&d_bg_values),


Would there be a way to wrap the cudaMalloc calls so that user code doesn't need to have as many reinterpret_casts everywhere? It could also take care of calling m_ptrs_to_free.push_back() at the same time to avoid mistakes.

Ah, there is already an unused function that does almost exactly that. I'll look at adapting it to streamline these operations.

I think it would be a nice improvement, but maybe a little out-of-scope for this already big change.

I just took a stab at this and it makes the code a bit easier on the eyes. How do you feel about this?

#define DEVICE_ALLOC(size) reinterpret_cast<CUdeviceptr>(device_alloc(size))
#define COPY_TO_DEVICE(dst_device, src_host, size) \
    copy_to_device(reinterpret_cast<void*>(dst_device), src_host, size)

It's a wrapper for the wrapper that takes care of some of the casting. I don't necessarily want to completely obscure the fact that we're dealing with CUdeviceptr and not void*, although they are for the most part interchangeable.
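One possible shape for the tracking part of such a wrapper, sketched on the host. Everything here is hypothetical: the `DeviceBuffers` class is not from the PR, and `std::malloc`, `std::memcpy`, and `std::free` stand in for the CUDA allocation, copy, and free calls so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <vector>

// Owns every allocation it hands out and releases them on destruction.
class DeviceBuffers {
public:
    DeviceBuffers() = default;
    DeviceBuffers(const DeviceBuffers&)            = delete;
    DeviceBuffers& operator=(const DeviceBuffers&) = delete;

    ~DeviceBuffers()
    {
        for (void* p : m_ptrs_to_free)
            std::free(p);  // stand-in for the device free call
    }

    // Typed allocation: the single cast lives here, and the pointer is
    // recorded for cleanup in the same place.
    template<typename T> T* alloc(std::size_t count)
    {
        assert(count > 0);
        void* p = std::malloc(count * sizeof(T));  // stand-in for cudaMalloc
        assert(p != nullptr);
        m_ptrs_to_free.push_back(p);
        return static_cast<T*>(p);
    }

    // Typed host-to-device copy; stand-in for cudaMemcpy.
    template<typename T>
    void upload(T* dst, const T* src, std::size_t count)
    {
        std::memcpy(dst, src, count * sizeof(T));
    }

    std::size_t tracked() const { return m_ptrs_to_free.size(); }

private:
    std::vector<void*> m_ptrs_to_free;
};
```

Because allocation and bookkeeping happen in one call, a caller cannot allocate without also registering the pointer for cleanup.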

fpsunflower · 2024-10-30T21:19:32Z

src/testrender/simpleraytracer.cpp

        Sampler sampler(x, y, si);
        // jitter pixel coordinate [0,1)^2
-        Vec3 j = sampler.get();
+        Vec3 j = no_jitter ? Vec3(0, 0, 0) : sampler.get();


I think you want Vec3(0.5f, 0.5f, 0.0f) in the no_jitter case? Otherwise you always get -1 from the warp below.

Ah, good catch.
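To see why 0.5 is the right constant, here is a sketch that assumes the warp in question is a tent-filter warp mapping [0,1) to [-1,1). The warp itself is not shown in this excerpt, so `tent_warp()` is an illustrative guess at its shape. Under that assumption an input of 0 lands on the edge of the filter support, while 0.5 lands on the pixel center.

```cpp
#include <cassert>
#include <cmath>

// Maps a uniform sample u in [0,1) to an offset in [-1,1) whose density
// is a tent centered at 0.
inline float tent_warp(float u)
{
    assert(u >= 0.0f && u < 1.0f);
    u *= 2.0f;
    return u < 1.0f ? std::sqrt(u) - 1.0f : 1.0f - std::sqrt(2.0f - u);
}
```

With this warp, `tent_warp(0.0f)` is -1 and `tent_warp(0.5f)` is 0, which matches the behavior described in the comment above.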

fpsunflower · 2024-10-30T21:24:38Z

src/testrender/cuda/optix_raytracer.cu

+#include "../sampling.h"
+
+// clang-format off
+// These files must be included in this specific order


Can we make shading.cpp include shading.h and only include the .cpp here?

Yeah, this is another leftover from an earlier iteration.


fpsunflower · 2024-11-01T06:41:29Z

src/testrender/cuda/rend_lib.h

+#include "../raytracer.h"
+
+
+#define RAYTRACER_HIT_QUAD   0


Leftover from pre-triangle version?

Yeah, it must have snuck back in during my manual rebase.

fpsunflower

Minor notes aside, this looks good to me.

Will let @lgritz take it for a spin on a machine that can run the new code since we don't have that covered by CI yet.


lgritz · 2024-11-10T18:28:34Z

Does this fully replace #1829? Should we close that other one to avoid confusion?

lgritz · 2024-11-10T18:33:37Z

@chellmuth and @aconty does this look reasonable to you? On an absolute scale, but also, using a set of idioms that make it a decent proxy for what we care about in a real renderer?

lgritz · 2024-11-12T23:12:11Z

@tgrant-nv This LGTM. I ran tests on my machine and came up with all sorts of failures (not your fault). The vector2/color2 tests are unrelated; I will look into that separately. But there were lots of OptiX tests that failed because of a relatively small number of differences in the sampling noise. I see you added reference images, but even those didn't match quite right for me -- maybe a different version of OptiX, or driver? Anyway, loosening up the thresholds did the trick. (I also changed the names of your ref images to the usual convention, a very nit-picky thing.)

So I went to push these updates on top of your branch, and it wouldn't let me, despite this very page saying "Maintainers are allowed to edit this pull request" -- I get an error "Authentication required: You must have push access to verify locks". I can do this to PRs on OIIO, but not on OSL, for reasons I don't understand.

So, could I trouble you to please take the optix-testrender-overhaul-take2 branch from my "lgritz" account (it's public) and then push that to yours, to amend this PR?


lgritz

This LGTM and I was able to make it run the tests correctly at work on a real GPU (modulo that I needed to push some new reference images, my results didn't quite match Tim's, but they were an obviously close enough visual match).

I'm going to merge it as it is now. We can always continue to revise it if @aconty or @chellmuth or others have further suggestions down the road, but at least that will get things unstuck -- I know that at the very least, @fpsunflower is waiting for this to go in to finalize his displacement work.

tgrant-nv added 10 commits October 30, 2024 12:04

Make the process_closure functions iterative, rather than recursive.

ee6b6a7

Signed-off-by: Tim Grant <[email protected]>

Use ID-based dispatch for get_albedo/eval/sample.

d9c845f

Signed-off-by: Tim Grant <[email protected]>

Enable pathtracing in OptiX mode.

8ace2ab

Signed-off-by: Tim Grant <[email protected]>

Add a padding field to the BSDF struct to avoid a misaligned address …

2552f8c

…error with the EnergyCompensatedOrenNayar BSDF. Signed-off-by: Tim Grant <[email protected]>

Update the reference images for the existing OptiX tests. Remove the …

5a7ade3

…test_microfacet test case since it's no longer necessary. Signed-off-by: Tim Grant <[email protected]>

Enable the render-* tests for OptiX. Add alternative reference images…

987c9f4

…. Tests that make texture calls are optimize-only because the OptiX osl_texture function requires handles. Signed-off-by: Tim Grant <[email protected]>

clang-format.

bd99c26

Signed-off-by: Tim Grant <[email protected]>

Don't need to pass the ShaderGlobals to Scene::intersect.

096fab4

Signed-off-by: Tim Grant <[email protected]>

Don't use TraceData, just use payload registers. Don't use designated…

6d10195

… initializers, since they aren't standard until C++20. Signed-off-by: Tim Grant <[email protected]>

clang-format

35eb193

Signed-off-by: Tim Grant <[email protected]>

fpsunflower mentioned this pull request Oct 30, 2024

testrender: Implement basic displacement shader support #1898

Merged

4 tasks

fpsunflower reviewed Oct 30, 2024


tgrant-nv added 6 commits October 30, 2024 18:05

Fix the pixel offset in the "no jitter" case. Adjust the reference im…

7a94afb

…ages accordingly. Signed-off-by: Tim Grant <[email protected]>

Eliminate vec_math.h.

0c42c12

Signed-off-by: Tim Grant <[email protected]>

Get rid of the shading.h include.

6a1052e

Signed-off-by: Tim Grant <[email protected]>

Add a note about the single-warp requirement in prepare_cuda().

f2df957

Signed-off-by: Tim Grant <[email protected]>

Use the integer representation to nudge tmin instead of a fixed epsil…

d43b48c

…on. Update the reference images where needed. Signed-off-by: Tim Grant <[email protected]>

Remove the unneeded half.h include.

c1acb68

Signed-off-by: Tim Grant <[email protected]>

fpsunflower reviewed Nov 1, 2024


tgrant-nv added 2 commits November 6, 2024 11:21

Remove unneeded defines for the primitive hit types.

daf816e

Signed-off-by: Tim Grant <[email protected]>

Wrap the cudaMalloc and cudaMemcpy calls.

46e9b76

Signed-off-by: Tim Grant <[email protected]>

lgritz requested a review from chellmuth November 10, 2024 18:30

lgritz requested a review from aconty November 10, 2024 18:30

tgrant-nv mentioned this pull request Nov 11, 2024

OptiX testrender overhaul #1829

Closed

4 tasks

lgritz added 2 commits November 12, 2024 17:02

Rename ref images with the usual convention

3f511f9

Signed-off-by: Tim Grant <[email protected]>

Address platform-to-platform test result variation

a5f418b

Signed-off-by: Tim Grant <[email protected]>

tgrant-nv force-pushed the optix-testrender-overhaul-take2 branch from 71b7791 to a5f418b Compare November 13, 2024 00:02

lgritz approved these changes Nov 13, 2024


lgritz merged commit bda7495 into AcademySoftwareFoundation:main Nov 13, 2024

lgritz mentioned this pull request Jan 21, 2025

test: Update ref output for render-microfacet with OptiX #1927

Merged
