-
Notifications
You must be signed in to change notification settings - Fork 395
OptiX testrender overhaul (take two) #1897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OptiX testrender overhaul (take two) #1897
Conversation
Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
…error with the EnergyCompensatedOrenNayar BSDF. Signed-off-by: Tim Grant <[email protected]>
…test_microfacet test case since it's no longer necessary. Signed-off-by: Tim Grant <[email protected]>
…. Tests that make texture calls are optimize-only because the OptiX osl_texture function requires handles. Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
… intializers, since they aren't standard until C++20." Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
| OSL_HOSTDEVICE void prepare_cuda(int stride, int idx, F cb) | ||
| { | ||
| prepare_cuda_01(stride, idx, cb); | ||
| if (idx == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe leave a comment here as well that this is running on a single warp? At first it wasn't clear to me how you can get away with no synchronization -- but it makes sense if there's only a single warp here.
| trace_ray(handle, payload, V3_TO_F3(r.origin), V3_TO_F3(r.direction), | ||
| tmin); | ||
| if (payload.hit_id == skipID1) { | ||
| tmin = payload.hit_t + 2.0f * epsilon; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could nudge by bumping the integer representation instead. This would let you use a smaller epsilon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a trick I was not aware of, and it appears to work. Nifty.
src/testrender/cuda/rend_lib.h
Outdated
| #include <OSL/oslconfig.h> | ||
|
|
||
| #include <Imath/half.h> | ||
| #if defined(__has_include) && __has_include(<Imath/half.h>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we dropped support for Imath 2.x in main, so this part shouldn't be necessary (probably just a leftover from a previous merge?)
src/testrender/cuda/vec_math.h
Outdated
| @@ -0,0 +1,97 @@ | |||
| // Copyright Contributors to the Open Shading Language project. | |||
| // SPDX-License-Identifier: BSD-3-Clause | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought Imath now supported cuda out of the box. Is this still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the casting macros are needed in the current iteration. So I'll move them to where they are used and remove this file.
src/testrender/optixraytracer.cpp
Outdated
| { | ||
| if (getBackgroundShaderID() >= 0) { | ||
| const int bg_res = std::max<int>(32, getBackgroundResolution()); | ||
| CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&d_bg_values), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be a way to wrap the cudaMalloc calls so that user code doesn't need to have as many reinterpret_casts everywhere ? It could also take care of calling m_ptrs_to_free.push_back() at the same time to avoid mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, there is already an unused function that does almost exactly that. I'll look at adapting it to streamline these operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be a nice improvement, but maybe a little out-of-scope for this already big change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just took a stab at this and it makes the code a bit easier on the eyes. How do you feel about this?
#define DEVICE_ALLOC(size) reinterpret_cast<CUdeviceptr>(device_alloc(size))
#define COPY_TO_DEVICE(dst_device, src_host, size) \
copy_to_device(reinterpret_cast<void*>(dst_device), src_host, size)It's a wrapper for the wrapper that takes care of some of the casting. I don't necessarily want to completely obscure the fact that we're dealing the CUdeviceptr and not void*, although they are for the most part interchangeable.
src/testrender/simpleraytracer.cpp
Outdated
| Sampler sampler(x, y, si); | ||
| // jitter pixel coordinate [0,1)^2 | ||
| Vec3 j = sampler.get(); | ||
| Vec3 j = no_jitter ? Vec3(0, 0, 0) : sampler.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want Vec3(0.5f, 0.5f, 0.0f) in the no_jitter case ? Otherwise you always get -1 from the warp below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch.
| #include "../sampling.h" | ||
|
|
||
| // clang-format off | ||
| // These files must be included in this specific order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make shading.cpp include shading.h and only include the .cpp here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is another leftover from an earlier iteration.
…ages accordingly. Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
…on. Update the reference images where needed. Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
src/testrender/cuda/rend_lib.h
Outdated
| #include "../raytracer.h" | ||
|
|
||
|
|
||
| #define RAYTRACER_HIT_QUAD 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover from pre-triangle version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it must have snuck back in during my manual rebase.
fpsunflower
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor notes aside, this looks good to me.
Will let @lgritz take it for a spin on a machine that can run the new code since we don't have that covered by CI yet.
Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
|
Does this fully replace #1829? Should we close that other one to avoid confusion? |
|
@chellmuth and @aconty does this look reasonable to you? On an absolute scale, but also, using a set of idioms that make it a decent proxy for what we care about in a real renderer? |
|
@tgrant-nv This LGTM, I ran tests on my machine and came up with all sorts of failures (not your fault). The vector2/color2 tests are unrelated, I will look into that separately. But there were lots of optix tests that failed because of relatively small number of differences in the sampling noise. I see you added reference images, but even those didn't match quite right for me -- maybe different version of optix, or driver? Anyway, loosening up the thresholds did the trick. (I also changed the names of your ref images to the usual convention, a very nit-picky thing.) So I went to push these updates on top of your branch, and it wouldn't let me, despite this very page saying "Maintainers are allowed to edit this pull request" -- I get an error "Authentication required: You must have push access to verify locks". I can do this to PRs on OIIO, but not on OSL, for reasons I don't understand. So, could I trouble you to please take the optix-testrender-overhaul-take2 branch from my "lgritz" account (it's public) and then push that to yours, to amend this PR? |
Signed-off-by: Tim Grant <[email protected]>
Signed-off-by: Tim Grant <[email protected]>
71b7791 to
a5f418b
Compare
lgritz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM and I was able to make it run the tests correctly at work on a real GPU (modulo that I needed to push some new reference images, my results didn't quite match Tim's, but they were an obviously close enough visual match).
I'm going to merge it as it is now. We can always continue to revise it if @aconty or @chellmuth or others have further suggestions down the road, but at least that will get things unstuck -- I know that at the very least, @fpsunflower is waiting for this to go in to finalize his displacement work.
Description
This PR is a continuation of #1829, updated to include the recently added triangle mesh support. It enables full path tracing support for the OptiX backend in
testrender. We have tried to share code between the CPU and OptiX backends where practical. There is more sharing in this PR than there was in #1829, which should reduce the maintenance burden a bit.ID-based dispatch
Virtual function calls aren't well supported in OptiX, so rather than using regular C++ polymorphism to invoke the
sample(),eval(), andget_albedo()functions for each of the BSDF sub-types, we manually invoke the correct function based on the closure ID (which we have added as a member of theBSDFclass).Iterative closure evaluation
Another key change is the non-recursive closure evaluation. We apply the same style of iterative tree traversal used in the previous OptiX version of
process_closure()to the shared implementations ofprocess_closure(),evaluate_layer_opacity(),process_medium_closure(), andprocess_background_closure().Background sampling
We've included support for background closures. This includes an OptiX implementation of the
Background::prepare()function. We've broken that function into three phases, where phases 1 and 3 are parallelized across a warp and phase 2 is executed on a single thread. This offers a decent speedup over a single-threaded implementation without the complexity of a more sophisticated implementation.Tests
I have enabled the
render-*tests for OptiX mode. I've added alternative reference images, since the GPU output exceeds the difference threshold on many of the tests. But in most cases the difference between the CPU and GPU output is very small.Checklist:
already run clang-format v17 before submitting, I definitely will look at
the CI test that runs clang-format and fix anything that it highlights as
being nonconforming.