diff --git a/README.md b/README.md
index f044c821..dc873c1a 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,77 @@ CUDA Denoiser For CUDA Path Tracer
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Dongying Liu
+* [LinkedIn](https://www.linkedin.com/in/dongying-liu/), [personal website](https://vivienliu1998.wixsite.com/portfolio)
+* Tested on: Windows 11, i7-11700 @ 2.50GHz, NVIDIA GeForce RTX 3060
-### (TODO: Your README)
+# Project Description
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+For this project, I implemented a denoiser in my path tracer based on the paper [Edge-Avoiding A-Trous Wavelet Transform for Fast Global Illumination Filtering](https://jo.dreggn.org/home/2010_atrous.pdf).
+Path-traced results are often noisy or grainy, and the noise reduction usually doesn't scale linearly with more iterations. So, the goal of this project is to implement a denoiser that decreases the time needed to render a nice result. Basically, the idea is to blur the image while preserving edges.
+
+I started by implementing a simple Gaussian blur. With a 5x5 kernel, for every pixel, I summed the weighted colors of the pixel and its 24 neighbors, so the final picture is blurry.
+
+After implementing the Gaussian blur, I slightly changed the code to turn it into the A-Trous wavelet transform. The idea of A-Trous is to approximate a Gaussian by iteratively applying sparse blurs of increasing size. The 5x5 kernel is still used as the filter, but for each iteration, according to the given filter size, say 16x16, we space out the samples according to the offset three times (iterations) and accumulate the weighted results to create the blurred result.
+
+Then, I added edge preservation to the A-Trous wavelet transform. Following the paper, per-pixel color, normal and position data are used to preserve edges when blurring.
+
+//// gbuffer
+//// three result ps gaussian, my a-trous, my a-trous with edge preserving
+|Photoshop Gaussian | My Gaussian | A-Trous | A-Trous with Edge Preserving |
+|-------------|-------------|-------------|-------------|
+|||||
+
+After testing the denoiser with my custom scene, I found that the render time decreased dramatically. It took hours to render the scene with 3739 iterations to get an acceptable result, but only about 3 minutes to render it with 39 iterations and the denoiser to get a smooth image.
+
+| Original 3739 iterations| Denoised 39 iterations|
+|-------------|-------------|
+| ||
+
+# Performance Analysis
+
+## Time added by Denoiser
+The chart shows the time added when the denoiser is on. The denoiser adds about 5-10 ms.
+
+
+
+## Iteration for smooth result decreased by denoiser
+However, the smoothing that the denoiser adds to the final image saves far more time than using more iterations to reach the same smoothness.
+The left image, without denoising, used 500 iterations. The right image, with denoising, used only 10 iterations.
+| Original 500 iterations| Denoised 10 iterations|
+|-------------|-------------|
+| ||
+
+## Denoiser under different resolutions
+With the filter size and kernel size held constant, the chart shows that the denoiser takes more time as the image resolution increases. This makes sense because there are more pixels for the denoiser to process, even though the filter size and kernel size do not change.
+
+
+
+## Filter size and denoise time
+With the kernel size held constant, the chart shows that the denoiser takes more time as the filter size increases. This makes sense because a larger filter size means more A-Trous iterations, and more iterations take more time.
+
+
+
+
+## Filter size and denoise visual result
+Because the iteration count is computed from the filter size, the visual quality does not scale uniformly with filter size. Below are the filter sizes I tested. Some filter sizes map to the same number of iterations, so there is no obvious difference between them.
+
+| Filter size | Iterations | Result | Filter size | Iterations | Result |
+|-| ------------- | -------- |-| ------------- | -------- |
+| 16 | 1 | | 30 | 2 | |
+| 50 | 3 | | 80 | 4 | |
+| 200 | 5 | |
+
+## Denoiser and materials
+In my custom scene, the deer is glass (refractive), the back wall is a mirror (reflective), and everything else is diffuse. The denoiser works well on diffuse materials because their surfaces are mostly a single color. However, the denoised result does not look good on reflective and refractive materials, which look blurry after denoising. These two materials show reflections and refractions of the surrounding scene, and the edges of the reflected or refracted objects need to stay sharp for the material to look credible. The G-buffer stores only the position and normal of the surface each pixel hits, so the normal and position weights cannot detect edges that appear only inside a reflection or refraction.
+
+
+
+
+
+## Denoiser and lighting conditions
+With the same kernel size and filter size, the cornell.txt scene needed about 100 iterations to produce a smooth result, while cornell_ceiling_light.txt needed only 10. cornell_ceiling_light.txt also produces a better denoised result. I think the reason is that this scene has a bigger light, so more rays terminate at the light and contribute to the final result, and the image converges faster. When the quality of the original image is better, I think the denoised result will be better too.
+
+| Large Light 10 iterations| Small Light 100 iterations|
+|-------------|-------------|
+| ||
diff --git a/img/10iterResultDenoise.jpg b/img/10iterResultDenoise.jpg
new file mode 100644
index 00000000..a780c998
Binary files /dev/null and b/img/10iterResultDenoise.jpg differ
diff --git a/img/500iterResultNoDonoise.jpg b/img/500iterResultNoDonoise.jpg
new file mode 100644
index 00000000..e1ee7b4f
Binary files /dev/null and b/img/500iterResultNoDonoise.jpg differ
diff --git a/img/atrous.jpg b/img/atrous.jpg
new file mode 100644
index 00000000..f24f9730
Binary files /dev/null and b/img/atrous.jpg differ
diff --git a/img/atrousScene.jpg b/img/atrousScene.jpg
new file mode 100644
index 00000000..10e7e43e
Binary files /dev/null and b/img/atrousScene.jpg differ
diff --git a/img/blooper1.jpg b/img/blooper1.jpg
new file mode 100644
index 00000000..249a3d3b
Binary files /dev/null and b/img/blooper1.jpg differ
diff --git a/img/blooper2.jpg b/img/blooper2.jpg
new file mode 100644
index 00000000..d7a773e6
Binary files /dev/null and b/img/blooper2.jpg differ
diff --git a/img/color.jpg b/img/color.jpg
new file mode 100644
index 00000000..edc10c3e
Binary files /dev/null and b/img/color.jpg differ
diff --git a/img/customSceneDenoised.jpg b/img/customSceneDenoised.jpg
new file mode 100644
index 00000000..63713afb
Binary files /dev/null and b/img/customSceneDenoised.jpg differ
diff --git a/img/customScene_3739samp.png b/img/customScene_3739samp.png
new file mode 100644
index 00000000..8b7fb6d8
Binary files /dev/null and b/img/customScene_3739samp.png differ
diff --git a/img/filtersize100.jpg b/img/filtersize100.jpg
new file mode 100644
index 00000000..029febe5
Binary files /dev/null and b/img/filtersize100.jpg differ
diff --git a/img/filtersize16.jpg b/img/filtersize16.jpg
new file mode 100644
index 00000000..b1d62461
Binary files /dev/null and b/img/filtersize16.jpg differ
diff --git a/img/filtersize200.jpg b/img/filtersize200.jpg
new file mode 100644
index 00000000..a3034c6b
Binary files /dev/null and b/img/filtersize200.jpg differ
diff --git a/img/filtersize30.jpg b/img/filtersize30.jpg
new file mode 100644
index 00000000..c8decc9c
Binary files /dev/null and b/img/filtersize30.jpg differ
diff --git a/img/filtersize50.jpg b/img/filtersize50.jpg
new file mode 100644
index 00000000..4ad2bc36
Binary files /dev/null and b/img/filtersize50.jpg differ
diff --git a/img/filtersize60.jpg b/img/filtersize60.jpg
new file mode 100644
index 00000000..f8c6efa0
Binary files /dev/null and b/img/filtersize60.jpg differ
diff --git a/img/filtersize70.jpg b/img/filtersize70.jpg
new file mode 100644
index 00000000..804498d9
Binary files /dev/null and b/img/filtersize70.jpg differ
diff --git a/img/filtersize700.jpg b/img/filtersize700.jpg
new file mode 100644
index 00000000..76ed32aa
Binary files /dev/null and b/img/filtersize700.jpg differ
diff --git a/img/filtersize80.jpg b/img/filtersize80.jpg
new file mode 100644
index 00000000..0647069d
Binary files /dev/null and b/img/filtersize80.jpg differ
diff --git a/img/gaussian.jpg b/img/gaussian.jpg
new file mode 100644
index 00000000..e4a2c787
Binary files /dev/null and b/img/gaussian.jpg differ
diff --git a/img/gaussianPS.png b/img/gaussianPS.png
new file mode 100644
index 00000000..c8383b8c
Binary files /dev/null and b/img/gaussianPS.png differ
diff --git a/img/lightcondition10.jpg b/img/lightcondition10.jpg
new file mode 100644
index 00000000..f9976a6b
Binary files /dev/null and b/img/lightcondition10.jpg differ
diff --git a/img/lightcondition100.jpg b/img/lightcondition100.jpg
new file mode 100644
index 00000000..e17639e4
Binary files /dev/null and b/img/lightcondition100.jpg differ
diff --git a/img/normal.jpg b/img/normal.jpg
new file mode 100644
index 00000000..1723075e
Binary files /dev/null and b/img/normal.jpg differ
diff --git a/img/timeAddedIter.png b/img/timeAddedIter.png
new file mode 100644
index 00000000..79d3cf80
Binary files /dev/null and b/img/timeAddedIter.png differ
diff --git a/img/timeFilterSize.png b/img/timeFilterSize.png
new file mode 100644
index 00000000..5093b55f
Binary files /dev/null and b/img/timeFilterSize.png differ
diff --git a/img/timeResolution.png b/img/timeResolution.png
new file mode 100644
index 00000000..f20844aa
Binary files /dev/null and b/img/timeResolution.png differ
diff --git a/img/uniformBlue_80_iterations.jpg b/img/uniformBlue_80_iterations.jpg
new file mode 100644
index 00000000..5e1b1e02
Binary files /dev/null and b/img/uniformBlue_80_iterations.jpg differ
diff --git a/img/uniformBlur_16_iterations.jpg b/img/uniformBlur_16_iterations.jpg
new file mode 100644
index 00000000..e6d966c8
Binary files /dev/null and b/img/uniformBlur_16_iterations.jpg differ
diff --git a/scenes/cornell.txt b/scenes/cornell.txt
index 83ff8202..ca356d47 100644
--- a/scenes/cornell.txt
+++ b/scenes/cornell.txt
@@ -52,7 +52,7 @@ EMITTANCE 0
 CAMERA
 RES 800 800
 FOVY 45
-ITERATIONS 5000
+ITERATIONS 100
 DEPTH 8
 FILE cornell
 EYE 0.0 5 10.5
diff --git a/src/main.cpp b/src/main.cpp
index 4092ae4a..0a0bf7d9 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -167,7 +167,13 @@ void runCuda() {
         if (ui_showGbuffer) {
             showGBuffer(pbo_dptr);
-        } else {
+        }
+        else if (ui_denoise) {
+            denoiseImage(
+                ui_filterSize, ui_colorWeight, ui_normalWeight, ui_positionWeight,
+                pbo_dptr, iteration);
+        }
+        else {
             showImage(pbo_dptr, iteration);
         }
diff --git a/src/pathtrace.cu b/src/pathtrace.cu
index 23e5f909..cd9a558f 100644
--- a/src/pathtrace.cu
+++ b/src/pathtrace.cu
@@ -16,6 +16,11 @@
 #define ERRORCHECK 1
+#define SHOW_POS_GBUFFER 1
+#define SHOW_NOR_GBUFFER 0
+
+#define EDGE_VOIDING 1
+
 #define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
 #define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__)
 void checkCUDAErrorFn(const char *msg, const char *file, int line) {
@@ -38,6 +43,14 @@ void checkCUDAErrorFn(const char *msg, const char *file, int line) {
 #endif
 }
+const float kernel[25] = {
+    0.003765, 0.015019, 0.023792, 0.015019, 0.003765,
+    0.015019, 0.059912, 0.094907, 0.059912, 0.015019,
+    0.023792, 0.094907, 0.150342, 0.094907, 0.023792,
+    0.015019, 0.059912, 0.094907, 0.059912, 0.015019,
+    0.003765, 0.015019, 0.023792, 0.015019, 0.003765,
+};
+
 __host__ __device__
 thrust::default_random_engine makeSeededRandomEngine(int iter, int index, int depth) {
     int h = utilhash((1 << 31) | (depth << 22) | iter) ^ utilhash(index);
@@ -67,18 +80,51 @@ __global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution,
     }
 }
+__global__ void sendDenoisedImageToPBO(uchar4* pbo, glm::ivec2 resolution, glm::vec3* denoised)
+{
+    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+
+    if (x < resolution.x && y < resolution.y) {
+        int idx = x + resolution.x * y;
+        glm::vec3 pixel = denoised[idx];
+
+        pbo[idx].w = 0;
+        pbo[idx].x = glm::clamp((int)(pixel.x * 255.0), 0, 255);
+        pbo[idx].y = glm::clamp((int)(pixel.y * 255.0), 0, 255);
+        pbo[idx].z = glm::clamp((int)(pixel.z * 255.0), 0, 255);
+    }
+}
+
+
 __global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
     int y = (blockIdx.y * blockDim.y) + threadIdx.y;
     if (x < resolution.x && y < resolution.y) {
         int index = x + (y * resolution.x);
-        float timeToIntersect = gBuffer[index].t * 256.0;
-        pbo[index].w = 0;
-        pbo[index].x = timeToIntersect;
-        pbo[index].y = timeToIntersect;
-        pbo[index].z = timeToIntersect;
+        if (SHOW_POS_GBUFFER) {
+            glm::vec3 pos = glm::clamp(glm::abs(gBuffer[index].pos * 20.0f), 0.0f, 255.0f);
+            pbo[index].w = 0;
+            pbo[index].x = pos.x;
+            pbo[index].y = pos.y;
+            pbo[index].z = pos.z;
+        }
+        else if (SHOW_NOR_GBUFFER) {
+            glm::vec3 nor = glm::clamp(glm::abs(gBuffer[index].nor * 255.0f), 0.0f, 255.0f);
+            pbo[index].w = 0;
+            pbo[index].x = nor.x;
+            pbo[index].y = nor.y;
+            pbo[index].z = nor.z;
+        }
+        else {
+            float timeToIntersect = gBuffer[index].t * 256.0;
+            pbo[index].w = 0;
+            pbo[index].x = timeToIntersect;
+            pbo[index].y = timeToIntersect;
+            pbo[index].z = timeToIntersect;
+        }
     }
 }
@@ -90,7 +136,9 @@ static PathSegment * dev_paths = NULL;
 static ShadeableIntersection * dev_intersections = NULL;
 static GBufferPixel* dev_gBuffer = NULL;
 // TODO: static variables for device memory, any extra info you need, etc
-// ...
+static float* dev_kernel = NULL;
+static glm::vec3* dev_denoise1 = NULL;
+static glm::vec3* dev_denoise2 = NULL;
 void pathtraceInit(Scene *scene) {
     hst_scene = scene;
@@ -114,6 +162,14 @@ void pathtraceInit(Scene *scene) {
     cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel));
     // TODO: initialize any extra device memeory you need
+    cudaMalloc(&dev_kernel, 25 * sizeof(float));
+    cudaMemcpy(dev_kernel, &kernel[0], 25 * sizeof(float), cudaMemcpyHostToDevice);
+
+    cudaMalloc(&dev_denoise1, pixelcount * sizeof(glm::vec3));
+    cudaMemset(dev_denoise1, 0, pixelcount * sizeof(glm::vec3));
+
+    cudaMalloc(&dev_denoise2, pixelcount * sizeof(glm::vec3));
+    cudaMemset(dev_denoise2, 0, pixelcount * sizeof(glm::vec3));
     checkCUDAError("pathtraceInit");
 }
@@ -126,7 +182,9 @@ void pathtraceFree() {
     cudaFree(dev_intersections);
     cudaFree(dev_gBuffer);
     // TODO: clean up any extra device memory you created
-
+    cudaFree(dev_kernel);
+    cudaFree(dev_denoise1);
+    cudaFree(dev_denoise2);
     checkCUDAError("pathtraceFree");
 }
@@ -274,15 +332,23 @@ __global__ void shadeSimpleMaterials (
 }
 __global__ void generateGBuffer (
-  int num_paths,
-  ShadeableIntersection* shadeableIntersections,
+    int num_paths,
+    ShadeableIntersection* shadeableIntersections,
     PathSegment* pathSegments,
-  GBufferPixel* gBuffer) {
-  int idx = blockIdx.x * blockDim.x + threadIdx.x;
-  if (idx < num_paths)
-  {
-    gBuffer[idx].t = shadeableIntersections[idx].t;
-  }
+    GBufferPixel* gBuffer) {
+    int idx = blockIdx.x * blockDim.x + threadIdx.x;
+    if (idx < num_paths)
+    {
+        ShadeableIntersection shadeableIntersection = shadeableIntersections[idx];
+        Ray ray = pathSegments[idx].ray;
+        gBuffer[idx].t = shadeableIntersection.t;
+        gBuffer[idx].pos = glm::vec3(0.f);
+        gBuffer[idx].nor = glm::vec3(0.f);
+        if (shadeableIntersection.t != -1.f) {
+            gBuffer[idx].pos = ray.origin + shadeableIntersection.t * ray.direction;
+            gBuffer[idx].nor = shadeableIntersection.surfaceNormal;
+        }
+    }
 }
 // Add the current iteration's output to the overall image
@@ -297,6 +363,113 @@ __global__ void finalGather(int nPaths, glm::vec3 * image, PathSegment * iterati
     }
 }
+__global__ void aTrousDenoise(
+    Camera cam, int stepSize, float* kernel,
+    glm::vec3* denoised1, glm::vec3* denoised2, int iter)
+{
+    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+
+    int idx = x + cam.resolution.x * y;
+
+    if (x < cam.resolution.x && y < cam.resolution.y) {
+        // get the origin pixel color, position color, normal color -> denote as q in the paper
+        glm::vec3 colQ = denoised1[idx];
+        if (stepSize == 1) {
+            colQ = colQ / (float)iter;
+        }
+
+        glm::vec3 finalColor = glm::vec3(0.f);
+        float k = 0.f;
+
+        // do the 5 * 5 kernel
+        for (int i = -2; i <= 2; i++) {
+            for (int j = -2; j <= 2; j++) {
+                int newX = x + i * stepSize;
+                int newY = y + j * stepSize;
+                int newIdx = newX + cam.resolution.x * newY;
+
+                if (newIdx < cam.resolution.x * cam.resolution.y && newIdx >= 0) {
+                    // get the kernel value
+                    int hIdx = (i + 2) + (j + 2) * 5;
+                    float h = kernel[hIdx];
+                    // get the current pixel color, position color, normal color -> denote as p in the paper
+                    glm::vec3 colP = denoised1[newIdx];
+                    if (stepSize == 1) {
+                        colP = denoised1[newIdx] / (float)iter;
+                    }
+
+                    finalColor += h * colP;
+                }
+            }
+        }
+        denoised2[idx] = finalColor;
+    }
+}
+
+__global__ void aTrousDenoiseWithEdgeStopping(
+    Camera cam, int stepSize, float* kernel, glm::vec3* denoised1, glm::vec3* denoised2,
+    float colPhi, float norPhi, float posPhi, GBufferPixel* gbufferPixel, int iter)
+{
+    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+
+    int idx = x + cam.resolution.x * y;
+
+    if (x < cam.resolution.x && y < cam.resolution.y) {
+        // get the origin pixel color, position color, normal color -> denote as q in the paper
+        glm::vec3 colQ = denoised1[idx];
+        if (stepSize == 1) {
+            colQ = colQ / (float)iter;
+        }
+        glm::vec3 norQ = gbufferPixel[idx].nor;
+        glm::vec3 posQ = gbufferPixel[idx].pos;
+
+        glm::vec3 finalColor = glm::vec3(0.f);
+        float k = 0.f;
+
+        // do the 5 * 5 kernel
+        for (int i = -2; i <= 2; i++) {
+            for (int j = -2; j <= 2; j++) {
+                int newX = x + i * stepSize;
+                int newY = y + j * stepSize;
+                ////int newX = glm::clamp(x + i * stepSize, 0, cam.resolution.x - 1);
+                ////int newY = glm::clamp(y + j * stepSize, 0, cam.resolution.y - 1);
+                int newIdx = newX + cam.resolution.x * newY;
+
+                if (newIdx < cam.resolution.x * cam.resolution.y && newIdx >= 0) {
+                    // get the kernel value
+                    int hIdx = (i + 2) + (j + 2) * 5;
+                    float h = kernel[hIdx];
+                    // get the current pixel color, position color, normal color -> denote as p in the paper
+                    glm::vec3 colP = denoised1[newIdx];
+                    if (stepSize == 1) {
+                        colP = denoised1[newIdx] / (float)iter;
+                    }
+                    glm::vec3 norP = gbufferPixel[newIdx].nor;
+                    glm::vec3 posP = gbufferPixel[newIdx].pos;
+
+                    // calculate weight
+                    float colDist = glm::dot(colP - colQ, colP - colQ);
+                    float colWeight = glm::min(glm::exp(-(colDist) / colPhi), 1.f);
+
+                    //float norDist = glm::max(glm::dot(norP - norQ, norP - norQ) / (stepSize * stepSize), 0.f);
+                    float norDist = glm::dot(norP - norQ, norP - norQ);
+                    float norWeight = glm::min(glm::exp(-(norDist) / norPhi), 1.f);
+
+                    float posDist = glm::dot(posP - posQ, posP - posQ);
+                    float posWeight = glm::min(glm::exp(-(posDist) / posPhi), 1.f);
+
+                    float weight = colWeight * norWeight * posWeight;
+                    k += h * weight;
+                    finalColor += h * weight * colP;
+                }
+            }
+        }
+        denoised2[idx] = finalColor / k;
+    }
+}
+
 /**
  * Wrapper for the __global__ call that sets up the kernel calls and does a ton
  * of memory management
@@ -431,3 +604,44 @@ const Camera &cam = hst_scene->state.camera;
     // Send results to OpenGL buffer for rendering
     sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image);
 }
+
+void denoiseImage(
+    float filterSize, float colPhi, float norPhi, float posPhi,
+    uchar4* pbo, int iter // for show image
+)
+{
+    const Camera &cam = hst_scene->state.camera;
+    const dim3 blockSize2d(8, 8);
+    const dim3 blocksPerGrid2d(
+        (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
+        (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
+
+    const int pixelcount = cam.resolution.x * cam.resolution.y;
+
+    // prepare the dev_denoise1 buffer with the dev_image
+    cudaMemcpy(dev_denoise1, dev_image, pixelcount * sizeof(glm::vec3), cudaMemcpyDeviceToDevice);
+
+    // do filter iteration
+    int iterTimes = filterSize < 5 ? 1 : floor(log2(filterSize / 5.f));
+    for (int i = 0; i < 1; ++i) {
+        int stepSize = pow(2, i);
+
+        aTrousDenoise<<<blocksPerGrid2d, blockSize2d>>>(
+            cam, stepSize, dev_kernel, dev_denoise1, dev_denoise2, iter);
+        checkCUDAError("a trous denoise");
+
+        //aTrousDenoiseWithEdgeStopping<<<blocksPerGrid2d, blockSize2d>>>(
+        //    cam, stepSize, dev_kernel, dev_denoise1, dev_denoise2,
+        //    colPhi, norPhi, posPhi, dev_gBuffer, iter);
+        //checkCUDAError("a trous denoise with edge stopping");
+
+        glm::vec3* tmp = dev_denoise1;
+        dev_denoise1 = dev_denoise2;
+        dev_denoise2 = tmp;
+    }
+
+    // show denoiseImage
+    // Send results to OpenGL buffer for rendering
+    //sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_denoise1);
+    sendDenoisedImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_denoise1);
+}
diff --git a/src/pathtrace.h b/src/pathtrace.h
index 9e12f440..a0059081 100644
--- a/src/pathtrace.h
+++ b/src/pathtrace.h
@@ -8,3 +8,4 @@ void pathtraceFree();
 void pathtrace(int frame, int iteration);
 void showGBuffer(uchar4 *pbo);
 void showImage(uchar4 *pbo, int iter);
+void denoiseImage(float filterSize, float colPhi, float norPhi, float posPhi, uchar4* pbo, int iter);
diff --git a/src/sceneStructs.h b/src/sceneStructs.h
index da7e558a..bec69824 100644
--- a/src/sceneStructs.h
+++ b/src/sceneStructs.h
@@ -79,4 +79,6 @@ struct ShadeableIntersection {
 // What information might be helpful for guiding a denoising filter?
 struct GBufferPixel {
     float t;
+    glm::vec3 pos; // save per-pixel position info
+    glm::vec3 nor; // save per-pixel normal info
 };
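
A note on the iteration counts in the README table: they appear to follow from the `iterTimes` expression in `denoiseImage`. The host-side C++ sketch below mirrors that expression so the mapping can be checked without a GPU. The helper names `atrousLevels`, `atrousStepSizes` and `inBounds` are hypothetical and are not part of this change. The sketch assumes the intent is to run one pass per level with step sizes 1, 2, 4, ..., whereas the loop as committed runs a single pass (`i < 1`). It also assumes a per-axis bounds test is wanted: the kernels test only the flattened `newIdx`, which lets a horizontal offset past the image edge land on a neighboring row.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: number of A-Trous passes for a given filter size.
// Mirrors the iterTimes expression in denoiseImage(); note that sizes in
// [5, 10) yield zero passes under this expression.
int atrousLevels(float filterSize) {
    return filterSize < 5.f
        ? 1
        : static_cast<int>(std::floor(std::log2(filterSize / 5.f)));
}

// Hypothetical helper: sample spacing for each pass (1, 2, 4, ...),
// assuming every level is run rather than only the first.
std::vector<int> atrousStepSizes(float filterSize) {
    std::vector<int> steps;
    const int levels = atrousLevels(filterSize);
    for (int i = 0; i < levels; ++i) {
        steps.push_back(1 << i);  // integer equivalent of pow(2, i)
    }
    return steps;
}

// Hypothetical helper: per-axis bounds test for a neighbor sample.
// Unlike a test on the flattened index alone, this rejects samples whose
// x coordinate falls outside the row.
bool inBounds(int x, int y, int width, int height) {
    assert(width > 0 && height > 0);
    return x >= 0 && x < width && y >= 0 && y < height;
}
```

For example, `atrousLevels(50.f)` gives 3 and `atrousStepSizes(50.f)` gives `{1, 2, 4}`, which matches the 50 row of the table. With an 800x800 image, the neighbor at `(-1, 1)` maps to flattened index 799, which passes the index-only test but is rejected by `inBounds`.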