diff --git a/README.md b/README.md index f044c821..15843e77 100644 --- a/README.md +++ b/README.md @@ -3,11 +3,51 @@ CUDA Denoiser For CUDA Path Tracer **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** -* (TODO) YOUR NAME HERE -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Ryan Tong + * [LinkedIn](https://www.linkedin.com/in/ryanctong/), [personal website](), [twitter](), etc. +* Tested on: Windows 10, i7-8750H @ 2.20GHz 16GB, GeForce GTX 1060 6144MB (Personal Laptop) -### (TODO: Your README) +![Denoiser](img/title.png) -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. +### Project Description +This project uses an A-Trous filter to denoise a pathtraced image at an early iteration stage to reduce the number of iterations needed to generate an acceptably smooth image. The paper followed was: https://jo.dreggn.org/home/2010_atrous.pdf + +### Performance +To generate an “acceptably smooth” image as shown above, I found that the denoiser takes about ~13ms. Note that this image was generated using 800x800 resolution and filter size of 40. + +Compared to the many thousands of iterations otherwise needed to produce an acceptably smooth image, denoising achieves a comparable result with only 10 iterations, as opposed to roughly 500 iterations without denoising. + +### Denoised vs Not +![Denoised](img/title.png) +![Reference](img/reference.png) + +### Performance Analysis +2 parameters that affect runtime of the denoiser are resolution and filter size. For resolution, this makes sense because resolution determines the total number of operations needed to be performed. Specifically, the higher the resolution, the more threads need to be launched since one denoising thread is launched per pixel. Similarly, increasing filter size increases runtime because it increases the number of filter iterations performed, each of which expands the effective filter by growing the step width. 
Specifically, we start at a size of 5x5 and increase the step width by a factor of 2 until we reach the desired filter size. We can see these effects reflected in the data below: + +### Resolution Performance Impact +![Denoiser](img/resolution.png) + +### Filter Size Performance Impact +![Denoiser](img/filter.png) + +### Filter Size Visual Analysis +Here are some images comparing the effects of filter size. As you can see, filter values that are too small are ineffective and filter sizes that are too big make the image too blurry. This makes sense because smaller filters do not take into account enough of the neighboring pixels to be effective and larger filters take into account too many neighboring pixels. +### Different Filter Size Visualization +![5x5](img/five.png) +![15x15](img/fifteen.png) +![45x45](img/fourtyfive.png) +![80x80](img/eighty.png) +![100x100](img/hundred.png) + +### Material Type +The material type also determines the effectiveness of this technique. Diffuse materials work best while specular is worse since the reflections are blurred as well. +### Different Material Visualization +![diffuse](img/diffuse.png) +![specular](img/title.png) + +### Scene Type +The amount of light also determines the effectiveness of this technique. Specifically, brighter scenes work better since there is less complexity in the lighting and more uniformity. As you can see in the darker Cornell box scene, there are more dark splotches that are due to the fact that there is a large change from light to dark that the filter is unable to smooth. 
+### Different Lighting Visualization +![dark](img/dark.png) +![light](img/title.png) diff --git a/img/dark.png b/img/dark.png new file mode 100644 index 00000000..060cf78f Binary files /dev/null and b/img/dark.png differ diff --git a/img/diffuse.png b/img/diffuse.png new file mode 100644 index 00000000..e1204588 Binary files /dev/null and b/img/diffuse.png differ diff --git a/img/eighty.png b/img/eighty.png new file mode 100644 index 00000000..ba39a33f Binary files /dev/null and b/img/eighty.png differ diff --git a/img/fifteen.png b/img/fifteen.png new file mode 100644 index 00000000..4a7ee7e3 Binary files /dev/null and b/img/fifteen.png differ diff --git a/img/filter.png b/img/filter.png new file mode 100644 index 00000000..7325f05f Binary files /dev/null and b/img/filter.png differ diff --git a/img/five.png b/img/five.png new file mode 100644 index 00000000..c4bbba56 Binary files /dev/null and b/img/five.png differ diff --git a/img/fourtyfive.png b/img/fourtyfive.png new file mode 100644 index 00000000..4fe04ab8 Binary files /dev/null and b/img/fourtyfive.png differ diff --git a/img/hundred.png b/img/hundred.png new file mode 100644 index 00000000..5d657edc Binary files /dev/null and b/img/hundred.png differ diff --git a/img/reference.png b/img/reference.png new file mode 100644 index 00000000..845535b9 Binary files /dev/null and b/img/reference.png differ diff --git a/img/resolution.png b/img/resolution.png new file mode 100644 index 00000000..226494c9 Binary files /dev/null and b/img/resolution.png differ diff --git a/img/title.png b/img/title.png new file mode 100644 index 00000000..6f3151b1 Binary files /dev/null and b/img/title.png differ diff --git a/scenes/cornell.txt b/scenes/cornell.txt index 83ff8202..77ad5512 100644 --- a/scenes/cornell.txt +++ b/scenes/cornell.txt @@ -52,7 +52,7 @@ EMITTANCE 0 CAMERA RES 800 800 FOVY 45 -ITERATIONS 5000 +ITERATIONS 10 DEPTH 8 FILE cornell EYE 0.0 5 10.5 diff --git a/src/main.cpp b/src/main.cpp index 
4092ae4a..dbc77abf 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -167,7 +167,11 @@ void runCuda() { if (ui_showGbuffer) { showGBuffer(pbo_dptr); - } else { + } + else if (ui_denoise) { + showDenoise(pbo_dptr, iteration, ui_filterSize, ui_colorWeight, ui_normalWeight, ui_positionWeight); + } + else { showImage(pbo_dptr, iteration); } diff --git a/src/pathtrace.cu b/src/pathtrace.cu index 23e5f909..9df6d8f7 100644 --- a/src/pathtrace.cu +++ b/src/pathtrace.cu @@ -67,18 +67,23 @@ __global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution, } } +// TODO Modify this so that we can viz different parts of the gbuffer __global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) { int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; if (x < resolution.x && y < resolution.y) { int index = x + (y * resolution.x); - float timeToIntersect = gBuffer[index].t * 256.0; - - pbo[index].w = 0; - pbo[index].x = timeToIntersect; - pbo[index].y = timeToIntersect; - pbo[index].z = timeToIntersect; + //float timeToIntersect = gBuffer[index].t * 256.0; + //glm::vec3 viz = (gBuffer[index].normal + glm::vec3(1.0)) / glm::vec3(2.0) * glm::vec3(255.0); + if (gBuffer[index].t > 0) { + float position_range = 25.f; + glm::vec3 viz = (glm::clamp(gBuffer[index].position, glm::vec3(-position_range), glm::vec3(position_range)) + position_range) / (position_range * 2.f) * 255.f; + pbo[index].w = 0; + pbo[index].x = viz.r; + pbo[index].y = viz.g; + pbo[index].z = viz.b; + } } } @@ -92,6 +97,29 @@ static GBufferPixel* dev_gBuffer = NULL; // TODO: static variables for device memory, any extra info you need, etc // ... 
+static float* dev_filter = NULL; +static glm::vec2* dev_offsets = NULL; +// Kernel/Filter from https://www.eso.org/sci/software/esomidas/doc/user/18NOV/volb/node317.html +const float filter[25] = { 1.0 / 256.0, 1.0 / 64.0, 3.0 / 128.0, 1.0 / 64.0, 1.0 / 256.0, + 1.0 / 64.0, 1.0 / 16.0, 3.0 / 32.0, 1.0 / 16.0, 1.0 / 64.0, + 3.0 / 128.0, 3.0 / 32.0, 9.0 / 64.0, 3.0 / 32.0, 3.0 / 128.0, + 1.0 / 64.0, 1.0 / 16.0, 3.0 / 32.0, 1.0 / 16.0, 1.0 / 64.0, + 1.0 / 256.0, 1.0 / 64.0, 3.0 / 128.0, 1.0 / 64.0, 1.0 / 256.0, +}; +// Offsets (x, y) +const glm::vec2 offsets[25] = { glm::vec2(-2, -2), glm::vec2(-1, -2), glm::vec2(0, -2), glm::vec2(1, -2), glm::vec2(2, -2), + glm::vec2(-2, -1), glm::vec2(-1, -1), glm::vec2(0, -1), glm::vec2(1, -1), glm::vec2(2, -1), + glm::vec2(-2, 0), glm::vec2(-1, 0), glm::vec2(0, 0), glm::vec2(1, 0), glm::vec2(2, 0), + glm::vec2(-2, 1), glm::vec2(-1, 1), glm::vec2(0, 1), glm::vec2(1, 1), glm::vec2(2, 1), + glm::vec2(-2, 2), glm::vec2(-1, 2), glm::vec2(0, 2), glm::vec2(1, 2), glm::vec2(2, 2), +}; +// Temp denoise output buffer for ping ponging +static glm::vec3* dev_denoise_in = NULL; +static glm::vec3* dev_denoise_out = NULL; +// Stuff for timing +static cudaEvent_t startTime = NULL; +static cudaEvent_t endTime = NULL; + void pathtraceInit(Scene *scene) { hst_scene = scene; const Camera &cam = hst_scene->state.camera; @@ -114,7 +142,17 @@ void pathtraceInit(Scene *scene) { cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel)); // TODO: initialize any extra device memeory you need + cudaMalloc(&dev_filter, 25 * sizeof(float)); + cudaMemcpy(dev_filter, &filter, 25 * sizeof(float), cudaMemcpyHostToDevice); + + cudaMalloc(&dev_offsets, 25 * sizeof(glm::vec2)); + cudaMemcpy(dev_offsets, &offsets, 25 * sizeof(glm::vec2), cudaMemcpyHostToDevice); + cudaMalloc(&dev_denoise_in, pixelcount * sizeof(glm::vec3)); + cudaMalloc(&dev_denoise_out, pixelcount * sizeof(glm::vec3)); + + cudaEventCreate(&startTime); + cudaEventCreate(&endTime); 
checkCUDAError("pathtraceInit"); } @@ -126,7 +164,17 @@ void pathtraceFree() { cudaFree(dev_intersections); cudaFree(dev_gBuffer); // TODO: clean up any extra device memory you created + cudaFree(dev_filter); + cudaFree(dev_offsets); + cudaFree(dev_denoise_in); + cudaFree(dev_denoise_out); + if (startTime != NULL) { + cudaEventDestroy(startTime); + } + if (endTime != NULL) { + cudaEventDestroy(endTime); + } checkCUDAError("pathtraceFree"); } @@ -148,7 +196,7 @@ __global__ void generateRayFromCamera(Camera cam, int iter, int traceDepth, Path PathSegment & segment = pathSegments[index]; segment.ray.origin = cam.position; - segment.color = glm::vec3(1.0f, 1.0f, 1.0f); + segment.color = glm::vec3(1.0f, 1.0f, 1.0f); segment.ray.direction = glm::normalize(cam.view - cam.right * cam.pixelLength.x * ((float)x - (float)cam.resolution.x * 0.5f) @@ -273,15 +321,19 @@ __global__ void shadeSimpleMaterials ( } } + +// TODO ADD NORMALS, XYZ to this __global__ void generateGBuffer ( int num_paths, ShadeableIntersection* shadeableIntersections, - PathSegment* pathSegments, + PathSegment* pathSegments, GBufferPixel* gBuffer) { int idx = blockIdx.x * blockDim.x + threadIdx.x; if (idx < num_paths) { gBuffer[idx].t = shadeableIntersections[idx].t; + gBuffer[idx].normal = shadeableIntersections[idx].surfaceNormal; + gBuffer[idx].position = shadeableIntersections[idx].t * pathSegments[idx].ray.direction + pathSegments[idx].ray.origin; } } @@ -356,46 +408,46 @@ void pathtrace(int frame, int iter) { // --- PathSegment Tracing Stage --- // Shoot ray into scene, bounce between objects, push shading chunks - // Empty gbuffer - cudaMemset(dev_gBuffer, 0, pixelcount * sizeof(GBufferPixel)); + // Empty gbuffer + cudaMemset(dev_gBuffer, 0, pixelcount * sizeof(GBufferPixel)); // clean shading chunks cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection)); - bool iterationComplete = false; + bool iterationComplete = false; while (!iterationComplete) { - // tracing - dim3 
numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d; - computeIntersections <<>> ( - depth - , num_paths - , dev_paths - , dev_geoms - , hst_scene->geoms.size() - , dev_intersections - ); - checkCUDAError("trace one bounce"); - cudaDeviceSynchronize(); - - if (depth == 0) { - generateGBuffer<<>>(num_paths, dev_intersections, dev_paths, dev_gBuffer); - } - - depth++; - - shadeSimpleMaterials<<>> ( - iter, - num_paths, - dev_intersections, - dev_paths, - dev_materials - ); - iterationComplete = depth == traceDepth; + // tracing + dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d; + computeIntersections <<>> ( + depth + , num_paths + , dev_paths + , dev_geoms + , hst_scene->geoms.size() + , dev_intersections + ); + checkCUDAError("trace one bounce"); + cudaDeviceSynchronize(); + + if (depth == 0) { + generateGBuffer<<>>(num_paths, dev_intersections, dev_paths, dev_gBuffer); + } + + depth++; + + shadeSimpleMaterials<<>> ( + iter, + num_paths, + dev_intersections, + dev_paths, + dev_materials + ); + iterationComplete = depth == traceDepth; } - // Assemble this iteration and apply it to the image - dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d; + // Assemble this iteration and apply it to the image + dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d; finalGather<<>>(num_paths, dev_image, dev_paths); /////////////////////////////////////////////////////////////////////////// @@ -422,7 +474,7 @@ void showGBuffer(uchar4* pbo) { } void showImage(uchar4* pbo, int iter) { -const Camera &cam = hst_scene->state.camera; + const Camera &cam = hst_scene->state.camera; const dim3 blockSize2d(8, 8); const dim3 blocksPerGrid2d( (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, @@ -431,3 +483,81 @@ const Camera &cam = hst_scene->state.camera; // Send results to OpenGL buffer for rendering sendImageToPBO<<>>(pbo, cam.resolution, iter, dev_image); } + +__global__ void denoise(glm::vec3* 
dev_imageIn, glm::vec3* dev_imageOut, const int stepWidth, const glm::vec2 resolution, + const glm::vec2* dev_offsets, const float* dev_filter, const float colorSigma, const float normalSigma, + const float positionSigma, const GBufferPixel* dev_gBuffer) { + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < resolution.x && y < resolution.y) { + int originalIndex = x + (y * resolution.x); + + //Center point values (current pixel) + glm::vec3 originalColor = dev_imageIn[originalIndex]; + glm::vec3 originalNorm = dev_gBuffer[originalIndex].normal; + glm::vec3 originalPos = dev_gBuffer[originalIndex].position; + + glm::vec3 sum = glm::vec3(0.0); + float cumW = 0.0; + + for (int i = 0; i < 25; ++i) { // Get neighbors + glm::vec2 neighbor_offset = dev_offsets[i] * glm::vec2(stepWidth); + int neighborX = x + neighbor_offset.x; + int neighborY = y + neighbor_offset.y; + if (neighborX >= 0 && neighborX < resolution.x && neighborY >= 0 && neighborY < resolution.y) { // check bounds of image + int neighborIndex = neighborX + (neighborY * resolution.x); + + glm::vec3 color = dev_imageIn[neighborIndex]; + float colorWeight = min(exp(-(glm::length2(originalColor - color)) / colorSigma), 1.f); + + glm::vec3 norm = dev_gBuffer[neighborIndex].normal; + float normWeight = min(exp(-(max(glm::length2(originalNorm - norm) / (stepWidth * stepWidth), 0.f) / normalSigma)), 1.f); + + glm::vec3 pos = dev_gBuffer[neighborIndex].position; + float posWeight = min(exp(-(glm::length2(originalPos - pos) / positionSigma)), 1.f); + + float weight = colorWeight * normWeight * posWeight; + sum += color * weight * dev_filter[i]; + cumW += weight * dev_filter[i]; + //blurred_pix += dev_filter[i] * dev_imageIn[neighbor_index]; + } + } + dev_imageOut[originalIndex] = sum / cumW; + } +} + +void showDenoise(uchar4* pbo, int iter, const int filterSize, const float colorSigma, const float normalSigma, const float positionSigma) { + const Camera&
cam = hst_scene->state.camera; + const dim3 blockSize2d(8, 8); + const dim3 blocksPerGrid2d( + (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, + (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); + const int pixelcount = cam.resolution.x * cam.resolution.y; + + // Copy image to denoise buffer so it doesn't affect the original image + cudaMemcpy(dev_denoise_in, dev_image, pixelcount * sizeof(glm::vec3), cudaMemcpyDeviceToDevice); + const float squaredColor = pow(colorSigma, 2); + const float squaredNormal = pow(normalSigma * .1, 2); + const float squaredPos = pow(positionSigma, 2); + int i = 0; + float time; + cudaEventRecord(startTime); + while (4 * (1 << i) + 1 < filterSize) { // Multiple iterations of denoising + int stepWidth = 1 << i; + denoise<<<blocksPerGrid2d, blockSize2d>>>(dev_denoise_in, dev_denoise_out, stepWidth, + cam.resolution, dev_offsets, dev_filter, + squaredColor, squaredNormal, squaredPos, dev_gBuffer); + cudaDeviceSynchronize(); + //Ping pong buffers + glm::vec3* temp = dev_denoise_in; + dev_denoise_in = dev_denoise_out; + dev_denoise_out = temp; + ++i; + } + cudaEventRecord(endTime); + cudaEventSynchronize(endTime); + cudaEventElapsedTime(&time, startTime, endTime); + std::cout << "Time denoise: " << time << std::endl; + sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_denoise_in); +} \ No newline at end of file diff --git a/src/pathtrace.h b/src/pathtrace.h index 9e12f440..24307f21 100644 --- a/src/pathtrace.h +++ b/src/pathtrace.h @@ -8,3 +8,4 @@ void pathtraceFree(); void pathtrace(int frame, int iteration); void showGBuffer(uchar4 *pbo); void showImage(uchar4 *pbo, int iter); +void showDenoise(uchar4* pbo, int iter, const int filterSize, const float colorSigma, const float normalSigma, const float positionSigma); \ No newline at end of file diff --git a/src/sceneStructs.h b/src/sceneStructs.h index da7e558a..2f1e42d7 100644 --- a/src/sceneStructs.h +++ b/src/sceneStructs.h @@ -77,6 +77,9 @@ struct ShadeableIntersection { // CHECKITOUT - a simple
struct for storing scene geometry information per-pixel. // What information might be helpful for guiding a denoising filter? +// Need to store normal, position of intersection struct GBufferPixel { float t; + glm::vec3 normal; + glm::vec3 position; };