
Commit d7f3dc5

Update proj1.html
1 parent 54febbe commit d7f3dc5

File tree

1 file changed: +43 −1 lines changed

project-1/proj1.html

Lines changed: 43 additions & 1 deletion
@@ -15,7 +15,6 @@ <h1>Project 1: Colorizing the Prokudin-Gorskii photo collection</h1>
 
 <div align="center">
 <img src="images/intro.jpg" alt="intro.jpg" width="50%">
-<figcaption style="margin-bottom:6px;">see submission/code/sobel.py</figcaption>
 </div>
 
 <!-- Section 1 -->
@@ -73,7 +72,9 @@ <h2>Naive Search</h2>
 
 </div>
 
+<p>
 Unfortunately, because each crop has dimensions of <i>(7W / 8)</i> &times; <i>(5H / 24)</i>, the total number of NCC computations for each alignment is <i>((W - 7W / 8) + 1)</i> &times; <i>((H/3 - 5H / 24) + 1)</i> = <i>(W / 8 + 1)</i> &times; <i>(H / 8 + 1)</i> = <i>O(HW)</i>. Since each NCC computation requires <i>O(HW)</i> operations, aligning an image of dimensions <i>W</i> &times; <i>3W</i> takes <i>O(W<sup>4</sup>)</i> time using the naive search above. Because the .tif files are about 9 times larger than the .jpg files in both dimensions, performing the same search on these files will take more than 9<sup>4</sup> &approx; 6500x longer to compute. Even if we don't crop horizontally at all, the search would still have a complexity of <i>O(W<sup>3</sup>)</i> and require more than 9<sup>3</sup> &approx; 730x more time on .tif files. A more efficient method is required.
+</p>
 <hr>
 
 <!-- Section 4 -->
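The naive search above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's actual code; the function names (`ncc`, `naive_align`) and the `max_shift` search radius are assumptions for the example, and `np.roll` stands in for the crop-and-shift described in the text:

```python
import numpy as np

def ncc(a, b):
    # Normalized cross-correlation between two equally sized patches.
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def naive_align(ref, mov, max_shift=15):
    # Exhaustively score every displacement in the window and keep the best.
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = ncc(ref, np.roll(mov, (dy, dx), axis=(0, 1)))
            if score > best:
                best, best_shift = score, (dy, dx)
    return best_shift
```

Each candidate displacement costs one full NCC pass over the crop, which is where the O(HW)-per-evaluation factor in the analysis above comes from.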
@@ -85,7 +86,9 @@ <h2>Image Pyramid</h2>
 <img src="images/Image_pyramid_svg.png" alt="Image_pyramid_svg.png" width="25%">
 <figcaption style="margin-bottom:6px;">Source: <a href="https://en.wikipedia.org/wiki/Pyramid_%28image_processing%29">Wikipedia</a></figcaption>
 </div>
+<p>
 Now, we only calculate the best displacement directly if the given width is below a certain threshold. For images above this threshold, we first downscale the image by 2x, pass it back to the function, and scale the returned shifts up by 2x to get the best shifts on the input image. This means that the best shifts in the base case (<i>x</i><sub>lowest</sub>, <i>y</i><sub>lowest</sub>) will be scaled up and used in the previous recursive call, one level above. This continues until we return to the top-level call, at which point the returned shifts will be within 1 or 2 pixels of the best overall displacement. The last thing to keep in mind is to downscale the cropping box as well, which is simple to do since it is computed on the full image: one can simply divide its coordinates by 2 for each recursive call. In practice, setting W<sub>min</sub> = 72 gives the best tradeoff between the search size and the number of rescales. With these optimizations in place, the computation is now much faster:
+</p>
 
 <div style="display:flex; flex-wrap:wrap; justify-content:center; text-align:center;">
 
@@ -191,5 +194,44 @@ <h2>Image Pyramid</h2>
 </p>
 <hr>
 
+<!-- Section 5 -->
+<h2>Cropping with Sobel</h2>
+<p>
+Although a fixed cropping dimension worked for aligning the images, artifacts from the borders still remain, because the final result is simply the intersection of the translations. Is there a way to crop the borders automatically? The answer is yes. We can start by applying the Sobel operator
+</p>
+
+<div style="display:flex; flex-wrap:wrap; justify-content:center; text-align:center;">
+
+<figure style="flex: 1 0 40%; margin:12px;">
+<img src="images/equationX.png" alt="equationX.png" width="50%">
+</figure>
+
+<figure style="flex: 1 0 40%; margin:12px;">
+<img src="images/equationY.png" alt="equationY.png" width="50%">
+</figure>
+
+</div>
+
+<p>
+with convolution to the base image, and add the results together. This produces a composite of the detected edges along the <i>x</i>- and <i>y</i>-axes. The next step is to reconstruct the borders of each plate from the image. Because convolution is shift-invariant, we can convolve the kernel on only the valid patches, resulting in a new image of dimensions <i>(W - 2)</i> &times; <i>(H - 2)</i> for the 3 &times; 3 Sobel kernels.
+</p>
+
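The valid-patch convolution can be written out explicitly in plain NumPy. This is a slow but readable reference sketch (a real implementation would use a vectorized library routine such as `scipy.signal.convolve2d` with `mode="valid"`); the names `convolve_valid` and `edge_composite` are invented for the example:

```python
import numpy as np

# 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve_valid(img, kernel):
    # 'valid' convolution: only full-overlap positions are kept, so a
    # 3x3 kernel shrinks an H x W image to (H-2) x (W-2).
    kh, kw = kernel.shape
    k = kernel[::-1, ::-1]  # flip the kernel for true convolution
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def edge_composite(img):
    # Sum of the x- and y-gradient responses, as in the caption below.
    return convolve_valid(img, SOBEL_X) + convolve_valid(img, SOBEL_Y)
```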
+<div align="center">
+<img src="images/edges.png" alt="edges.png" width="50%">
+<figcaption style="margin-bottom:6px;">sobelX &starf; img + sobelY &starf; img</figcaption>
+</div>
+
+<p>
+Now, we can take the average of each row; the rows with the lowest and highest averages will be the edges. By splitting the resulting array into 4 subarrays [0 : <i>H/8</i>], [<i>H/8</i> : <i>H/2</i>], [<i>H/2</i> : <i>7H / 8</i>], and [<i>7H / 8</i> : <i>H</i>], we can find the argmax and argmin in each subarray, which will be used as the horizontal borders of each plate image.
+</p>
+
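The band-splitting step can be sketched as follows. For simplicity the sketch takes only the argmax in each band (the text also mentions the argmin, and a real border detector would pick between them per band); `horizontal_borders` and its input `edges` are assumed names:

```python
import numpy as np

def horizontal_borders(edges):
    # Average the edge response per row, then take the strongest row in
    # each of the four bands [0, H/8), [H/8, H/2), [H/2, 7H/8), [7H/8, H).
    h = edges.shape[0]
    row_avg = edges.mean(axis=1)
    cuts = [0, h // 8, h // 2, 7 * h // 8, h]
    borders = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        borders.append(lo + int(np.argmax(row_avg[lo:hi])))
    return borders
```

Restricting each extremum to its own band is what keeps the three plate boundaries from all collapsing onto the single strongest edge in the image.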
+<div align="center">
+<img src="images/y.png" alt="y.png" width="50%">
+</div>
+
+The vertical borders can be estimated using a similar strategy, except that we need to split the column averages into 3 subsections based on the values found in the previous part. This ensures that we are able to find the vertical edges of each image plate. Although the center plate
+
+<hr>
 </body>
 </html>
