project-1/proj1.html
4 additions & 4 deletions
@@ -44,7 +44,7 @@ <h2>Naive Search</h2>
44
44
<p>
45
45
To find the best shift, the simplest way is to compute the NCC for every possible shift within the full image. However, not only is this inefficient, but the best shift would also always be (0, 0), since the unshifted crop would be compared against an exact copy of itself. To solve this issue, we need to limit how far the crop can shift vertically when aligning.<br>
46
46
<br>
47
-
Define <i>W</i> and <i>H</i> to be the width and height of the full image, and asssume, for approximations, that each plate takes up exactly a third of the full image. Considering only the top/blue plate, we can start by setting the upper limit for the top edge to be <i>(0 + H/3) / 2 = H/6</i>, and the bottom edge to be <i>(2H / 3 + H) / 2 = 5H / 6</i>. This means the top edge should be at least shifted down by <i>H/6 - 0 = H/6</i>, and the bottom edge by <i>5H / 6 - H/3 = H/2</i>. Therefore, a good place to start is a displacement of <i>(0, (H/6 + H/2) / 2) = (0, H/3)</i> with a search range of [<i>-H/6</i>, <i>H/6</i>]. For the bottom/red plate, the equivalent displacement is just <i>(0, -H/3)</i> with the same search range. Any shifts that brings the crop outside the original image will be ignored.<br>
47
+
Define <i>W</i> and <i>H</i> to be the width and height of the full image, and assume, as an approximation, that each plate takes up exactly a third of the full image. Considering only the top/blue plate, we can start by setting the upper limit for the top edge to be <i>(0 + H/3) / 2 = H/6</i>, and the bottom edge to be <i>(2H / 3 + H) / 2 = 5H / 6</i>. This means the top edge should be shifted down by at least <i>H/6 - 0 = H/6</i>, and the bottom edge by <i>5H / 6 - H/3 = H/2</i>. Therefore, a good place to start is a displacement of <i>(0, (H/6 + H/2) / 2) = (0, H/3)</i> with a search range of [<i>-H/6</i>, <i>H/6</i>]. For the bottom/red plate, the equivalent displacement is just <i>(0, -H/3)</i> with the same search range. Any shift that brings the crop outside the original image is ignored.<br>
48
48
<br>
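The windowed search described above could be sketched roughly as follows. This is a minimal NumPy illustration, not the project's actual code: the names <code>ncc</code> and <code>naive_search</code>, the <code>(x0, y0, x1, y1)</code> box layout, and the separate horizontal and vertical radii are all assumptions made for the example.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def naive_search(img, box, center, radius_x, radius_y):
    """Return the (dx, dy) near `center` that maximizes NCC.

    img    : 2D array holding the stacked plates
    box    : (x0, y0, x1, y1) crop of the plate being aligned (hypothetical layout)
    center : starting displacement, e.g. (0, H // 3) for the blue plate
    """
    H, W = img.shape
    x0, y0, x1, y1 = box
    crop = img[y0:y1, x0:x1]
    best_score, best_shift = -np.inf, center
    for dy in range(center[1] - radius_y, center[1] + radius_y + 1):
        for dx in range(center[0] - radius_x, center[0] + radius_x + 1):
            # Ignore shifts that move the crop outside the original image.
            if x0 + dx < 0 or y0 + dy < 0 or x1 + dx > W or y1 + dy > H:
                continue
            score = ncc(crop, img[y0 + dy:y1 + dy, x0 + dx:x1 + dx])
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift
```

For the blue plate, a call might look like <code>naive_search(img, box, (0, H // 3), rx, H // 6)</code>, where the horizontal radius <code>rx</code> is a free choice that the text does not specify.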
49
49
Using a starting crop of {<i>(W/16, H/16), (W - W/16, H/3 - H/16)</i>} for the blue plate and {<i>(W/16, 2H / 3 + H/16), (W - W/16, H - H/16)</i>} for the red plate, where the tuples are the upper left and lower right corner pixels in (<i>x</i>, <i>y</i>) coordinates, we can obtain the following best shifts (B, R):
50
50
</p>
@@ -74,11 +74,11 @@ <h2>Naive Search</h2>
74
74
<!-- Section 4 -->
75
75
<h2>Image Pyramid</h2>
76
76
<p>
77
-
Unfortunately, because each crop has a height of <i>(7W / 8)</i> × <i>(5H / 24)</i>, the total number of NCC computations for each alignment is <i>((W - 7W / 8) + 1)</i> × <i>((H/3 - 5H / 24) + 1)</i> = <i>(W / 8 + 1)</i> × <i>(H / 8 + 1)</i> = <i>O(HW)</i>. Since each NCC computation requires <i>O(HW)</i> operations, aligning an image of dimensions <i>W</i> × <i>H</i> takes <i>O((HW)<sup>2</sup>)</i> time using the naive search above. Because the .tif files are about 9 times bigger than the .jpg files in both dimensions, performing the same search on these files will take more than 6500x more time to compute (hours instead of seconds). Even if the width is fixed, it would still require more than 9<sup>3</sup> ≈ 720x more time on .tif files. A more efficient method is required.<br>
77
+
Unfortunately, because each crop has dimensions <i>(7W / 8)</i> × <i>(5H / 24)</i>, the total number of NCC computations for each alignment is <i>((W - 7W / 8) + 1)</i> × <i>((H/3 - 5H / 24) + 1)</i> = <i>(W / 8 + 1)</i> × <i>(H / 8 + 1)</i> = <i>O(HW)</i>. Since each NCC computation requires <i>O(HW)</i> operations, aligning an image of dimensions <i>W</i> × <i>H</i> takes <i>O((HW)<sup>2</sup>)</i> time using the naive search above. Because the .tif files are about 9 times bigger than the .jpg files in both dimensions, performing the same search on these files will take more than 6500x longer (9<sup>4</sup> = 6561; hours instead of seconds). Even if the width is fixed, it would still require about 9<sup>3</sup> = 729x more time on .tif files. A more efficient method is required.<br>
78
78
<br>
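As a quick check of the scaling argument above, the slowdown factors can be reproduced with a few lines of arithmetic. The helper name <code>slowdown</code> is hypothetical, and reading "the width is fixed" as "the horizontal search range does not grow" is an assumption, not something the text states.

```python
def slowdown(scale, search_dims_scaled=2):
    """Cost ratio after enlarging the image `scale` times in both dimensions.

    Each NCC touches O(HW) pixels, so its cost grows by scale**2. The number
    of candidate shifts grows by `scale` for every search dimension whose
    range grows with the image (assumed: 2 by default, 1 if only vertical).
    """
    per_ncc = scale ** 2
    num_shifts = scale ** search_dims_scaled
    return per_ncc * num_shifts

print(slowdown(9))     # 6561, the "more than 6500x" figure
print(slowdown(9, 1))  # 729, when only the vertical search range grows
```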
79
-
Instead of searching over the entire image, we can scale down the image and find the best shiftat a much smaller size. Once the most accurate displacement (<i>x</i><sub>lowest</sub>, <i>y</i><sub>lowest</sub>) is calculated for the lowest sized image, we can perform another search at the image 2x the size. Only this time, we start at (<i>2x</i><sub>lowest</sub>, <i>2y</i><sub>lowest</sub>), and only search over a window of [-2, 2] for pixel corrections. Once the best displacement at the layer above the lowest scale is computed, we can multiply the result again by 2 and pass it to the image in the next layer above. The limited window is used because each downscaled coordinate could have only come from a total of 4 different coordaintes, so the best coordinate on the higher scale is only limited to [-1, 1] of the interpolated coordinate. An additional pixel in the search range is used to ensure further reduce the effect of noise on the best placement at the lowest scale. To further optimze this approrach without storing all downscale images at once, we can modify the best displacement function to have a recursive call.<br>
79
+
Instead of searching over the entire image, we can scale down the image and find the best shift at a much smaller size. Once the most accurate displacement (<i>x</i><sub>lowest</sub>, <i>y</i><sub>lowest</sub>) is calculated for the smallest image, we can perform another search on the image at 2x the size. Only this time, we start at (<i>2x</i><sub>lowest</sub>, <i>2y</i><sub>lowest</sub>), and only search over a window of [-2, 2] for pixel corrections. Once the best displacement at the layer above the lowest scale is computed, we can multiply the result again by 2 and pass it to the image in the next layer above. The window can be this small because each downscaled coordinate could only have come from a total of 4 different coordinates at the next scale up, so the best coordinate at the higher scale lies within [-1, 1] of the upscaled coordinate. An additional pixel in the search range is used to further reduce the effect of noise on the best placement at the lowest scale. To further optimize this approach without storing all downscaled images at once, we can modify the best displacement function to have a recursive call.<br>
80
80
<br>
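The coarse-to-fine idea above might be sketched recursively as follows. This is an illustrative NumPy version under stated assumptions: <code>pyramid_search</code>, <code>search</code>, and <code>halve</code> are hypothetical names, the 2x downscale uses plain 2x2 block averaging (the original may use a different filter), and <code>coarse_radius</code> is expressed in pixels of the input image and halved at each level.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(img, box, center, radius):
    """Best (dx, dy) within `radius` of `center`; out-of-image shifts are skipped."""
    H, W = img.shape
    x0, y0, x1, y1 = box
    crop = img[y0:y1, x0:x1]
    best_score, best_shift = -np.inf, center
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            if x0 + dx < 0 or y0 + dy < 0 or x1 + dx > W or y1 + dy > H:
                continue
            score = ncc(crop, img[y0 + dy:y1 + dy, x0 + dx:x1 + dx])
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

def halve(img):
    """Downscale by 2 with 2x2 block averaging (an assumed choice of filter)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_search(img, box, center, coarse_radius, w_min=72):
    """Recursive coarse-to-fine search for the best displacement."""
    if img.shape[1] <= w_min:
        # Base case: the image is small enough for a full window search.
        return search(img, box, center, coarse_radius)
    # Downscale the image, the crop box, and the starting guess together.
    small_box = tuple(c // 2 for c in box)
    small_center = (center[0] // 2, center[1] // 2)
    sx, sy = pyramid_search(halve(img), small_box, small_center,
                            max(1, coarse_radius // 2), w_min)
    # Scale the coarse result up by 2 and correct it within [-2, 2] pixels.
    return search(img, box, (2 * sx, 2 * sy), 2)
```

Only one downscaled image is alive per recursion level, which matches the goal of not storing the whole pyramid at once.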
81
-
Instead of directly computing the best displacement on the full image, we only calculate if the given with is below a certain threshold. For images above this threshold, we can first downscale the image by 2x, pass it back to the function, and the returned shifts scaled up by 2x to return the best shifts on the input image. This means that the best shifts in the base case (<i>x</i><sub>lowest</sub>, <i>y</i><sub>lowest</sub>) will be scaled up and used in the previous recursive call, which is the scaled image 1 layer above. This will continue until we return to the top level call, and at that point, the returned shifts will be within 1 or 2 pixels of the best overall displacement. The last thing to keep in mind is to downscale the cropping box as well, which is simple to do since it is computed on the full image and one can simply divide its coordinates by 2 for each recurive call. In practice, setting W<sub>min</sub> = 72 gives the best tradeoff between the search size and the number of rescales. With these optimzations in place, the computing time is now much faster:
81
+
Instead of always computing the best displacement directly on the full image, we only run the direct search if the given image width is below a certain threshold W<sub>min</sub>. For images above this threshold, we first downscale the image by 2x, pass it back to the function, and scale the returned shifts up by 2x to get the best shifts on the input image. This means that the best shifts in the base case (<i>x</i><sub>lowest</sub>, <i>y</i><sub>lowest</sub>) will be scaled up and used in the previous recursive call, which is the scaled image 1 layer above. This continues until we return to the top-level call, and at that point, the returned shifts will be within 1 or 2 pixels of the best overall displacement. The last thing to keep in mind is to downscale the cropping box as well, which is simple to do since it is computed on the full image, and one can simply divide its coordinates by 2 for each recursive call. In practice, setting W<sub>min</sub> = 72 gives the best tradeoff between the search size and the number of rescales. With these optimizations in place, the computing time is now much faster: