
Commit e936c1c

Refactor visual anagrams section and update prompts
1 parent 4190bdf commit e936c1c


1 file changed (+26 -64 lines changed)


project-5/index.html

Lines changed: 26 additions & 64 deletions
@@ -835,71 +835,45 @@ <h4>St. Basil's Cathedral with prompt <code>'an oil painting of a snowy mountain
835835
<!-- ========================================================= -->
836836
<section id="part-1-8">
837837
<h2>Part 1.8 – Visual Anagrams</h2>
838+
839+
We now have the necessary tools to generate visual anagrams: images that look like one subject normally and like a different one when flipped or rotated. For example, to make a vertical-flip anagram, we start with two prompt embeddings <code>p<sub>1</sub></code> and <code>p<sub>2</sub></code>. At each denoising step, we compute the noise estimate &epsilon;<sub>1</sub> for <code>p<sub>1</sub></code> as usual. For <code>p<sub>2</sub></code>, we first flip the image <code>x<sub>t</sub></code> upside down, compute the noise estimate on the flipped image, and then flip that estimate back to obtain &epsilon;<sub>2</sub>. The final noise estimate for the step is the average (&epsilon;<sub>1</sub> + &epsilon;<sub>2</sub>) / 2. The variance is combined the same way: v<sub>1</sub> is computed as usual, v<sub>2</sub> is the variance estimate for the flipped <code>x<sub>t</sub></code>, flipped back, and the final variance estimate is (v<sub>1</sub> + v<sub>2</sub>) / 2. Below are a few examples of this effect, with <code>p<sub>1</sub></code> as the first prompt and <code>p<sub>2</sub></code> as the second:
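A minimal sketch of the flip-and-average step, written in NumPy rather than the project's PyTorch pipeline. `estimate_noise` is a hypothetical stand-in for the model's (classifier-free-guided) noise and variance prediction, and `flip_vertical` / `anagram_estimate` are illustrative names, not the project's `visual_anagrams` code. Images are assumed to be arrays whose last two axes are height and width, so `np.flip(..., axis=-2)` plays the role of `torch.flip(..., dims=[2])` on a `(1, 3, 64, 64)` tensor.

```python
import numpy as np


def flip_vertical(x):
    # Flip upside down along the height axis (second-to-last axis),
    # which works for both (C, H, W) and (N, C, H, W) arrays.
    return np.flip(x, axis=-2)


def anagram_estimate(x_t, t, p1, p2, estimate_noise):
    """Combine noise and variance estimates for a vertical-flip anagram.

    estimate_noise is a HYPOTHETICAL callable (x, t, prompt) -> (eps, var)
    standing in for the diffusion model's noise/variance prediction.
    """
    # Prompt p1: estimate on the image as-is.
    eps1, v1 = estimate_noise(x_t, t, p1)

    # Prompt p2: estimate on the flipped image, then flip the estimates back
    # so they are aligned with the unflipped x_t.
    eps2_flipped, v2_flipped = estimate_noise(flip_vertical(x_t), t, p2)
    eps2 = flip_vertical(eps2_flipped)
    v2 = flip_vertical(v2_flipped)

    # Final estimates for this step are the averages of the two.
    return (eps1 + eps2) / 2, (v1 + v2) / 2
```

The combined `eps` and `var` would then be passed to the usual per-step denoising update. Reusing `flip_vertical` to undo the flip only works because a flip is its own inverse; for a rotation anagram, the estimate would need to be rotated back with the inverse rotation instead.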
838840

839841
<div class="subsection">
840-
<h3>1.8.1 – Code: visual_anagrams</h3>
841-
<pre><code># TODO
842-
# def visual_anagrams(
843-
# prompt_embeds_p1,
844-
# prompt_embeds_p2,
845-
# uncond_prompt_embeds,
846-
# timesteps,
847-
# scale=7,
848-
# num_inference_steps=...,
849-
# ):
850-
# """
851-
# Returns:
852-
# image: torch.Tensor of shape (1, 3, 64, 64) in [-1, 1]
853-
# """
854-
# # TODO</code></pre>
855-
856-
<p class="note">
857-
Notes: include your flipping operation (e.g., torch.flip(..., dims=[2])) and how you combine
858-
noise / variance estimates (if applicable).
859-
</p>
860-
</div>
861-
862-
<div class="subsection">
863-
<h3>1.8.2 – Two Visual Anagram Illusions</h3>
864-
<p>
865-
Each illusion should look like one concept normally, and another concept when flipped upside down.
866-
Show both orientations.
867-
</p>
868-
869-
<h4>Illusion 1</h4>
870-
<p><strong>Prompt p1:</strong> <em><!-- TODO: prompt 1 --></em><br/>
871-
<strong>Prompt p2:</strong> <em><!-- TODO: prompt 2 --></em>
872-
</p>
842+
<h3>Prompts: <code>'an oil painting of an old man'</code> and <code>'an oil painting of people around a campfire'</code></h3>
873843
<div class="image-row">
874844
<figure>
875-
<img src="images/part1_8_illusion1_original.png" alt="Visual anagram illusion 1 (original)" />
876-
<figcaption>Illusion 1 – Original orientation</figcaption>
845+
<img src="images/anagram/anagram1_256.png" alt="anagram1_256.png" />
846+
<figcaption>Original</figcaption>
877847
</figure>
878848
<figure>
879-
<img src="images/part1_8_illusion1_flipped.png" alt="Visual anagram illusion 1 (flipped)" />
880-
<figcaption>Illusion 1 – Flipped upside down</figcaption>
849+
<img src="images/anagram/anagram1_flip_256.png" alt="anagram1_flip_256.png" />
850+
<figcaption>Flipped</figcaption>
881851
</figure>
882852
</div>
883853

884-
<h4>Illusion 2</h4>
885-
<p><strong>Prompt p1:</strong> <em><!-- TODO: prompt 1 --></em><br/>
886-
<strong>Prompt p2:</strong> <em><!-- TODO: prompt 2 --></em>
887-
</p>
854+
<h3>Prompts: <code>'a lithograph of waterfalls'</code> and <code>'a man wearing a hat'</code></h3>
888855
<div class="image-row">
889856
<figure>
890-
<img src="images/part1_8_illusion2_original.png" alt="Visual anagram illusion 2 (original)" />
891-
<figcaption>Illusion 2 – Original orientation</figcaption>
857+
<img src="images/anagram/anagram2_256.png" alt="anagram2_256.png" />
858+
<figcaption>Original</figcaption>
892859
</figure>
893860
<figure>
894-
<img src="images/part1_8_illusion2_flipped.png" alt="Visual anagram illusion 2 (flipped)" />
895-
<figcaption>Illusion 2 – Flipped upside down</figcaption>
861+
<img src="images/anagram/anagram2_flip_256.png" alt="anagram2_flip_256.png" />
862+
<figcaption>Flipped</figcaption>
896863
</figure>
897864
</div>
898865

899-
<p class="note">
900-
Brief discussion: what makes the illusion work? How sensitive is it to guidance scale / steps / noise schedule?
901-
<!-- TODO -->
902-
</p>
866+
<h3>Prompts: <code>'an oil painting of a snowy mountain village'</code> and <code>'a photo of a dog'</code></h3>
867+
<div class="image-row">
868+
<figure>
869+
<img src="images/anagram/anagram3_256.png" alt="anagram3_256.png" />
870+
<figcaption>Original</figcaption>
871+
</figure>
872+
<figure>
873+
<img src="images/anagram/anagram3_flip_256.png" alt="anagram3_flip_256.png" />
874+
<figcaption>Flipped</figcaption>
875+
</figure>
876+
</div>
903877
</div>
904878
</section>
905879

@@ -908,24 +882,12 @@ <h4>Illusion 2</h4>
908882
<!-- ========================================================= -->
909883
<section id="part-1-9">
910884
<h2>Part 1.9 – Hybrid Images</h2>
885+
886+
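With the techniques above, we can now also create hybrid images: images that look like different subjects depending on the viewing distance. The classical way to create a hybrid image is to pass the image you want to see from far away through a low-pass filter, thus keeping only its coarse structure, and the image you want to see at close range through a high-pass filter, thus keeping only its fine details, and then to sum the two results.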
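A rough NumPy sketch of the classical (non-diffusion) hybrid construction, assuming 2-D grayscale float arrays of equal shape and using a Gaussian blur as the low-pass filter and "image minus its blur" as the high-pass filter. `gaussian_kernel_1d`, `gaussian_blur`, and `make_hybrid` are hypothetical helpers for illustration, not the project's `make_hybrids`, and the sigma values are arbitrary placeholders to be tuned per image pair.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius=None):
    # Normalized 1-D Gaussian, truncated at about 3 sigma by default.
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    xs = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-xs ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur (low-pass) of a 2-D array, reflect padding."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    # Blur along rows, then along columns; "valid" mode trims the padding.
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, tmp)


def make_hybrid(far_img, near_img, sigma_low, sigma_high):
    # Low frequencies of the image meant to be seen from a distance.
    low = gaussian_blur(far_img, sigma_low)
    # High frequencies of the image meant to be seen up close.
    high = near_img - gaussian_blur(near_img, sigma_high)
    return low + high
```

Larger `sigma_low` keeps less of the far image's detail, while larger `sigma_high` lets more of the near image's mid frequencies through; the two cutoffs control how strongly each subject dominates at each viewing distance.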
911887

912888
<div class="subsection">
913889
<h3>1.9.1 – Code: make_hybrids</h3>
914-
<pre><code># TODO
915-
# def make_hybrids(
916-
# image_a,
917-
# image_b,
918-
# lowpass_sigma=...,
919-
# highpass_sigma=...,
920-
# blend_weight=...,
921-
# ):
922-
# """
923-
# Returns:
924-
# hybrid: torch.Tensor or np.ndarray (document your format)
925-
# low_freq: low-frequency component (optional)
926-
# high_freq: high-frequency component (optional)
927-
# """
928-
# # TODO</code></pre>
890+
929891

930892
<p class="note">
931893
Notes: describe your filter choice (Gaussian blur / FFT), the cutoff frequencies (sigmas),
