Commit 79e79b7

mkdocs fixes, PR #1032
Squashed commit of the following:

commit 2c1e0168bb03a2cd625f2d4aca40eee0fdf7e4af
Merge: 2325c6c 31f2733
Author: Lincoln Stein <[email protected]>
Date:   Tue Oct 11 08:33:18 2022 -0400

    Merge branch 'mkdocs-fixes' of https://github.com/mauwii/stable-diffusion into mauwii-mkdocs-fixes

commit 31f2733
Merge: d9d6d3a a61a690
Author: Lincoln Stein <[email protected]>
Date:   Tue Oct 11 08:05:52 2022 -0400

    Merge branch 'main' into mkdocs-fixes

commit d9d6d3a
Author: mauwii <[email protected]>
Date:   Tue Oct 11 08:13:04 2022 +0200

    some more minor, overseen fixes to IMG2IMG

commit 4ab5a2a
Author: mauwii <[email protected]>
Date:   Tue Oct 11 07:49:11 2022 +0200

    add 4gotten alt-text to images

commit f778bd9
Author: mauwii <[email protected]>
Date:   Tue Oct 11 07:18:11 2022 +0200

    update OTHER.md
    - fix codeblocks, add admonitions, embed graphic

commit a19f148
Author: mauwii <[email protected]>
Date:   Tue Oct 11 06:51:29 2022 +0200

    update IMG2IMG.md

commit c1f1dfa
Author: mauwii <[email protected]>
Date:   Tue Oct 11 06:10:25 2022 +0200

    update EMBIGGEN.md
    - fix codeblocks
    - fix toc
    - use admonitions

commit 791e6c6
Author: mauwii <[email protected]>
Date:   Tue Oct 11 05:58:53 2022 +0200

    better admonitions for CLI.md

commit e078025
Author: mauwii <[email protected]>
Date:   Tue Oct 11 05:50:32 2022 +0200

    huge update to CLI.md

    way too many updates to list them all, including:
    - render keys for keyboard-shortcuts
    - quote commands and "unhide" parameter-values (like `<int>`, `<string>`
    - fix codeblocks
    - quote commands
    - quote filenames
    - use admonitions
    - ....

commit bd98dd2
Author: mauwii <[email protected]>
Date:   Tue Oct 11 04:49:57 2022 +0200

    fix INPAINTING.md
    - fix numbered List
    - replace text key combos with actual rendered keyboard keys

commit 5392000
Author: mauwii <[email protected]>
Date:   Tue Oct 11 04:30:11 2022 +0200

    fix nubered list and codeblocks in INSTALL_WINDOWS

commit ffe9276
Author: mauwii <[email protected]>
Date:   Tue Oct 11 04:12:56 2022 +0200

    fix numbered list in INSTALL_LINUX.md
    also fix blank lines, codeblocks and admonition

commit 2c6a6a5
Author: mauwii <[email protected]>
Date:   Tue Oct 11 03:51:03 2022 +0200

    upgrade INSTALL_MAC.md:
    - use annotations and content-tabs

    yes, this looks ugly in repo afterwards, but plz also look at mkdocs:
    https://mauwii.github.io/stable-diffusion/installation/INSTALL_MAC/

commit 8f6c544
Author: mauwii <[email protected]>
Date:   Tue Oct 11 01:43:11 2022 +0200

    comment out PR part in mkdocs-flow.yml

commit b52c14a
Merge: 97ebe58 a1b0b91
Author: mauwii <[email protected]>
Date:   Tue Oct 11 01:17:28 2022 +0200

    Merge branch 'mkdocs-fixes' of github.com:mauwii/stable-diffusion into mkdocs-fixes

commit a1b0b91
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:59:44 2022 +0200

    fix conda env in codeblock

commit 5f9f9a2
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:43:46 2022 +0200

    fix 4gotten title in TEXTUAL_INVERSION

commit 8f025b0
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:41:52 2022 +0200

    quote repo_url and repo_name
    otherwise the version/stars/forks did not appear

commit 3a52b7d
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:39:54 2022 +0200

    fix TEXTUAL_INVERSION headline to fit the others

commit 389b21f
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:35:48 2022 +0200

    fix SAMPLER_CONVERGENCE and add emoji

commit f26fc79
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:32:04 2022 +0200

    fix INSTALL_DOCKER.md:
    - fix title (Docker instead of "Before you begin")
    - add headline with Emoji
    - fix headlines to render toc correct

commit cbc3520
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:24:58 2022 +0200

    add headline with emoji to INSTALL_MAC.md

commit 25f0614
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:21:01 2022 +0200

    add log emoji to docs/CHANGELOG.md

commit 4200568
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:20:47 2022 +0200

    use better fitting Icon for new Name

commit 0c65bad
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:09:07 2022 +0200

    add Headline with Emoji to WEB and POSTPROCESS

commit 1c1cf26
Author: mauwii <[email protected]>
Date:   Mon Oct 10 23:56:16 2022 +0200

    update index.md:
    - remove unused template reference
    - make headline rendered bold and underlined, add (kind of) subtitle
    - update discord badge and link
    - update Quick links to look like in GH-Readme
    - also remove self reference to docs
    - add screenshot as in GH-Readme
    - add note pointing to issues tab
    - update path in command line to reflect new Repo Name

commit 0e29b07
Author: mauwii <[email protected]>
Date:   Mon Oct 10 23:23:10 2022 +0200

    chng site_name to `Stable Diffusion Toolkit Docs`

commit ad8a60d
Author: mauwii <[email protected]>
Date:   Mon Oct 10 23:00:02 2022 +0200

    fix repo_url in mkdocs.yml

commit 234569d
Author: mauwii <[email protected]>
Date:   Mon Oct 10 22:54:39 2022 +0200

    fix link to upscaling in WEB.md and TOC
    - TOC fixed by adding `#` to every headline after `## Parting remarks`
    - add missing blank lines

commit 97c84ad
Author: mauwii <[email protected]>
Date:   Mon Oct 10 22:25:32 2022 +0200

    fix broken links in docs/CHANGELOG.md

commit bce62b3
Author: mauwii <[email protected]>
Date:   Mon Oct 10 22:15:37 2022 +0200

    add title to CHANGELOG.md to render TOC wo. `**`
    alternatively remove `**` around headline

commit 97ebe58
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:59:44 2022 +0200

    fix conda env in codeblock

commit 87ac217
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:43:46 2022 +0200

    fix 4gotten title in TEXTUAL_INVERSION

commit 91439e8
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:41:52 2022 +0200

    quote repo_url and repo_name
    otherwise the version/stars/forks did not appear

commit 8a632a9
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:39:54 2022 +0200

    fix TEXTUAL_INVERSION headline to fit the others

commit 7c8ffe2
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:35:48 2022 +0200

    fix SAMPLER_CONVERGENCE and add emoji

commit e2e86d2
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:32:04 2022 +0200

    fix INSTALL_DOCKER.md:
    - fix title (Docker instead of "Before you begin")
    - add headline with Emoji
    - fix headlines to render toc correct

commit 8b54c08
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:24:58 2022 +0200

    add headline with emoji to INSTALL_MAC.md

commit 8d8a032
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:21:01 2022 +0200

    add log emoji to docs/CHANGELOG.md

commit 76519f6
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:20:47 2022 +0200

    use better fitting Icon for new Name

commit aff0725
Author: mauwii <[email protected]>
Date:   Tue Oct 11 00:09:07 2022 +0200

    add Headline with Emoji to WEB and POSTPROCESS

commit 0f7898c
Author: mauwii <[email protected]>
Date:   Mon Oct 10 23:56:16 2022 +0200

    update index.md:
    - remove unused template reference
    - make headline rendered bold and underlined, add (kind of) subtitle
    - update discord badge and link
    - update Quick links to look like in GH-Readme
    - also remove self reference to docs
    - add screenshot as in GH-Readme
    - add note pointing to issues tab
    - update path in command line to reflect new Repo Name

commit f4c04ea
Author: mauwii <[email protected]>
Date:   Mon Oct 10 23:23:10 2022 +0200

    chng site_name to `Stable Diffusion Toolkit Docs`

commit 6e62482
Author: mauwii <[email protected]>
Date:   Mon Oct 10 23:00:02 2022 +0200

    fix repo_url in mkdocs.yml

commit 158848d
Author: mauwii <[email protected]>
Date:   Mon Oct 10 22:54:39 2022 +0200

    fix link to upscaling in WEB.md and TOC
    - TOC fixed by adding `#` to every headline after `## Parting remarks`
    - add missing blank lines

commit 533736e
Author: mauwii <[email protected]>
Date:   Mon Oct 10 22:29:46 2022 +0200

    fix link to truncation_comparison.jpg in OTHER.md

commit dd33514
Author: mauwii <[email protected]>
Date:   Mon Oct 10 22:25:32 2022 +0200

    fix broken links in docs/CHANGELOG.md

commit 374dd54
Author: mauwii <[email protected]>
Date:   Mon Oct 10 22:15:37 2022 +0200

    add title to CHANGELOG.md to render TOC wo. `**`
    alternatively remove `**` around headline
1 parent 2325c6c commit 79e79b7

17 files changed

+506
-417
lines changed

.github/workflows/mkdocs-flow.yml

Lines changed: 3 additions & 3 deletions
@@ -3,9 +3,9 @@ on:
 push:
 branches:
 - main
-pull_request:
-branches:
-- main
+# pull_request:
+# branches:
+# - main
 jobs:
 build:
 name: Deploy docs to GitHub Pages

docs/CHANGELOG.md

Lines changed: 7 additions & 3 deletions
@@ -1,4 +1,8 @@
-# **Changelog**
+---
+title: Changelog
+---
+
+# :octicons-log-16: **Changelog**
 
 - v2.0.0 (9 October 2022)
 
@@ -39,7 +43,7 @@
 
 - v1.13 (3 September 2022
 
-- Support image variations (see [VARIATIONS](docs/features/VARIATIONS.md)
+- Support image variations (see [VARIATIONS](features/VARIATIONS.md)
 ([Kevin Gibbons](https://github.com/bakkot) and many contributors and reviewers)
 - Supports a Google Colab notebook for a standalone server running on Google hardware
 [Arturo Mendivil](https://github.com/artmen1516)
@@ -179,4 +183,4 @@
 
 ## Links
 
-- **[Read Me](../readme.md)**
+- **[Read Me](index.md)**

docs/features/CLI.md

Lines changed: 143 additions & 128 deletions
Large diffs are not rendered by default.

docs/features/EMBIGGEN.md

Lines changed: 35 additions & 31 deletions
@@ -43,7 +43,7 @@ it's similar to that, except it can work up to an arbitrarily large size
4343
has extra logic to re-run any number of the tile sub-sections of the image
4444
if for example a small part of a huge run got messed up.
4545

46-
## Usage
46+
### Usage
4747

4848
`-embiggen <scaling_factor> <esrgan_strength> <overlap_ratio OR overlap_pixels>`
4949

@@ -100,26 +100,30 @@ Tiles are numbered starting with one, and left-to-right,
100100
top-to-bottom. So, if you are generating a 3x3 tiled image, the
101101
middle row would be `4 5 6`.
102102

103-
## Example Usage
103+
### Examples
104104

105-
Running Embiggen with 512x512 tiles on an existing image, scaling up by a factor of 2.5x;
106-
and doing the same again (default ESRGAN strength is 0.75, default overlap between tiles is 0.25):
105+
!!! example ""
107106

108-
```bash
109-
invoke > a photo of a forest at sunset -s 100 -W 512 -H 512 -I outputs/forest.png -f 0.4 -embiggen 2.5
110-
invoke > a photo of a forest at sunset -s 100 -W 512 -H 512 -I outputs/forest.png -f 0.4 -embiggen 2.5 0.75 0.25
111-
```
107+
Running Embiggen with 512x512 tiles on an existing image, scaling up by a factor of 2.5x;
108+
and doing the same again (default ESRGAN strength is 0.75, default overlap between tiles is 0.25):
112109

113-
If your starting image was also 512x512 this should have taken 9 tiles.
110+
```bash
111+
invoke > a photo of a forest at sunset -s 100 -W 512 -H 512 -I outputs/forest.png -f 0.4 -embiggen 2.5
112+
invoke > a photo of a forest at sunset -s 100 -W 512 -H 512 -I outputs/forest.png -f 0.4 -embiggen 2.5 0.75 0.25
113+
```
114114

115-
If there weren't enough clouds in the sky of that forest you just made
116-
(and that image is about 1280 pixels (512*2.5) wide A.K.A. three
117-
512x512 tiles with 0.25 overlaps wide) we can replace that top row of
118-
tiles:
115+
If your starting image was also 512x512 this should have taken 9 tiles.
119116

120-
```bash
121-
invoke> a photo of puffy clouds over a forest at sunset -s 100 -W 512 -H 512 -I outputs/000002.seed.png -f 0.5 -embiggen_tiles 1 2 3
122-
```
117+
!!! example ""
118+
119+
If there weren't enough clouds in the sky of that forest you just made
120+
(and that image is about 1280 pixels (512*2.5) wide A.K.A. three
121+
512x512 tiles with 0.25 overlaps wide) we can replace that top row of
122+
tiles:
123+
124+
```bash
125+
invoke> a photo of puffy clouds over a forest at sunset -s 100 -W 512 -H 512 -I outputs/000002.seed.png -f 0.5 -embiggen_tiles 1 2 3
126+
```
123127

124128
## Fixing Previously-Generated Images
125129

@@ -128,27 +132,27 @@ look up the original prompt and provide an initial image. Just use the
128132
syntax `!fix path/to/file.png <embiggen>`. For example, you can rewrite the
129133
previous command to look like this:
130134

131-
~~~~
135+
```bash
132136
invoke> !fix ./outputs/000002.seed.png -embiggen_tiles 1 2 3
133-
~~~~
137+
```
134138

135139
A new file named `000002.seed.fixed.png` will be created in the output directory. Note that
136140
the `!fix` command does not replace the original file, unlike the behavior at generate time.
137141
You do not need to provide the prompt, and `!fix` automatically selects a good strength for
138142
embiggen-ing.
139143

140-
141-
**Note**
142-
Because the same prompt is used on all the tiled images, and the model
143-
doesn't have the context of anything outside the tile being run - it
144-
can end up creating repeated pattern (also called 'motifs') across all
145-
the tiles based on that prompt. The best way to combat this is
146-
lowering the `--strength` (`-f`) to stay more true to the init image,
147-
and increasing the number of steps so there is more compute-time to
148-
create the detail. Anecdotally `--strength` 0.35-0.45 works pretty
149-
well on most things. It may also work great in some examples even with
150-
the `--strength` set high for patterns, landscapes, or subjects that
151-
are more abstract. Because this is (relatively) fast, you can also
152-
preserve the best parts from each.
144+
!!! note
145+
146+
Because the same prompt is used on all the tiled images, and the model
147+
doesn't have the context of anything outside the tile being run - it
148+
can end up creating repeated pattern (also called 'motifs') across all
149+
the tiles based on that prompt. The best way to combat this is
150+
lowering the `--strength` (`-f`) to stay more true to the init image,
151+
and increasing the number of steps so there is more compute-time to
152+
create the detail. Anecdotally `--strength` 0.35-0.45 works pretty
153+
well on most things. It may also work great in some examples even with
154+
the `--strength` set high for patterns, landscapes, or subjects that
155+
are more abstract. Because this is (relatively) fast, you can also
156+
preserve the best parts from each.
153157

154158
Author: [Travco](https://github.com/travco)

docs/features/IMG2IMG.md

Lines changed: 70 additions & 45 deletions
@@ -2,7 +2,9 @@
22
title: Image-to-Image
33
---
44

5-
# :material-image-multiple: **IMG2IMG**
5+
# :material-image-multiple: Image-to-Image
6+
7+
## `img2img`
68

79
This script also provides an `img2img` feature that lets you seed your creations with an initial
810
drawing or photo. This is a really cool feature that tells stable diffusion to build the prompt on
@@ -15,13 +17,17 @@ tree on a hill with a river, nature photograph, national geographic -I./test-pic
1517

1618
This will take the original image shown here:
1719

20+
<div align="center" markdown>
1821
<img src="https://user-images.githubusercontent.com/50542132/193946000-c42a96d8-5a74-4f8a-b4c3-5213e6cadcce.png" width=350>
19-
22+
</div>
23+
2024
and generate a new image based on it as shown here:
2125

26+
<div align="center" markdown>
2227
<img src="https://user-images.githubusercontent.com/111189/194135515-53d4c060-e994-4016-8121-7c685e281ac9.png" width=350>
28+
</div>
2329

24-
The `--init_img (-I)` option gives the path to the seed picture. `--strength (-f)` controls how much
30+
The `--init_img` (`-I`) option gives the path to the seed picture. `--strength` (`-f`) controls how much
2531
the original will be modified, ranging from `0.0` (keep the original intact), to `1.0` (ignore the
2632
original completely). The default is `0.75`, and ranges from `0.25-0.90` give interesting results.
2733
Other relevant options include `-C` (classification free guidance scale), and `-s` (steps). Unlike `txt2img`,
@@ -37,117 +43,136 @@ a very different image:
3743

3844
`photograph of a tree on a hill with a river`
3945

46+
<div align="center" markdown>
4047
<img src="https://user-images.githubusercontent.com/111189/194135220-16b62181-b60c-4248-8989-4834a8fd7fbd.png" width=350>
48+
</div>
49+
50+
!!! tip
4151

42-
(When designing prompts, think about how the images scraped from the internet were captioned. Very few photographs will
43-
be labeled "photograph" or "photorealistic." They will, however, be captioned with the publication, photographer, camera
44-
model, or film settings.)
52+
When designing prompts, think about how the images scraped from the internet were captioned. Very few photographs will
53+
be labeled "photograph" or "photorealistic." They will, however, be captioned with the publication, photographer, camera
54+
model, or film settings.
4555

4656
If the initial image contains transparent regions, then Stable Diffusion will only draw within the
47-
transparent regions, a process called "inpainting". However, for this to work correctly, the color
57+
transparent regions, a process called [`inpainting`](./INPAINTING.md#creating-transparent-regions-for-inpainting). However, for this to work correctly, the color
4858
information underneath the transparent needs to be preserved, not erased.
4959

50-
More details can be found here:
51-
[Creating Transparent Images For Inpainting](./INPAINTING.md#creating-transparent-regions-for-inpainting)
60+
!!! warning
61+
62+
`img2img` does not work properly on initial images smaller than 512x512. Please scale your
63+
image to at least 512x512 before using it. Larger images are not a problem, but may run out of VRAM on your
64+
GPU card.
65+
66+
To fix this, use the `--fit` option, which downscales the initial image to fit within the box specified
67+
by width x height:
5268

53-
**IMPORTANT ISSUE** `img2img` does not work properly on initial images smaller than 512x512. Please scale your
54-
image to at least 512x512 before using it. Larger images are not a problem, but may run out of VRAM on your
55-
GPU card. To fix this, use the --fit option, which downscales the initial image to fit within the box specified
56-
by width x height:
57-
~~~
58-
tree on a hill with a river, national geographic -I./test-pictures/big-sketch.png -H512 -W512 --fit
59-
~~~
69+
```bash
70+
invoke> "tree on a hill with a river, national geographic" -I./test-pictures/big-sketch.png -H512 -W512 --fit
71+
```
6072

6173
## How does it actually work, though?
6274

63-
The main difference between `img2img` and `prompt2img` is the starting point. While `prompt2img` always starts with pure
64-
gaussian noise and progressively refines it over the requested number of steps, `img2img` skips some of these earlier steps
65-
(how many it skips is indirectly controlled by the `--strength` parameter), and uses instead your initial image mixed with gaussian noise as the starting image.
75+
The main difference between `img2img` and `prompt2img` is the starting point. While `prompt2img` always starts with pure
76+
gaussian noise and progressively refines it over the requested number of steps, `img2img` skips some of these earlier steps
77+
(how many it skips is indirectly controlled by the `--strength` parameter), and uses instead your initial image mixed with gaussian noise as the starting image.
6678

6779
**Let's start** by thinking about vanilla `prompt2img`, just generating an image from a prompt. If the step count is 10, then the "latent space" (Stable Diffusion's internal representation of the image) for the prompt "fire" with seed `1592514025` develops something like this:
6880

69-
```commandline
81+
```bash
7082
invoke> "fire" -s10 -W384 -H384 -S1592514025
7183
```
7284

85+
<div align="center" markdown>
7386
![latent steps](../assets/img2img/000019.steps.png)
87+
</div>
7488

75-
Put simply: starting from a frame of fuzz/static, SD finds details in each frame that it thinks look like "fire" and brings them a little bit more into focus, gradually scrubbing out the fuzz until a clear image remains.
89+
Put simply: starting from a frame of fuzz/static, SD finds details in each frame that it thinks look like "fire" and brings them a little bit more into focus, gradually scrubbing out the fuzz until a clear image remains.
7690

77-
**When you use `img2img`** some of the earlier steps are cut, and instead an initial image of your choice is used. But because of how the maths behind Stable Diffusion works, this image needs to be mixed with just the right amount of noise (fuzz/static) for where it is being inserted. This is where the strength parameter comes in. Depending on the set strength, your image will be inserted into the sequence at the appropriate point, with just the right amount of noise.
91+
**When you use `img2img`** some of the earlier steps are cut, and instead an initial image of your choice is used. But because of how the maths behind Stable Diffusion works, this image needs to be mixed with just the right amount of noise (fuzz/static) for where it is being inserted. This is where the strength parameter comes in. Depending on the set strength, your image will be inserted into the sequence at the appropriate point, with just the right amount of noise.
7892

7993
### A concrete example
8094

81-
Say I want SD to draw a fire based on this hand-drawn image:
95+
I want SD to draw a fire based on this hand-drawn image:
8296

97+
<div align="center" markdown>
8398
![drawing of a fireplace](../assets/img2img/fire-drawing.png)
99+
</div>
84100

85101
Let's only do 10 steps, to make it easier to see what's happening. If strength is `0.7`, this is what the internal steps the algorithm has to take will look like:
86102

87-
![](../assets/img2img/000032.steps.gravity.png)
103+
<div align="center" markdown>
104+
![gravity32](../assets/img2img/000032.steps.gravity.png)
105+
</div>
88106

89107
With strength `0.4`, the steps look more like this:
90108

91-
![](../assets/img2img/000030.steps.gravity.png)
109+
<div align="center" markdown>
110+
![gravity30](../assets/img2img/000030.steps.gravity.png)
111+
</div>
92112

93113
Notice how much more fuzzy the starting image is for strength `0.7` compared to `0.4`, and notice also how much longer the sequence is with `0.7`:
94114

95115
| | strength = 0.7 | strength = 0.4 |
96-
| -- | -- | -- |
97-
| initial image that SD sees | ![](../assets/img2img/000032.step-0.png) | ![](../assets/img2img/000030.step-0.png) |
116+
| -- | :--: | :--: |
117+
| initial image that SD sees | ![step-0-32](../assets/img2img/000032.step-0.png) | ![step-0-30](../assets/img2img/000030.step-0.png) |
98118
| steps argument to `dream>` | `-S10` | `-S10` |
99119
| steps actually taken | 7 | 4 |
100-
| latent space at each step | ![](../assets/img2img/000032.steps.gravity.png) | ![](../assets/img2img/000030.steps.gravity.png) |
101-
| output | ![](../assets/img2img/000032.1592514025.png) | ![](../assets/img2img/000030.1592514025.png) |
120+
| latent space at each step | ![gravity32](../assets/img2img/000032.steps.gravity.png) | ![gravity30](../assets/img2img/000030.steps.gravity.png) |
121+
| output | ![000032.1592514025](../assets/img2img/000032.1592514025.png) | ![000030.1592514025](../assets/img2img/000030.1592514025.png) |
102122

103123
Both of the outputs look kind of like what I was thinking of. With the strength higher, my input becomes more vague, *and* Stable Diffusion has more steps to refine its output. But it's not really making what I want, which is a picture of cheery open fire. With the strength lower, my input is more clear, *but* Stable Diffusion has less chance to refine itself, so the result ends up inheriting all the problems of my bad drawing.
104124

125+
If you want to try this out yourself, all of these are using a seed of `1592514025` with a width/height of `384`, step count `10`, the default sampler (`k_lms`), and the single-word prompt `"fire"`:
105126

106-
If you want to try this out yourself, all of these are using a seed of `1592514025` with a width/height of `384`, step count `10`, the default sampler (`k_lms`), and the single-word prompt `fire`:
107-
108-
```commandline
127+
```bash
109128
invoke> "fire" -s10 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png --strength 0.7
110129
```
111130

112-
The code for rendering intermediates is on my (damian0815's) branch [document-img2img](https://github.com/damian0815/InvokeAI/tree/document-img2img) - run `invoke.py` and check your `outputs/img-samples/intermediates` folder while generating an image.
131+
The code for rendering intermediates is on my (damian0815's) branch [document-img2img](https://github.com/damian0815/InvokeAI/tree/document-img2img) - run `invoke.py` and check your `outputs/img-samples/intermediates` folder while generating an image.
113132

114133
### Compensating for the reduced step count
115134

116135
After putting this guide together I was curious to see how the difference would be if I increased the step count to compensate, so that SD could have the same amount of steps to develop the image regardless of the strength. So I ran the generation again using the same seed, but this time adapting the step count to give each generation 20 steps.
117136

118137
Here's strength `0.4` (note step count `50`, which is `20 ÷ 0.4` to make sure SD does `20` steps from my image):
119138

120-
```commandline
139+
```bash
121140
invoke> "fire" -s50 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png -f 0.4
122141
```
123142

124-
![](../assets/img2img/000035.1592514025.png)
143+
<div align="center" markdown>
144+
![000035.1592514025](../assets/img2img/000035.1592514025.png)
145+
</div>
125146

126-
and strength `0.7` (note step count `30`, which is roughly `20 ÷ 0.7` to make sure SD does `20` steps from my image):
147+
and here is strength `0.7` (note step count `30`, which is roughly `20 ÷ 0.7` to make sure SD does `20` steps from my image):
127148

128-
```commandline
149+
```bash
129150
invoke> "fire" -s30 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png -f 0.7
130151
```
131152

132-
![](../assets/img2img/000046.1592514025.png)
153+
<div align="center" markdown>
154+
![000046.1592514025](../assets/img2img/000046.1592514025.png)
155+
</div>
133156

134157
In both cases the image is nice and clean and "finished", but because at strength `0.7` Stable Diffusion has been give so much more freedom to improve on my badly-drawn flames, they've come out looking much better. You can really see the difference when looking at the latent steps. There's more noise on the first image with strength `0.7`:
135158

136-
![](../assets/img2img/000046.steps.gravity.png)
159+
![gravity46](../assets/img2img/000046.steps.gravity.png)
137160

138161
than there is for strength `0.4`:
139162

140-
![](../assets/img2img/000035.steps.gravity.png)
163+
![gravity35](../assets/img2img/000035.steps.gravity.png)
141164

142-
and that extra noise gives the algorithm more choices when it is evaluating how to denoise any particular pixel in the image.
165+
and that extra noise gives the algorithm more choices when it is evaluating how to denoise any particular pixel in the image.
143166

144167
Unfortunately, it seems that `img2img` is very sensitive to the step count. Here's strength `0.7` with a step count of `29` (SD did 19 steps from my image):
145168

146-
![](../assets/img2img/000045.1592514025.png)
169+
<div align="center" markdown>
170+
![gravity45](../assets/img2img/000045.1592514025.png)
171+
</div>
147172

148173
By comparing the latents we can sort of see that something got interpreted differently enough on the third or fourth step to lead to a rather different interpretation of the flames.
149174

150-
![](../assets/img2img/000046.steps.gravity.png)
151-
![](../assets/img2img/000045.steps.gravity.png)
175+
![gravity46](../assets/img2img/000046.steps.gravity.png)
176+
![gravity45](../assets/img2img/000045.steps.gravity.png)
152177

153-
This is the result of a difference in the de-noising "schedule" - basically the noise has to be cleaned by a certain degree each step or the model won't "converge" on the image properly (see https://huggingface.co/blog/stable_diffusion for more about that). A different step count means a different schedule, which means things get interpreted slightly differently at every step.
178+
This is the result of a difference in the de-noising "schedule" - basically the noise has to be cleaned by a certain degree each step or the model won't "converge" on the image properly (see [stable diffusion blog](https://huggingface.co/blog/stable_diffusion) for more about that). A different step count means a different schedule, which means things get interpreted slightly differently at every step.
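
Reviewer note on the IMG2IMG.md hunks above: the relationship between the requested step count, `--strength`, and the steps actually run from the initial image can be approximated in a few lines of Python. This is a rough sketch only. The function names are hypothetical, and the rounding rule is an assumption rather than the sampler's real logic: the docs report 19 steps for a step count of `29` at strength `0.7`, where this estimate gives 20.

```python
def estimated_img2img_steps(steps: int, strength: float) -> int:
    """Approximate how many denoising steps run from the initial image.

    ASSUMPTION: the count is roughly steps * strength; the real
    implementation may round differently.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return round(steps * strength)


def steps_for_target(target_steps: int, strength: float) -> float:
    """Step count to request so that about target_steps run from the image."""
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be greater than 0.0 and at most 1.0")
    return target_steps / strength


if __name__ == "__main__":
    # The table in the docs: step count 10 at strength 0.7 and 0.4.
    print(estimated_img2img_steps(10, 0.7), estimated_img2img_steps(10, 0.4))
    # The compensation section: aiming for about 20 steps from the image.
    print(steps_for_target(20, 0.4), steps_for_target(20, 0.7))
```

For the table in the docs this gives 7 and 4 steps. For the compensation section it gives 50 steps at strength `0.4` and about 28.6 at strength `0.7`, which the docs round up to `30`.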
