Commit 1e3b700

Merge branch 'dev'
2 parents 28dfbed + f7d1c40 commit 1e3b700

20 files changed: +223 -114 lines changed

README.md

Lines changed: 24 additions & 15 deletions
@@ -93,14 +93,13 @@ CMAKE_ARGS="-DSD_CUDA=ON" pip install stable-diffusion-cpp-python
 <details>
 <summary>Using HIPBLAS (ROCm)</summary>
 
-This provides BLAS acceleration using the ROCm cores of your AMD GPU. Make sure you have the ROCm toolkit installed and that you replace the `$GFX_NAME` value with that of your GPU architecture (`gfx1030` for consumer RDNA2 cards for example).
-Windows users refer to [docs/hipBLAS_on_Windows.md](docs%2FhipBLAS_on_Windows.md) for a comprehensive guide and troubleshooting tips.
+This provides BLAS acceleration using the ROCm cores of your AMD GPU. Make sure you have the ROCm toolkit installed and that you replace the `$GFX_NAME` value with that of your GPU architecture (`gfx1030` for consumer RDNA2 cards for example). Windows users refer to [docs/hipBLAS_on_Windows.md](docs%2FhipBLAS_on_Windows.md) for a comprehensive guide and troubleshooting tips.
 
 ```bash
 if command -v rocminfo; then export GFX_NAME=$(rocminfo | awk '/ *Name: +gfx[1-9]/ {print $2; exit}'); else echo "rocminfo missing!"; fi
 if [ -z "${GFX_NAME}" ]; then echo "Error: Couldn't detect GPU!"; else echo "Building for GPU: ${GFX_NAME}"; fi
 
-CMAKE_ARGS="-G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=$GFX_NAME -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON" pip install stable-diffusion-cpp-python
+CMAKE_ARGS="-G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DGPU_TARGETS=$GFX_NAME -DAMDGPU_TARGETS=$GFX_NAME -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON" pip install stable-diffusion-cpp-python
 ```
 
 </details>
@@ -147,18 +146,6 @@ CMAKE_ARGS="-DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML
 
 </details>
 
-<!-- Flash Attention -->
-<details>
-<summary>Using Flash Attention</summary>
-
-Enabling flash attention reduces memory usage by at least 400 MB. At the moment, it is not supported when CUDA (CUBLAS) is enabled because the kernel implementation is missing.
-
-```bash
-CMAKE_ARGS="-DSD_FLASH_ATTN=ON" pip install stable-diffusion-cpp-python
-```
-
-</details>
-
 <!-- OpenBLAS -->
 <details>
 <summary>Using OpenBLAS</summary>
@@ -250,6 +237,28 @@ _(Note: Don't forget to include `LD_LIBRARY_PATH=/vendor/lib64` in your command
 
 To upgrade and rebuild `stable-diffusion-cpp-python` add `--upgrade --force-reinstall --no-cache-dir` flags to the `pip install` command to ensure the package is rebuilt from source.
 
+### Using Flash Attention
+
+Enabling flash attention for the diffusion model reduces memory usage by a varying amount, e.g.:
+
+- **flux 768x768** ~600 MB
+- **SD2 768x768** ~1400 MB
+
+On most backends it slows generation down, but on CUDA it generally speeds it up as well.
+At the moment, it is only supported for some models and some backends (like `cpu`, `cuda/rocm` and `metal`).
+
+Enable it by passing `diffusion_flash_attn=True` to the `StableDiffusion` class and watch for:
+
+```log
+[INFO] stable-diffusion.cpp:312 - Using flash attention in the diffusion model
+```
+
+and the compute buffer shrink in the debug log:
+
+```log
+[DEBUG] ggml_extend.hpp:1004 - flux compute buffer size: 650.00 MB(VRAM)
+```
+
 ## High-level API
 
 The high-level API provides a simple managed interface through the `StableDiffusion` class.
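
Reviewer note on the README change above: the new section says to pass `diffusion_flash_attn=True` to `StableDiffusion` and then check the log for the flash attention notice and a smaller compute buffer. A minimal sketch of how that check could be scripted is below. The helper names (`compute_buffer_mb`, `flash_attention_enabled`, `load_with_flash_attention`) are hypothetical and not part of the package; the log formats are assumed to match the two lines quoted in the README, and how the log text is captured is left to the caller.

```python
import re
from typing import Optional

# Assumed log format, taken from the README example:
# [DEBUG] ggml_extend.hpp:1004 - flux compute buffer size: 650.00 MB(VRAM)
_BUFFER_RE = re.compile(r"compute buffer size:\s*([0-9]+(?:\.[0-9]+)?)\s*MB")


def compute_buffer_mb(log_line: str) -> Optional[float]:
    """Return the compute buffer size in MB from a debug log line, or None."""
    match = _BUFFER_RE.search(log_line)
    return float(match.group(1)) if match else None


def flash_attention_enabled(log_text: str) -> bool:
    """True if the log contains the flash attention notice quoted in the README."""
    return "Using flash attention in the diffusion model" in log_text


def load_with_flash_attention(diffusion_model_path: str):
    """Hypothetical helper: create a StableDiffusion instance with flash attention on."""
    # Imported lazily so the parsing helpers above work without the package installed.
    from stable_diffusion_cpp import StableDiffusion

    return StableDiffusion(
        diffusion_model_path=diffusion_model_path,
        diffusion_flash_attn=True,
    )


if __name__ == "__main__":
    line = "[DEBUG] ggml_extend.hpp:1004 - flux compute buffer size: 650.00 MB(VRAM)"
    print(compute_buffer_mb(line))
```

Comparing the value returned by `compute_buffer_mb` between a run with and a run without `diffusion_flash_attn=True` gives the memory saving for a particular model and resolution.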

docs/hipBLAS_on_Windows.md

Lines changed: 1 addition & 2 deletions
@@ -47,8 +47,7 @@ set ninja=C:\Program Files\ninja\ninja.exe
 
 ## Building stable-diffusion.cpp
 
-The thing different from the regular CPU build is `-DSD_HIPBLAS=ON`,
-`-G "Ninja"`, `-DCMAKE_C_COMPILER=clang`, `-DCMAKE_CXX_COMPILER=clang++`, `-DAMDGPU_TARGETS=gfx1100`, `-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON`, `-DCMAKE_POSITION_INDEPENDENT_CODE=ON`
+The differences from the regular CPU build are `-G "Ninja"`, `-DCMAKE_C_COMPILER=clang`, `-DCMAKE_CXX_COMPILER=clang++`, `-DSD_HIPBLAS=ON`, `-DGPU_TARGETS=gfx1100`, `-DAMDGPU_TARGETS=gfx1100`, `-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON`, `-DCMAKE_POSITION_INDEPENDENT_CODE=ON`
 
 Note:
 If you encounter an error such as the following:

stable_diffusion_cpp/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@
 
 # isort: on
 
-__version__ = "0.3.4"
+__version__ = "0.3.5"

stable_diffusion_cpp/stable_diffusion.py

Lines changed: 1 addition & 4 deletions
@@ -977,10 +977,7 @@ def _format_control_image(
     ) -> sd_cpp.sd_image_t:
         """Convert an image path or Pillow Image to a C sd_image_t image."""
 
-        if not isinstance(control_image, (str, Image.Image)) or not self.control_net_path:
-            if not self.control_net_path:
-                log_event(1, "`control_net_path` not set. Skipping control image")
-
+        if not isinstance(control_image, (str, Image.Image)):
             # Return an empty sd_image_t
             return self._c_uint8_to_sd_image_t_p(
                 image=None,

stable_diffusion_cpp/stable_diffusion_cpp.py

Lines changed: 0 additions & 4 deletions
@@ -137,10 +137,6 @@ def byref(obj: CtypesCData, offset: Optional[int] = None) -> CtypesRef[CtypesCDa
 byref = ctypes.byref  # type: ignore
 
 
-# from ggml-backend.h
-# typedef bool (*ggml_backend_sched_eval_callback)(struct ggml_tensor * t, bool ask, void * user_data);
-ggml_backend_sched_eval_callback = ctypes.CFUNCTYPE(ctypes.c_bool, ctypes.c_void_p, ctypes.c_bool, ctypes.c_void_p)
-
 # // Abort callback
 # // If not NULL, called before ggml computation
 # // If it returns true, the computation is aborted

tests/test_chroma.py

Lines changed: 7 additions & 8 deletions
@@ -27,19 +27,18 @@ def test_chroma():
     def callback(step: int, steps: int, time: float):
         print("Completed step: {} of {}".format(step, steps))
 
-    # Generate images
-    images = stable_diffusion.generate_image(
+    # Generate image
+    image = stable_diffusion.generate_image(
         prompt=PROMPT,
         sample_steps=STEPS,
         cfg_scale=CFG_SCALE,
         progress_callback=callback,
-    )
+    )[0]
 
-    # Save images
-    for i, image in enumerate(images):
-        pnginfo = PngImagePlugin.PngInfo()
-        pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()]))
-        image.save(f"{OUTPUT_DIR}/chroma_{i}.png", pnginfo=pnginfo)
+    # Save image
+    pnginfo = PngImagePlugin.PngInfo()
+    pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()]))
+    image.save(f"{OUTPUT_DIR}/chroma.png", pnginfo=pnginfo)
 
 
 # ===========================================
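
Reviewer note: the `Parameters` text chunk written by these tests is built by title-casing each key of `image.info` and joining the pairs with commas. A minimal standalone sketch of that formatting step follows. `format_parameters` is a hypothetical helper name, and the metadata dict is made up for illustration; the real keys depend on what `generate_image` stores in `image.info`.

```python
def format_parameters(info: dict) -> str:
    """Join metadata as 'Title Cased Key: value' pairs, as the tests do inline."""
    return ", ".join(f"{k.replace('_', ' ').title()}: {v}" for k, v in info.items())


if __name__ == "__main__":
    # Illustrative metadata only, not actual output of generate_image.
    example_info = {"sample_steps": 20, "cfg_scale": 7.0}
    print(format_parameters(example_info))
```

Since the same expression now appears in every updated test, pulling it into a shared helper (for example in `conftest.py`) would be one way to avoid the repetition.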

tests/test_controlnet.py

Lines changed: 7 additions & 8 deletions
@@ -25,19 +25,18 @@ def callback(step: int, steps: int, time: float):
         print("Completed step: {} of {}".format(step, steps))
 
     for prompt in PROMPTS:
-        # Generate images
-        images = stable_diffusion.generate_image(
+        # Generate image
+        image = stable_diffusion.generate_image(
             prompt=prompt["prompt"],
             control_image=INPUT_IMAGE_PATH,
             canny=prompt["canny"],
             progress_callback=callback,
-        )
+        )[0]
 
-        # Save images
-        for i, image in enumerate(images):
-            pnginfo = PngImagePlugin.PngInfo()
-            pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()]))
-            image.save(f"{OUTPUT_DIR}/controlnet{prompt['add']}_{i}.png", pnginfo=pnginfo)
+        # Save image
+        pnginfo = PngImagePlugin.PngInfo()
+        pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()]))
+        image.save(f"{OUTPUT_DIR}/controlnet{prompt['add']}.png", pnginfo=pnginfo)
 
 
 # ===========================================

tests/test_convert_model.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ def test_convert_model():
 
     model_converted = stable_diffusion.convert(
         input_path=MODEL_PATH,
-        output_path=f"{OUTPUT_DIR}/new_model.gguf",
+        output_path=f"{OUTPUT_DIR}/convert_model.gguf",
         output_type="q8_0",
     )
     print("Model converted: ", model_converted)

tests/test_edit.py

Lines changed: 6 additions & 7 deletions
@@ -35,21 +35,20 @@ def callback(step: int, steps: int, time: float):
         print("Completed step: {} of {}".format(step, steps))
 
     # Edit image
-    images = stable_diffusion.generate_image(
+    image = stable_diffusion.generate_image(
         prompt=PROMPT,
         ref_images=INPUT_IMAGE_PATHS,
         sample_steps=STEPS,
         cfg_scale=CFG_SCALE,
         image_cfg_scale=IMAGE_CFG_SCALE,
         sample_method=SAMPLE_METHOD,
         progress_callback=callback,
-    )
+    )[0]
 
-    # Save images
-    for i, image in enumerate(images):
-        pnginfo = PngImagePlugin.PngInfo()
-        pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()]))
-        image.save(f"{OUTPUT_DIR}/edit_{i}.png", pnginfo=pnginfo)
+    # Save image
+    pnginfo = PngImagePlugin.PngInfo()
+    pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()]))
+    image.save(f"{OUTPUT_DIR}/edit.png", pnginfo=pnginfo)
 
 
 # ===========================================

tests/test_flex2.py

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+from PIL import PngImagePlugin
+from conftest import OUTPUT_DIR
+
+from stable_diffusion_cpp import StableDiffusion
+
+DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\flex\\Flex.2-preview-Q8_0.gguf"
+T5XXL_PATH = "F:\\stable-diffusion\\flux\\t5xxl_q8_0.gguf"
+CLIP_L_PATH = "F:\\stable-diffusion\\flux\\clip_l-q8_0.gguf"
+VAE_PATH = "F:\\stable-diffusion\\flux\\ae-f16.gguf"
+
+INPUT_IMAGE_PATH = "assets\\input.png"
+PROMPT = "the cat has a hat"
+STEPS = 20
+
+
+def test_flex2():
+
+    stable_diffusion = StableDiffusion(
+        diffusion_model_path=DIFFUSION_MODEL_PATH,
+        clip_l_path=CLIP_L_PATH,
+        t5xxl_path=T5XXL_PATH,
+        vae_path=VAE_PATH,
+        keep_clip_on_cpu=True,
+        vae_decode_only=True,
+    )
+
+    def callback(step: int, steps: int, time: float):
+        print("Completed step: {} of {}".format(step, steps))
+
+    # Generate image
+    image = stable_diffusion.generate_image(
+        prompt=PROMPT,
+        control_image=INPUT_IMAGE_PATH,
+        sample_steps=STEPS,
+        progress_callback=callback,
+    )[0]
+
+    # Save image
+    pnginfo = PngImagePlugin.PngInfo()
+    pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()]))
+    image.save(f"{OUTPUT_DIR}/flex2.png", pnginfo=pnginfo)
+
+
+# ===========================================
+# C++ CLI
+# ===========================================
+
+# import subprocess
+
+# from conftest import SD_CPP_CLI
+
+# stable_diffusion = None  # Clear model
+
+
+# cli_cmd = [
+#     SD_CPP_CLI,
+#     "--diffusion-model",
+#     DIFFUSION_MODEL_PATH,
+#     "--control-image",
+#     INPUT_IMAGE_PATH,
+#     "--vae",
+#     VAE_PATH,
+#     "--t5xxl",
+#     T5XXL_PATH,
+#     "--clip_l",
+#     CLIP_L_PATH,
+#     "--prompt",
+#     PROMPT,
+#     "--steps",
+#     str(STEPS),
+#     "--clip-on-cpu",
+#     "--output",
+#     f"{OUTPUT_DIR}/flex2_cli.png",
+#     "-v",
+# ]
+# print(" ".join(cli_cmd))
+# subprocess.run(cli_cmd, check=True)
