Commit 3e2d98b

feat: chroma and kontext support
1 parent fccbde7 commit 3e2d98b

File tree

10 files changed (+461 −63 lines)

README.md

Lines changed: 89 additions & 34 deletions
@@ -106,7 +106,7 @@ CMAKE_ARGS="-G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_
 <details>
 <summary>Using Metal</summary>

-Using Metal makes the computation run on the GPU. Currently, there are some issues with Metal when performing operations on very large matrices, making it highly inefficient at the moment. Performance improvements are expected in the near future.
+Using Metal runs the computation on Apple Silicon. Currently, there are some issues with Metal when performing operations on very large matrices, making it highly inefficient. Performance improvements are expected in the near future.

 ```bash
 CMAKE_ARGS="-DSD_METAL=ON" pip install stable-diffusion-cpp-python
@@ -129,7 +129,7 @@ CMAKE_ARGS="-DSD_VULKAN=ON" pip install stable-diffusion-cpp-python
 <details>
 <summary>Using SYCL</summary>

-Using SYCL makes the computation run on the Intel GPU. Please make sure you have installed the related driver and [Intel® oneAPI Base toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html) before start. More details and steps can refer to [llama.cpp SYCL backend](https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md#linux).
+Using SYCL runs the computation on an Intel GPU. Please make sure you have installed the related driver and [Intel® oneAPI Base toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html) before starting. For more details refer to [llama.cpp SYCL backend](https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md#linux).

 ```bash
 # Export relevant ENV variables
@@ -167,7 +167,6 @@ CMAKE_ARGS="-DGGML_OPENBLAS=ON" pip install stable-diffusion-cpp-python
 </details>

 <!-- MUSA -->
-
 <details>
 <summary>Using MUSA</summary>

@@ -189,7 +188,7 @@ The high-level API provides a simple managed interface through the `StableDiffus

 Below is a short example demonstrating how to use the high-level API to generate a simple image:

-### Text to Image
+### <u>Text to Image</u>

 ```python
 from stable_diffusion_cpp import StableDiffusion
@@ -211,7 +210,7 @@ output = stable_diffusion.txt_to_img(
 output[0].save("output.png") # Output returned as list of PIL Images
 ```

-#### With LoRA (Stable Diffusion)
+#### <u>With LoRA (Stable Diffusion)</u>

 You can specify the directory where the lora weights are stored via `lora_model_dir`. If not specified, the default is the current working directory.

@@ -234,9 +233,11 @@ output = stable_diffusion.txt_to_img(

 - The `lora_model_dir` argument is used in the same way for FLUX image generation.

-### FLUX Image Generation
+---
+
+### <u>FLUX Image Generation</u>

-FLUX models should be run using the same implementation as the [stable-diffusion.cpp FLUX documentation](https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/flux.md) where the `diffusion_model_path` argument is used in place of the `model_path`. The `clip_l_path`, `t5xxl_path`, and `vae_path` arguments are also required for inference to function.
+FLUX models should be run using the same implementation as the [stable-diffusion.cpp FLUX documentation](https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/flux.md) where the `diffusion_model_path` argument is used in place of the `model_path`. The `clip_l_path`, `t5xxl_path`, and `vae_path` arguments are also required for inference to function (for most models).

 Download the weights from the links below:

@@ -263,24 +264,77 @@ output = stable_diffusion.txt_to_img(
 )
 ```

-#### With LoRA (FLUX)
+#### <u>With LoRA (FLUX)</u>

 LoRAs can be used with FLUX models in the same way as Stable Diffusion models ([as shown above](#with-lora-stable-diffusion)).

 Note that:

 - It is recommended you use LoRAs with naming formats compatible with ComfyUI.
-- LoRAs will only work with Flux-dev q8_0.
+- LoRAs will only work with `Flux-dev q8_0`.
 - You can download FLUX LoRA models from https://huggingface.co/XLabs-AI/flux-lora-collection/tree/main (you must use a comfy converted version!!!).

-### SD3.5 Image Generation
+#### <u>Kontext (FLUX)</u>

 Download the weights from the links below:

-- Download sd3.5_large from https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/sd3.5_large.safetensors
-- Download clip_g from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_g.safetensors
-- Download clip_l from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_l.safetensors
-- Download t5xxl from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/t5xxl_fp16.safetensors
+- Preconverted gguf model from [FLUX.1-Kontext-dev-GGUF](https://huggingface.co/QuantStack/FLUX.1-Kontext-dev-GGUF)
+- Otherwise, download FLUX.1-Kontext-dev from [black-forest-labs/FLUX.1-Kontext-dev](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev/blob/main/flux1-kontext-dev.safetensors)
+- The `vae`, `clip_l`, and `t5xxl` models are the same as for FLUX image generation linked above.
+
+```python
+from stable_diffusion_cpp import StableDiffusion
+
+stable_diffusion = StableDiffusion(
+    diffusion_model_path="../models/flux1-kontext-dev-Q5_K_S.gguf", # In place of model_path
+    clip_l_path="../models/clip_l.safetensors",
+    t5xxl_path="../models/t5xxl_fp16.safetensors",
+    vae_path="../models/ae.safetensors",
+    vae_decode_only=False, # Must be False for FLUX Kontext
+)
+output = stable_diffusion.edit(
+    prompt="make the cat blue",
+    images=["input.png"],
+    cfg_scale=1.0, # a cfg_scale of 1 is recommended for FLUX
+    sample_method="euler", # euler is recommended for FLUX
+)
+```
+
+#### <u>Chroma (FLUX)</u>
+
+Download the weights from the links below:
+
+- Preconverted gguf model from [silveroxides/Chroma-GGUF](https://huggingface.co/silveroxides/Chroma-GGUF)
+- Otherwise, download chroma's safetensors from [lodestones/Chroma](https://huggingface.co/lodestones/Chroma)
+- The `vae` and `t5xxl` models are the same as for FLUX image generation linked above (`clip_l` not required).
+
+```python
+from stable_diffusion_cpp import StableDiffusion
+
+stable_diffusion = StableDiffusion(
+    diffusion_model_path="../models/chroma-unlocked-v40-Q4_0.gguf", # In place of model_path
+    t5xxl_path="../models/t5xxl_fp16.safetensors",
+    vae_path="../models/ae.safetensors",
+    vae_decode_only=True, # Can be True if we don't use img_to_img
+)
+output = stable_diffusion.txt_to_img(
+    prompt="a lovely cat holding a sign says 'chroma.cpp'",
+    sample_steps=4,
+    cfg_scale=4.0, # a cfg_scale of 4 is recommended for Chroma
+    sample_method="euler", # euler is recommended for FLUX
+)
+```
+
+---
+
+### <u>SD3.5 Image Generation</u>
+
+Download the weights from the links below:
+
+- Download `sd3.5_large` from https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/sd3.5_large.safetensors
+- Download `clip_g` from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_g.safetensors
+- Download `clip_l` from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_l.safetensors
+- Download `t5xxl` from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/t5xxl_fp16.safetensors

 ```python
 from stable_diffusion_cpp import StableDiffusion
@@ -300,10 +354,13 @@ output = stable_diffusion.txt_to_img(
 )
 ```

-### Image to Image
+---
+
+### <u>Image to Image</u>

 ```python
 from stable_diffusion_cpp import StableDiffusion
+# from PIL import Image

 INPUT_IMAGE = "../input.png"
 # INPUT_IMAGE = Image.open("../input.png") # or alternatively, pass as PIL Image
@@ -317,7 +374,7 @@ output = stable_diffusion.img_to_img(
 )
 ```

-### Inpainting
+### <u>Inpainting</u>

 ```python
 from stable_diffusion_cpp import StableDiffusion
@@ -333,7 +390,9 @@ output = stable_diffusion.img_to_img(
 )
 ```

-### PhotoMaker
+---
+
+### <u>PhotoMaker</u>

 You can use [PhotoMaker](https://github.com/TencentARC/PhotoMaker) to personalize generated images with your own ID.

@@ -366,27 +425,27 @@ output = stable_diffusion.txt_to_img(
 )
 ```

-### PhotoMaker Version 2
+#### <u>PhotoMaker Version 2</u>

-[PhotoMaker Version 2 (PMV2)](https://github.com/TencentARC/PhotoMaker/blob/main/README_pmv2.md) has some key improvements. Unfortunately it has a very heavy dependency which makes running it a bit involved in `SD.cpp`.
+[PhotoMaker Version 2 (PMV2)](https://github.com/TencentARC/PhotoMaker/blob/main/README_pmv2.md) has some key improvements. Unfortunately it has a very heavy dependency which makes running it a bit involved.

-Running PMV2 Requires running a python script `face_detect.py` (found [here](https://github.com/leejet/stable-diffusion.cpp/blob/master/face_detect.py)) to obtain **id_embeds** for the given input images.
+Running PMV2 requires running a python script `face_detect.py` (found here: [stable-diffusion.cpp/face_detect.py](https://github.com/leejet/stable-diffusion.cpp/blob/master/face_detect.py)) to obtain `id_embeds` for the given input images.

-```
+```bash
 python face_detect.py <input_image_dir>
 ```

 An `id_embeds.safetensors` file will be generated in `input_images_dir`.

-**Note: this step is only needed to run once; the same `id_embeds` can be reused**
+**Note: This step only needs to be run once; the resulting `id_embeds` can be reused.**

 - Run the same command as in version 1 but replacing `photomaker-v1.safetensors` with `photomaker-v2.safetensors`.
+  Download `photomaker-v2.safetensors` from [bssrdf/PhotoMakerV2](https://huggingface.co/bssrdf/PhotoMakerV2).
+- All other parameters from Version 1 remain the same for Version 2.

-You can download `photomaker-v2.safetensors` from [here](https://huggingface.co/bssrdf/PhotoMakerV2).
-
-- All the other parameters from Version 1 remain the same for Version 2.
+---

-### Listing GGML model and RNG types, schedulers and sample methods
+### <u>Listing GGML model and RNG types, schedulers and sample methods</u>

 Access the GGML model and RNG types, schedulers, and sample methods via the following maps:

399458
print("Sample methods:", list(SAMPLE_METHOD_MAP))
400459
```
401460

402-
### Other High-level API Examples
461+
---
462+
463+
### <u>Other High-level API Examples</u>
403464

404465
Other examples for the high-level API (such as upscaling and model conversion) can be found in the [tests](tests) directory.
405466

@@ -408,7 +469,7 @@ Other examples for the high-level API (such as upscaling and model conversion) c
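The `*_MAP` listings shown in the hunk above can also be used to validate user input before constructing a context. A minimal sketch, using a stand-in dict (the keys here are illustrative assumptions; the real `SAMPLE_METHOD_MAP` is imported from `stable_diffusion_cpp`):

```python
# Stand-in for stable_diffusion_cpp's SAMPLE_METHOD_MAP; the real map is
# imported from the package, and these keys/values are assumed examples.
SAMPLE_METHOD_MAP = {"euler": 0, "euler_a": 1, "heun": 2}

def resolve_sample_method(name: str) -> int:
    """Map a user-supplied sampler name to its enum value, or fail loudly."""
    try:
        return SAMPLE_METHOD_MAP[name]
    except KeyError:
        raise ValueError(
            f"Unknown sample method {name!r}; choose from {list(SAMPLE_METHOD_MAP)}"
        )

print(resolve_sample_method("euler"))  # prints 0 with the stand-in map above
```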
408469
The low-level API is a direct [`ctypes`](https://docs.python.org/3/library/ctypes.html) binding to the C API provided by `stable-diffusion.cpp`.
409470
The entire low-level API can be found in [stable_diffusion_cpp/stable_diffusion_cpp.py](https://github.com/william-murray1204/stable-diffusion-cpp-python/blob/main/stable_diffusion_cpp/stable_diffusion_cpp.py) and directly mirrors the C API in [stable-diffusion.h](https://github.com/leejet/stable-diffusion.cpp/blob/master/stable-diffusion.h).
410471

411-
Below is a short example demonstrating how to use the low-level API:
472+
Below is a short example demonstrating low-level API usage:
412473

413474
```python
414475
import stable_diffusion_cpp as sd_cpp
@@ -427,12 +488,6 @@ c_image = sd_cpp.sd_image_t(
427488
ctypes.POINTER(ctypes.c_uint8),
428489
),
429490
) # Create a new C sd_image_t
430-
431-
img = sd_cpp.upscale(
432-
self.upscaler,
433-
image_bytes,
434-
upscale_factor,
435-
) # Upscale the image
436491
```
437492

438493
## Development

assets/box.png

184 KB

stable_diffusion_cpp/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@

 # isort: on

-__version__ = "0.2.7"
+__version__ = "0.2.8"

stable_diffusion_cpp/_internals.py

Lines changed: 10 additions & 1 deletion
@@ -39,6 +39,9 @@ def __init__(
     keep_control_net_cpu: bool,
     keep_vae_on_cpu: bool,
     diffusion_flash_attn: bool,
+    chroma_use_dit_mask: bool,
+    chroma_use_t5_mask: bool,
+    chroma_t5_mask_pad: int,
     verbose: bool,
 ):
     self.model_path = model_path
@@ -62,6 +65,9 @@ def __init__(
     self.keep_control_net_cpu = keep_control_net_cpu
     self.keep_vae_on_cpu = keep_vae_on_cpu
     self.diffusion_flash_attn = diffusion_flash_attn
+    self.chroma_use_dit_mask = chroma_use_dit_mask
+    self.chroma_use_t5_mask = chroma_use_t5_mask
+    self.chroma_t5_mask_pad = chroma_t5_mask_pad
     self.verbose = verbose

     self._exit_stack = ExitStack()
@@ -104,8 +110,11 @@ def __init__(
     self.schedule,
     self.keep_clip_on_cpu,
     self.keep_control_net_cpu,
-    self.diffusion_flash_attn,
     self.keep_vae_on_cpu,
+    self.diffusion_flash_attn,
+    self.chroma_use_dit_mask,
+    self.chroma_use_t5_mask,
+    self.chroma_t5_mask_pad,
 )

 # Check if the model was loaded successfully
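The reordering in the hunk above matters because the context is constructed positionally: previously `diffusion_flash_attn` was passed before `keep_vae_on_cpu`, silently swapping the two flags. A sketch with a hypothetical stub standing in for the `new_sd_ctx` binding, showing the corrected trailing argument order:

```python
# Hypothetical stand-in for the new_sd_ctx binding; parameter names and
# order mirror the diff above, but this stub only records what it receives.
def new_sd_ctx_stub(
    keep_clip_on_cpu: bool,
    keep_control_net_cpu: bool,
    keep_vae_on_cpu: bool,      # now passed BEFORE diffusion_flash_attn
    diffusion_flash_attn: bool,
    chroma_use_dit_mask: bool,  # new in this commit
    chroma_use_t5_mask: bool,   # new in this commit
    chroma_t5_mask_pad: int,    # new in this commit
):
    return dict(locals())  # capture the received arguments by name

ctx = new_sd_ctx_stub(True, True, True, False, True, False, 1)
```

Passing the flags in the corrected order keeps each boolean attached to its intended parameter; keyword arguments would make this mix-up impossible, but the C calling convention is positional.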
