Hi! Yes, Kontext supports all the diffusers optimizations, so you can run it on GPUs with 16GB of VRAM, or even less depending on how much system RAM you have.
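To see why offloading or quantization matters on a 16GB card, here is a back-of-the-envelope estimate of the transformer's weight memory. It assumes the transformer has roughly 12 billion parameters (the size Black Forest Labs gives for FLUX.1) and counts weights only. It ignores the text encoders, VAE, activations, and quantization overhead. `weight_size_gb` is a hypothetical helper.

```python
def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~12B parameters for the FLUX.1 (Kontext) transformer
FLUX_TRANSFORMER_PARAMS = 12e9

for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    size = weight_size_gb(FLUX_TRANSFORMER_PARAMS, bits)
    print(f"{label:>6}: ~{size:.0f} GB of transformer weights")
```

In bf16 the transformer alone is about 24 GB, more than a 16GB card can hold. Its weights therefore have to live in system RAM and be moved to the GPU piece by piece, or be quantized.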

I suggest using group offloading, but you will need 64GB of RAM. This code runs on a 16GB GPU without any quality loss:

```python
import torch

from diffusers import FluxKontextPipeline, FluxTransformer2DModel
from diffusers.hooks import apply_group_offloading
from diffusers.utils import load_image

# Input image to edit
image = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/resources/dog_plushie.png"
).convert("RGB")

prompt = "Transform this image into an anime-style illustration inspired by Studio Ghibli"

# Load the Kontext transformer in bfloat16
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Load the remaining components and plug in the Kontext transformer
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Keep the transformer weights in CPU RAM and move each leaf module to the GPU only when it runs;
# use_stream=True prefetches on a separate CUDA stream so transfers overlap with computation
apply_group_offloading(
    pipe.transformer,
    offload_type="leaf_level",
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    use_stream=True,
)

# Move the rest of the pipeline (text encoders, VAE) to the GPU
pipe.to("cuda")
image = pipe(
    image=image,
    prompt=prompt,
    guidance_scale=2.5,
).images[0]

image.save("kontext_output.png")
```

You can also use quantization. Since Kontext has the same architecture as Flux, all the Flux optimizations work here as well.

Kontext model loading quantization problem #11951
