
Conversation

@shauray8 (Contributor) commented Aug 3, 2024

What does this PR do?

Adds image-to-image support to the FLUX pipeline.


Who can review?

@asomoza @a-r-r-o-w @sayakpaul

@sayakpaul (Member) commented

@shauray8 could you also show us examples?

@sayakpaul requested a review from @asomoza, August 4, 2024 10:15
@deforum-art commented Aug 4, 2024

This does not work correctly: it's missing the PipelineImageInput and does not add scaled noise to the init latents.


The ordering of steps 4 and 5 needs to be swapped, because correctly noising the latents requires the timestep to be defined first.
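For context, "adding scaled noise to the init" in a flow-matching scheduler means interpolating between the clean image latents and noise at the start sigma, which is why the timesteps must be set first. A minimal sketch (names here are illustrative, not the pipeline's actual API; diffusers' FlowMatchEulerDiscreteScheduler exposes a scale_noise helper along these lines):

import torch

def scale_noise(image_latents: torch.Tensor, noise: torch.Tensor, sigma: float) -> torch.Tensor:
    # Flow-matching forward process: linearly interpolate between the clean
    # init latents and pure noise. sigma is read from the scheduler at the
    # first retained timestep (which depends on strength), so the timesteps
    # must be computed before the init latents can be noised.
    return sigma * noise + (1.0 - sigma) * image_latents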

@tin2tin mentioned this pull request Aug 5, 2024
@shauray8 (Contributor, Author) commented Aug 5, 2024

Hey @sayakpaul @deforum-art, I'm just trying things out here; in theory img2img should work with the FLUX family of models, so as soon as I get somewhere I'll post some results.
I'll keep you guys posted :)

@deforum-art commented
I have built a "working" pipeline; however, the results are poor/different from what I would expect. I am unsure if it is related to differences in guidance, etc.

https://github.com/deforum-studio/flux/blob/main/flux_pipeline.py

@asomoza (Member) commented Aug 5, 2024

What I found is that it needs a lot more strength than other models:

prompt: "high quality photo of a capybara"

[side-by-side: original | img2img result]

@deforum-art commented
I think there is an issue with the noise schedule; what would make this model need more strength?

@asomoza (Member) commented Aug 6, 2024

Probably because it's a distilled model (I'm using dev). I've seen people report the same issue, and I get the same results with ComfyUI.

I just did a quick test because I need it for diff-diff, so I didn't dig into it much; I'm more interested in how it works with inpainting, and img2img is already taken by @shauray8 and this PR.

@sayakpaul (Member) commented

@shauray8 we would like to ship the img2img pipeline soon (preferably next week) because of its demand. Would it be possible for you to provide your commit address so that we can honor your contributions by adding you as a co-author? In that case, we can close this PR.

If there's no activity, we will close it next week and open a PR ourselves. But in any case, please let us know your commit address. I hope this is okay :)

@SoftologyPro commented Aug 9, 2024

The img2img pipeline is faster in some ways but overall slower than the original "on potato" script.
For example, the following code takes 3m10s overall on a 4090 to create a single image. The initial pipeline creation is fast, but the image generation is slow and takes most of the time.

import torch
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image

pipe = FluxImg2ImgPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.to("cuda")
url = None
init_image = None  # load_image(url) -- init image elided in this timing test
prompt = args2.prompt  # args2 comes from the surrounding script's argument parser
image = pipe(prompt, image=init_image, num_inference_steps=4, guidance_scale=3.5).images[0]
image.save('blah.png')

The original "on potato" script is slower to set up the encoders, tokenizers, etc., but the image generation takes only seconds.
Overall it still takes around 2m45s to finish.

import torch
from diffusers import (
    AutoencoderKL,
    FlowMatchEulerDiscreteScheduler,
    FluxPipeline,
    FluxTransformer2DModel,
)
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast
from optimum.quanto import freeze, qfloat8, quantize

dtype = torch.bfloat16
bfl_repo = "black-forest-labs/FLUX.1-schnell"
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(bfl_repo, subfolder="scheduler")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=dtype)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=dtype)
text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
tokenizer_2 = T5TokenizerFast.from_pretrained(bfl_repo, subfolder="tokenizer_2", torch_dtype=dtype)
vae = AutoencoderKL.from_pretrained(bfl_repo, subfolder="vae", torch_dtype=dtype)
transformer = FluxTransformer2DModel.from_pretrained(bfl_repo, subfolder="transformer", torch_dtype=dtype)
# Quantize the two heaviest modules to 8-bit and freeze their weights.
quantize(transformer, weights=qfloat8)
freeze(transformer)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)
# Build the pipeline without the two quantized modules, then attach them
# afterwards (the "on potato" pattern), so the constructor leaves the
# quantized weights untouched.
pipe = FluxPipeline(
    scheduler=scheduler,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    text_encoder_2=None,
    tokenizer_2=tokenizer_2,
    vae=vae,
    transformer=None,
)
pipe.text_encoder_2 = text_encoder_2
pipe.transformer = transformer
generator = torch.Generator().manual_seed(args2.seed)
image = pipe(
    prompt=args2.prompt, 
    width=1024,
    height=1024,
    num_inference_steps=4, 
    generator=generator,
    guidance_scale=3.5,
).images[0]
image.save('blah.png')

If there were a way to combine the fast pipeline setup of img2img with the fast image generation of on_potato, Flux performance would be closer to other recent text-to-image systems (for example, Playground v2.5 takes 15 seconds total to generate a 1024x1024 image, including loading the models and setting up the pipeline). If anyone can speed these up in any way, it would be very helpful. I have added support for Flux to Visions of Chaos, and the general consensus is "great image quality, but slow".
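One possible direction (an untested sketch, assuming FluxImg2ImgPipeline exposes the same transformer and text_encoder_2 components as FluxPipeline) is to apply the same optimum-quanto quantization after a single from_pretrained call, keeping the fast one-line setup:

import torch
from diffusers import FluxImg2ImgPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
# Quantize the two heaviest modules, as the "on potato" script does, so the
# denoising loop runs on 8-bit weights while setup remains a single call.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
quantize(pipe.text_encoder_2, weights=qfloat8)
freeze(pipe.text_encoder_2)
pipe.to("cuda")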

@shauray8 (Contributor, Author) commented

> @shauray8 we would like to ship the img2img pipeline soon (preferably next week) because of its demand. Would it be possible for you to provide your commit address so that we can honor your contributions by adding you as a co-author? In that case, we can close this PR.
>
> If there's no activity, we will close it next week and open a PR ourselves. But in any case, please let us know your commit address. I hope this is okay :)

Sure @sayakpaul. I don't seem to get good results, possibly due to how I was passing things through the scheduler, so yes. I mean, I didn't really contribute anything, but here you go: 141bd6bbfa5a3bb096eaa8056e540a3f9e559e2b

@smthemex commented

> I have built a "working" pipeline; however, the results are poor/different from what I would expect. I am unsure if it is related to differences in guidance, etc.
>
> https://github.com/deforum-studio/flux/blob/main/flux_pipeline.py

Using your code, changing strength to 0.7~0.6 gives a decent image...
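For reference, a minimal sketch of that kind of call (parameter names follow the diffusers img2img convention; the checkpoint, input image, and step count are illustrative assumptions):

import torch
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image

pipe = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

init_image = load_image("input.png").resize((1024, 1024))
# Higher strength adds more noise to the init image, so more of it gets
# repainted; the thread suggests FLUX needs more strength than other models.
image = pipe(
    prompt="high quality photo of a capybara",
    image=init_image,
    strength=0.7,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara_img2img.png")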

@yiyixuxu (Collaborator) commented

Ohhh @shauray8, I'm really sorry I missed this PR!! We merged #9135 instead, even though it comes after your PR.

Would it be OK if I close this one now? If you see any improvements you can make to the Flux img2img and inpaint pipelines, please let us know! We can make a PR and credit you as an author, or you are welcome to make a PR too.

Sorry again!

@shauray8 (Contributor, Author) commented

> Ohhh @shauray8, I'm really sorry I missed this PR!! We merged #9135 instead, even though it comes after your PR.
>
> Would it be OK if I close this one now? If you see any improvements you can make to the Flux img2img and inpaint pipelines, please let us know! We can make a PR and credit you as an author, or you are welcome to make a PR too.
>
> Sorry again!

@yiyixuxu no worries, my code wasn't giving good results anyway. Let's see what I can improve on 🫡

@shauray8 closed this Sep 12, 2024