Description
Using the Wan model as a starting point, I created a Flux 1.D version. I normally use SageAttention and wanted to see what speedup or differences SpargeAttn might bring. I am using the current GitHub diffusers library for testing.
I did not notice any speed boost using SpargeAttn vs SageAttention. I compile both and run as usual, using pre-built wheels from https://github.com/woct0rdho/SpargeAttn/releases and https://github.com/woct0rdho/SageAttention/releases (along with their Triton wheels) on Windows 10. My question: does SpargeAttn only matter for very long prompts (many tokens)? I only tested with a short prompt: a cat holding a sign that says "Hello World"
For my test I used the top-k SpargeAttn API, spas_sage2_attn_meansim_topk_cuda, with a value of 0.5.
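Roughly, this is the call I swap in for the dense SDPA call inside my attention processor. The import path, keyword name, and tensor layout are assumptions on my part (only the function name is from the SpargeAttn repo), so please check the actual source for the real signature:

```python
import torch.nn.functional as F

# Assumed import path; the function name is from the SpargeAttn repo, but the
# exact keyword names and expected (batch, heads, seq_len, head_dim) layout
# should be verified against its source.
from spas_sage_attn import spas_sage2_attn_meansim_topk_cuda


def sparse_sdpa(q, k, v):
    """Drop-in replacement for F.scaled_dot_product_attention in my test."""
    try:
        # topk=0.5 is the value I tested with (keyword name is an assumption)
        return spas_sage2_attn_meansim_topk_cuda(q, k, v, topk=0.5)
    except Exception:
        # Fall back to the dense kernel if the sparse one rejects the shapes
        return F.scaled_dot_product_attention(q, k, v)
```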
I also noticed that SpargeAttn did not adhere to the text portion of my prompt as well as SageAttention did.
I applied SpargeAttn to the Flux transformer for both:

- transformer_blocks (self-attention)
- single_transformer_blocks (cross-attention)

Should I omit cross-attention for SpargeAttn?
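For reference, this is roughly how I select which blocks get the sparse path. `SpargeAttnFluxProcessor` is a placeholder name for my own wrapper around the kernel above, not anything shipped by diffusers or SpargeAttn, and `pipe` is an already-loaded Flux pipeline:

```python
from diffusers.models.attention_processor import Attention, FluxAttnProcessor2_0

# Sketch: give the transformer_blocks the sparse processor and leave the
# single_transformer_blocks on the stock diffusers processor.
# SpargeAttnFluxProcessor is a placeholder for my own wrapper class.
for name, module in pipe.transformer.named_modules():
    if not isinstance(module, Attention):
        continue
    if name.startswith("transformer_blocks."):
        module.set_processor(SpargeAttnFluxProcessor())
    elif name.startswith("single_transformer_blocks."):
        module.set_processor(FluxAttnProcessor2_0())
```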
It is also unclear what values you would recommend for mode = "cdfthreshd". A quick search suggests 90%-95% (i.e. 0.90, 0.95). Is this correct?
Finally, are you considering a version for Flux 1.D?
Thanks for your wonderful project.