
First, I created a sparse attention module for Flux 1.D and tested it. Some thoughts and questions... #107

@ukaprch

Description

Using the Wan model as a starting point, I created a Flux 1.D version. I normally use SageAttention and wanted to see what speed-up or quality differences SpargeAttn might bring. I am using the current GitHub diffusers library for testing.

  1. I did not notice any speed boost using SpargeAttn vs. SageAttention. I compile both and run as usual, using the pre-built Windows 10 wheels from https://github.com/woct0rdho/SpargeAttn/releases and https://github.com/woct0rdho/SageAttention/releases together with their matching Triton wheels. Does SpargeAttn only pay off for very long prompts (high token counts)? I only tested with a short prompt: a cat holding a sign that says "Hello World".
    For my test I used the top-k mode via the SpargeAttn API spas_sage2_attn_meansim_topk_cuda with a value of 0.5.
    I also noticed that SpargeAttn did not adhere to the text portion of my prompt as well as SageAttention did.
    I applied SpargeAttn to the Flux transformer in both places (see the sketch after this list):
    transformer_blocks (self-attention)
    single_transformer_blocks (cross-attention)
    Should I omit cross-attention for SpargeAttn?

  2. It is unclear what values you would recommend for mode = "cdfthreshd". A quick search suggests 90%-95% (i.e. 0.90-0.95). Is that correct?

  3. Are you folks even considering a version for Flux 1.D?
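For reference, here is a minimal sketch of the kind of hook I am experimenting with: monkey-patching torch.nn.functional.scaled_dot_product_attention (which the diffusers Flux attention processors call), and using forward hooks so the single_transformer_blocks could be kept on the stock kernel if the answer to my cross-attention question is to omit them. The kernel name spas_sage2_attn_meansim_topk_cuda is the one from my test above; the spas_sage_attn import path, the topk keyword name, and the assumption that the kernel accepts SDPA-layout (batch, heads, seq, head_dim) tensors are my guesses for illustration, not the documented interface.

```python
# Minimal sketch of the test setup described above.
# Assumptions: import path, topk kwarg name, and SDPA-compatible tensor layout.
import torch
import torch.nn.functional as F
from diffusers import FluxPipeline
from spas_sage_attn import spas_sage2_attn_meansim_topk_cuda  # assumed import path

_orig_sdpa = F.scaled_dot_product_attention
_use_sparge = True  # toggled off around blocks kept on the stock kernel


def sparge_sdpa(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False, scale=None, **kwargs):
    # Drop-in replacement for F.scaled_dot_product_attention, which the diffusers
    # Flux attention processors call with (batch, heads, seq, head_dim) tensors.
    if not _use_sparge or attn_mask is not None or is_causal:
        return _orig_sdpa(q, k, v, attn_mask=attn_mask, dropout_p=dropout_p,
                          is_causal=is_causal, scale=scale, **kwargs)
    # topk=0.5 is the value from my test; the cdfthreshd-style control
    # (e.g. 0.90-0.95, per question 2) would be swapped in here instead.
    return spas_sage2_attn_meansim_topk_cuda(q, k, v, topk=0.5)  # assumed kwarg name


F.scaled_dot_product_attention = sparge_sdpa

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")


# If the answer to the cross-attention question is "omit it", these hooks keep the
# single-stream blocks on the original kernel while the dual-stream blocks stay sparse.
def _disable_sparge(*_):
    global _use_sparge
    _use_sparge = False


def _enable_sparge(*_):
    global _use_sparge
    _use_sparge = True


for block in pipe.transformer.single_transformer_blocks:
    block.register_forward_pre_hook(_disable_sparge)
    block.register_forward_hook(_enable_sparge)

image = pipe('a cat holding a sign that says "Hello World"', num_inference_steps=28).images[0]
image.save("sparge_test.png")
```

If the transformer is compiled with torch.compile, the patch needs to be in place before the first compiled forward pass so the compiled graph picks up the sparse kernel.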

Thanks for your wonderful project.
