Exploring Connections Between Multi-Token Attention and FLARE #51

@vpuri3

Hi @Golovneva, @tianlu-wang, @jaseweston

I hope you’re doing well! My name is Vedant, and I recently co-authored a paper titled FLARE: Fast Low-Rank Attention Routing Engine, which we just released on arXiv (https://arxiv.org/abs/2508.12594). FLARE reduces the quadratic cost of self-attention by routing through a low-dimensional latent sequence, essentially inducing a pooling effect in token space.
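
To make the routing idea concrete, here is a minimal NumPy sketch of generic two-stage latent routing: a short set of latent queries first pools the tokens, then every token reads back from the pooled latents. This is an illustration under stated assumptions and not the code from the FLARE repository; the names `latent_routed_attention`, `implied_mixing_matrix`, and `latent_q`, the single-head layout, and the reuse of one score matrix for both stages are assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def latent_routed_attention(latent_q, k, v):
    """Route N tokens through M latents (hypothetical single-head sketch).

    latent_q: (M, D) latent queries (assumed learned; random here)
    k, v:     (N, D) token keys and values
    returns:  (N, D) mixed token features, at O(N * M * D) cost
    """
    scale = 1.0 / np.sqrt(k.shape[-1])
    scores = (latent_q @ k.T) * scale      # (M, N) latent-token affinities
    encode = softmax(scores, axis=1)       # each latent pools over tokens
    latents = encode @ v                   # (M, D) pooled summary
    decode = softmax(scores.T, axis=1)     # each token reads from latents
    return decode @ latents                # (N, D)


def implied_mixing_matrix(latent_q, k):
    """Dense (N, N) token-mixing matrix implied by the two stages.

    Built only for inspection; forming it costs O(N^2) and defeats the
    purpose of routing. Its rank is at most M.
    """
    scale = 1.0 / np.sqrt(k.shape[-1])
    scores = (latent_q @ k.T) * scale
    return softmax(scores.T, axis=1) @ softmax(scores, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, n_latents, dim = 64, 8, 16
    q = rng.normal(size=(n_latents, dim))
    k = rng.normal(size=(n_tokens, dim))
    v = rng.normal(size=(n_tokens, dim))
    out = latent_routed_attention(q, k, v)
    print(out.shape)
```

In this sketch the N x N mixing matrix is never formed during the forward pass. It factors into an (N, M) decode step and an (M, N) encode step, which is where the low-rank structure and the pooling effect in token space come from.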

I really enjoyed reading your recent paper on Multi-Token Attention (MTA) (https://arxiv.org/pdf/2504.00927). I found the idea of directly modeling multi-token interactions through score convolutions to be a powerful and elegant alternative to pairwise attention. It seems that FLARE, MTA, and other directions such as 2-Simplicial Attention (https://arxiv.org/pdf/2507.02754) all push toward richer interaction structures in attention, each through a very different mechanism: latent projection, score convolutions, and 3-tuple attention, respectively.

Given these similarities, I’d love to ask for your advice on how best to evaluate and compare these models:

  • Would you recommend starting with synthetic/algorithmic tasks (e.g., sequence matching, algorithmic reasoning), or directly benchmarking on real domains such as language, point clouds, or physics simulations?

  • What experimental setups or benchmarks would you suggest for a fair and meaningful comparison between MTA, FLARE, and other efficient attention variants?

Our codebase for FLARE is available at https://github.com/vpuri3/FLARE.py, and we’d be happy to help set up comparisons. We’d love to collaborate if you’re interested — even something lightweight like a joint benchmark study. Any guidance on baselines or evaluation protocols you’ve found useful would be greatly appreciated.

If you’re open to it, I’d be thrilled to continue the conversation over email or a quick call. Thanks again for your excellent work — I’m looking forward to your thoughts!

Best regards,
