Conversation

@awni (Member) commented May 7, 2025

Not sure it's worth merging this. The standalone benchmark is much improved, but the end-to-end gain is very modest even for a very small model.

Adds a split logsumexp dispatch for the case where the reduced vector is very long.
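
For intuition, here is a minimal Python sketch of the split idea (my sketch, not the Metal kernel in this PR; num_splits is a made-up parameter for illustration). It works because a logsumexp over the whole axis equals a logsumexp of the per-chunk logsumexps:

import mlx.core as mx

def split_logsumexp(x, num_splits=8):
    # Reduce each chunk independently, then combine the per-chunk
    # results with a second, much smaller logsumexp.
    chunks = mx.split(x, num_splits, axis=-1)
    partials = mx.stack([mx.logsumexp(c, axis=-1) for c in chunks], axis=-1)
    return mx.logsumexp(partials, axis=-1, keepdims=True)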

import mlx.core as mx

x = mx.random.uniform(shape=(1, 4096 * 50))

def fun(x):
    for _ in range(100):
        x = x - mx.logsumexp(x, axis=-1, keepdims=True)
    return x
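
The PR doesn't show the timing harness; a plausible way to reproduce the numbers below (my assumption: mx.eval to force evaluation of the lazy graph and time.perf_counter for wall-clock time):

import time

mx.eval(x)  # materialize the input before timing
tic = time.perf_counter()
mx.eval(fun(x))  # force evaluation of the full chain of ops
print(f"{1e3 * (time.perf_counter() - tic):.0f} ms")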

Pre: 234 ms
Post: 138 ms

Inference speed improved slightly for small Gemmas (which have a large vocab):

mlx_lm.generate --model mlx-community/gemma-3-1b-it-4bit --prompt "Write a story about Einstein" -m 512

Pre: 333.652 tokens-per-sec
Post: 334.837 tokens-per-sec
