Support for Batch Generation #437

@rudrankriyam

I came across Awni's batch generation PR in mlx-lm, thought it was cool, and experimented with it over the weekend. I was able to implement it in MLX Swift with almost the same benchmark numbers on my M5 MacBook Pro with Llama 3.2 3B 4-bit:

| Batch Size | MLX LM (t/s) | MLX Swift (t/s) |
| --- | --- | --- |
| 1 | 61 | 62 |
| 2 | 122 | 118 |
| 32 | 349 | 344 |
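
To put a number on "almost the same", here is a small sketch that computes the relative gap between the two implementations and the speedup over batch size 1, using only the figures from the table above. The helper names (`relative_gap_percent`, `summarize`) are made up for illustration and are not part of mlx-lm or MLX Swift.

```python
# Throughput figures copied from the table above, in tokens per second:
# batch size -> (MLX LM, MLX Swift)
RESULTS = {
    1: (61, 62),
    2: (122, 118),
    32: (349, 344),
}


def relative_gap_percent(reference, candidate):
    """Signed difference of candidate vs. reference, as a percentage."""
    return (candidate - reference) / reference * 100


def summarize(results):
    """Return one summary row per batch size (hypothetical helper)."""
    base_lm, base_swift = results[1]
    rows = []
    for batch_size, (mlx_lm, mlx_swift) in sorted(results.items()):
        rows.append({
            "batch_size": batch_size,
            # How far MLX Swift is from MLX LM at this batch size.
            "gap_percent": relative_gap_percent(mlx_lm, mlx_swift),
            # Total throughput relative to batch size 1.
            "lm_speedup": mlx_lm / base_lm,
            "swift_speedup": mlx_swift / base_swift,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RESULTS):
        print(
            f"batch={row['batch_size']:>2}  "
            f"gap={row['gap_percent']:+.1f}%  "
            f"lm_speedup={row['lm_speedup']:.2f}x  "
            f"swift_speedup={row['swift_speedup']:.2f}x"
        )
```

With these numbers the MLX Swift throughput stays within about 3.3% of MLX LM at every batch size measured.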

There are probably some subtle improvements that I have not been able to find, and I think a review would help.

I'm creating this issue in case somebody else is already working on it. If not, I can clean up my branch and send a PR!
