Description
I came across Awni's PR in mlx-lm adding batch generation and experimented with it over the weekend. I implemented it in MLX Swift and got almost the same benchmark numbers on my M5 MacBook Pro with Llama 3.2 3B (4-bit):
| Batch Size | MLX LM (tokens/s) | MLX Swift (tokens/s) |
|---|---|---|
| 1 | 61 | 62 |
| 2 | 122 | 118 |
| 32 | 349 | 344 |
There are probably still some subtle improvements I haven't been able to find, so I think a review would help.
I'm opening this issue in case someone else is already working on it. If not, I can clean up my branch and send a PR!