Open
Labels
enhancement (New feature or request)
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Add an option so that --n-cpu-moe offloads the first layers from each GPU, or allow --n-cpu-moe to be given multiple times, each starting at a different layer. E.g.:
--n-cpu-moe 0:5 (start at layer 0 and offload 5 layers)
--n-cpu-moe 10:5 (start at layer 10 and offload 5 layers)
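To make the proposed semantics concrete, here is a rough Python sketch of how `start:count` values could be expanded into per-layer override patterns. It is only an illustration: the `start:count` syntax is the proposal from this issue and does not exist today, the helper names `parse_moe_spec` and `to_override_patterns` are made up, and the tensor-name pattern `blk\.N\.ffn_(up|down|gate)_exps` is an assumption about how MoE expert tensors are named, so it may need adjusting for a given model.

```python
import re


def parse_moe_spec(spec: str) -> range:
    """Parse a hypothetical 'start:count' value into a range of layer indices."""
    match = re.fullmatch(r"(\d+):(\d+)", spec)
    if match is None:
        raise ValueError(f"expected 'start:count', got {spec!r}")
    start, count = int(match.group(1)), int(match.group(2))
    return range(start, start + count)


def to_override_patterns(specs: list[str]) -> list[str]:
    """Expand several specs into one 'pattern=CPU' override string per layer.

    The tensor-name pattern is an assumption, not a confirmed naming scheme.
    """
    # Collect layer indices in a set so overlapping specs do not duplicate layers.
    layers = sorted({i for spec in specs for i in parse_moe_spec(spec)})
    return [rf"blk\.{i}\.ffn_(up|down|gate)_exps=CPU" for i in layers]


# The two values from the example above: layers 0-4 and 10-14.
patterns = to_override_patterns(["0:5", "10:5"])
for pattern in patterns:
    print(pattern)
```

Each printed string has the `pattern=buffer` shape that -ot takes, so the sketch also shows what currently has to be written by hand to get the same placement.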
Motivation
Unless I'm missing something, the --n-cpu-moe option only offloads the first layers of the model. This is not very useful on multi-GPU configurations, where one has to offload from each GPU (e.g. 5 MoE layers from each GPU).
It would be nice to be able to configure this without -ot.
Thanks for the awesome work!
Possible Implementation
No response