Open
Labels: bug (Something isn't working)
Description
For Hugging Face (hf) models, setting the adapter leaves it active, so later generation requests to the base model are modified as well. It also sets requires_grad=True. We need to:
- unset the adapter after calling the aLoRA
- disable requires_grad (I believe)
- implement a lock for Hugging Face aLoRAs so that async non-aLoRA generation requests don't accidentally use the aLoRA
- have aLoRAs use model_options?
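A rough sketch of how the unset-and-lock items could fit together, not a proposed implementation. `FakeModel` and `AdapterGuard` are hypothetical names made up for illustration. `FakeModel` only stands in for a model exposing `set_adapter` / `disable_adapters` style methods, so the real backend calls may differ. It uses a `threading.Lock` on the assumption that generation is a blocking call. It does not cover requires_grad.

```python
import threading
from contextlib import contextmanager


class FakeModel:
    """Hypothetical stand-in for an hf model with adapter support."""

    def __init__(self):
        self.active_adapter = None  # None means "base model only"

    def set_adapter(self, name):
        self.active_adapter = name

    def disable_adapters(self):
        self.active_adapter = None

    def generate(self, prompt):
        # Tag the output with whichever adapter was active for this call.
        return f"[{self.active_adapter or 'base'}] {prompt}"


class AdapterGuard:
    """Serializes generation so adapter state cannot leak across requests."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    @contextmanager
    def adapter(self, name):
        # Hold the lock for the whole set -> generate -> unset sequence.
        with self._lock:
            self._model.set_adapter(name)
            try:
                yield self._model
            finally:
                # Always unset, even if generation raised.
                self._model.disable_adapters()

    @contextmanager
    def base(self):
        # Base-model requests take the same lock, so they can never run
        # while an adapter is temporarily active.
        with self._lock:
            yield self._model


guard = AdapterGuard(FakeModel())

with guard.adapter("my_alora") as m:
    alora_out = m.generate("hi")

with guard.base() as m:
    base_out = m.generate("hi")
```

The point of routing base-model requests through the same guard is that the lock alone is not enough: the adapter also has to be unset in a `finally` so a failed aLoRA call does not leave it active for the next request.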
We should also check that the vLLM server doesn't have a similar issue, where the active adapter needs to be set/unset or otherwise managed between requests.