Review thread on `genlm/backend/llm/vllm.py` (marked outdated):
| """ | ||
| self.lora_request = None | ||
|
|
||
| def set_lora(self, lora_path, lora_name="current_lora", lora_id=1): |
Is there a reason why the method signature is different between vLLM and HF? That should be avoided, since both of these classes are supposed to implement the same interface.
In the HF backend there are two methods: `load_lora` loads LoRA weights and `set_lora` actually activates them (someone may need to load two different LoRAs, activate the first one, then deactivate it and activate the other one afterwards, etc.). In vLLM, we only need to pass the LoRA adapter in the vLLM request, which is why we only need the `set_lora` method. Should I name them differently?
To make the method signatures match across the vLLM and HF backends, a potential solution would be to add a `def load_lora(self, lora_path, lora_name='...')` method in the vLLM backend (that is, with the same signature as in the HF backend). In vLLM, this method would simply associate the passed `lora_path` value with the passed `lora_name` key. That way you could then `def set_lora(self, lora_name='...')`, just like in the HF backend: it would not need to take the other args, and would simply use the path associated with `lora_name`, provided you 'loaded' it already.
Would something like that make sense, @vicky-xef?
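For concreteness, a minimal sketch of the shape being suggested here (the class name and the path-tracking dictionary are illustrative, and this design was later superseded by the `add_new_lora` interface adopted below):

```python
class VLLMProposalSketch:  # illustrative class name
    def __init__(self):
        self.lora_paths = {}      # hypothetical bookkeeping: lora_name -> lora_path
        self.lora_request = None  # None means LoRA is not used

    def load_lora(self, lora_path, lora_name="current_lora"):
        # Same signature as in the HF backend: just remember the path
        # under this name; nothing is activated yet.
        self.lora_paths[lora_name] = lora_path

    def set_lora(self, lora_name="current_lora"):
        # Same signature as in the HF backend: activate the adapter that
        # was previously 'loaded' under lora_name.
        path = self.lora_paths[lora_name]
        self.lora_request = path  # placeholder for building the real vLLM LoRA request
```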
I renamed `load_lora` to `add_new_lora` and added it to the vLLM backend, where it generates a hashed id for the LoRA adapter (the id, the LoRA name, and the LoRA path are all needed in a vLLM request). So now both the vLLM and HF backends have a common interface.
By default, no LoRA adapter needs to be loaded.
In the `vllm.py` file (see the sketch after this list):
- `lora_request` attribute, initialized to `None` (meaning LoRA is not used). This value is passed to every `engine.generate()` / `engine.add_request()` call.
- `add_new_lora()` method that hashes a LoRA name to an id (kept in the `lora_name_to_ids` dictionary). Both the LoRA name and the id must be used in the vLLM request.
- `set_lora()` method that loads a LoRA adapter by updating the `lora_request` attribute.
- `clear_lora()` method that resets `lora_request` to `None`, disabling LoRA usage.
- Requires `enable_lora=True` in the `engine_opts`.
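A minimal sketch of what this could look like on the vLLM side (the class name, the path-tracking dictionary, and the exact hashing scheme are assumptions; `lora_request`, `lora_name_to_ids`, and the three method names come from the description above):

```python
from vllm.lora.request import LoRARequest  # vLLM's per-request LoRA handle

class VLLMBackendSketch:  # illustrative name, not the actual class in vllm.py
    def __init__(self, engine):
        self.engine = engine            # assumes the engine was built with enable_lora=True
        self.lora_request = None        # None => LoRA disabled; passed to every generate/add_request call
        self.lora_name_to_ids = {}      # LoRA name -> integer id (vLLM needs both)
        self.lora_name_to_paths = {}    # illustrative bookkeeping: LoRA name -> adapter path

    def add_new_lora(self, lora_path, lora_name="current_lora"):
        # Hash the name to a positive integer id; the exact scheme in the PR may differ.
        self.lora_name_to_ids[lora_name] = abs(hash(lora_name)) % (2**31 - 1) + 1
        self.lora_name_to_paths[lora_name] = lora_path

    def set_lora(self, lora_name="current_lora"):
        # Activate a previously added adapter by building the LoRARequest
        # that will be attached to subsequent engine calls.
        self.lora_request = LoRARequest(
            lora_name,
            self.lora_name_to_ids[lora_name],
            self.lora_name_to_paths[lora_name],
        )

    def clear_lora(self):
        # Reset to None, disabling LoRA for subsequent requests.
        self.lora_request = None
```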
In the `hf.py` file (sketch after this list):
- `add_new_lora()` method that loads a LoRA adapter.
- `set_lora()` method that activates the loaded LoRA adapter.
- `clear_lora()` method that deactivates the LoRA adapter.
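And a corresponding sketch of the HF side, assuming the backend relies on transformers' PEFT integration (`load_adapter`/`set_adapter`/`disable_adapters` from `PeftAdapterMixin`, available when `peft` is installed); the conversation does not confirm this is how the PR implements it:

```python
from transformers import AutoModelForCausalLM

class HFBackendSketch:  # illustrative name, not the actual class in hf.py
    def __init__(self, model_name):
        # Assumes a transformers causal LM with PEFT integration.
        self.model = AutoModelForCausalLM.from_pretrained(model_name)

    def add_new_lora(self, lora_path, lora_name="current_lora"):
        # Load a LoRA adapter under the given name.
        self.model.load_adapter(lora_path, adapter_name=lora_name)

    def set_lora(self, lora_name="current_lora"):
        # Make the named adapter the active one.
        self.model.set_adapter(lora_name)

    def clear_lora(self):
        # Deactivate adapters, reverting to the base model weights.
        self.model.disable_adapters()
```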