Review thread on `genlm/backend/llm/vllm.py` (marked outdated):
| """ | ||
| self.lora_request = None | ||
|
|
||
| def set_lora(self, lora_path, lora_name="current_lora", lora_id=1): |
Is there a reason why the method signature is different between vLLM and HF? That should be avoided, since both of these classes are supposed to implement the same interface.
In the HF backend there are two methods: `load_lora` loads LoRA weights and `set_lora` actually activates them (someone may need to load two different LoRAs, activate the first one, then deactivate it and activate the other one afterwards, etc.). In vLLM, we only need to pass the LoRA adapter in the vLLM request, which is why we only need the `set_lora` method. Should I name them differently?
To make the method signatures match across the vLLM and HF backends, a potential solution would be to add a `def load_lora(self, lora_path, lora_name='...')` method in the vLLM backend (that is, with the same signature as in the HF backend). In vLLM, this method would simply associate the passed `lora_path` value with the passed `lora_name` key. That way you could then `def set_lora(self, lora_name='...')`, just like in the HF backend: it would not need to take the other args, and would simply use the path associated with `lora_name`, provided you 'loaded' it already.
Would something like that make sense, @vicky-xef?
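For concreteness, a minimal sketch of the shape being suggested here (the class name and the path-tracking dictionary are illustrative, and this design was later superseded by the `add_new_lora` interface adopted below):

```python
class VLLMProposalSketch:  # illustrative class name
    def __init__(self):
        self.lora_paths = {}      # hypothetical bookkeeping: lora_name -> lora_path
        self.lora_request = None  # None means LoRA is not used

    def load_lora(self, lora_path, lora_name="current_lora"):
        # Same signature as in the HF backend: just remember the path
        # under this name; nothing is activated yet.
        self.lora_paths[lora_name] = lora_path

    def set_lora(self, lora_name="current_lora"):
        # Same signature as in the HF backend: activate the adapter that
        # was previously 'loaded' under lora_name.
        path = self.lora_paths[lora_name]
        self.lora_request = path  # placeholder for building the real vLLM LoRA request
```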
I renamed `load_lora` to `add_new_lora` and added it to the vLLM backend, where it generates a hashed id for the LoRA adapter (the id, the LoRA name, and the LoRA path are all needed in a vLLM request). So now both the vLLM and HF backends have a common interface.
By default, no LoRA adapter needs to be loaded.
In the `vllm.py` file (see the sketch after this list):
- `lora_request` attribute, initialized to `None` (meaning LoRA is not used). This value is passed to every `engine.generate()` / `engine.add_request()` call.
- `add_new_lora()` method that hashes a LoRA name to an id (kept in the `lora_name_to_ids` dictionary). Both the LoRA name and the id must be used in the vLLM request.
- `set_lora()` method that loads a LoRA adapter by updating the `lora_request` attribute.
- `clear_lora()` method that resets `lora_request` to `None`, disabling LoRA usage.
- Requires `enable_lora=True` in the `engine_opts`.
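A minimal sketch of what this could look like on the vLLM side (the class name, the path-tracking dictionary, and the exact hashing scheme are assumptions; `lora_request`, `lora_name_to_ids`, and the three method names come from the description above):

```python
from vllm.lora.request import LoRARequest  # vLLM's per-request LoRA handle

class VLLMBackendSketch:  # illustrative name, not the actual class in vllm.py
    def __init__(self, engine):
        self.engine = engine            # assumes the engine was built with enable_lora=True
        self.lora_request = None        # None => LoRA disabled; passed to every generate/add_request call
        self.lora_name_to_ids = {}      # LoRA name -> integer id (vLLM needs both)
        self.lora_name_to_paths = {}    # illustrative bookkeeping: LoRA name -> adapter path

    def add_new_lora(self, lora_path, lora_name="current_lora"):
        # Hash the name to a positive integer id; the exact scheme in the PR may differ.
        self.lora_name_to_ids[lora_name] = abs(hash(lora_name)) % (2**31 - 1) + 1
        self.lora_name_to_paths[lora_name] = lora_path

    def set_lora(self, lora_name="current_lora"):
        # Activate a previously added adapter by building the LoRARequest
        # that will be attached to subsequent engine calls.
        self.lora_request = LoRARequest(
            lora_name,
            self.lora_name_to_ids[lora_name],
            self.lora_name_to_paths[lora_name],
        )

    def clear_lora(self):
        # Reset to None, disabling LoRA for subsequent requests.
        self.lora_request = None
```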
In the `hf.py` file (sketch after this list):
- `add_new_lora()` method that loads a LoRA adapter.
- `set_lora()` method that activates the loaded LoRA adapter.
- `clear_lora()` method that deactivates the LoRA adapter.
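And a corresponding sketch of the HF side, assuming the backend relies on transformers' PEFT integration (`load_adapter`/`set_adapter`/`disable_adapters` from `PeftAdapterMixin`, available when `peft` is installed); the conversation does not confirm this is how the PR implements it:

```python
from transformers import AutoModelForCausalLM

class HFBackendSketch:  # illustrative name, not the actual class in hf.py
    def __init__(self, model_name):
        # Assumes a transformers causal LM with PEFT integration.
        self.model = AutoModelForCausalLM.from_pretrained(model_name)

    def add_new_lora(self, lora_path, lora_name="current_lora"):
        # Load a LoRA adapter under the given name.
        self.model.load_adapter(lora_path, adapter_name=lora_name)

    def set_lora(self, lora_name="current_lora"):
        # Make the named adapter the active one.
        self.model.set_adapter(lora_name)

    def clear_lora(self):
        # Deactivate adapters, reverting to the base model weights.
        self.model.disable_adapters()
```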