I tried to use the LoRA Linear layers from sglang to implement LoRA inference in my own code, and I have already incorporated RadixAttention. But I found that the output is always empty:

```
# prompt: "who are you"
# answer: ""
```

Then I saw this assert in the server args:

```python
assert (
    self.max_loras_per_batch > 0
    # FIXME
    and (self.lora_paths is None or self.disable_cuda_graph)
    and (self.lora_paths is None or self.disable_radix_cache)
), "compatibility of lora and cuda graph and radix attention is in progress"
assert self.base_gpu_id >= 0, "base_gpu_id must be non-negative"
```

I'm afraid the problem I'm seeing is caused by this incompatibility, but I'm not sure. Could you give me some advice? @Ying1123 Thanks for your time!
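For reference, the assert above boils down to a simple rule: if any LoRA paths are set, both CUDA graph and the radix cache must be disabled. A minimal standalone sketch of that logic (the function name `check_lora_compat` is mine, not sglang's):

```python
def check_lora_compat(max_loras_per_batch, lora_paths,
                      disable_cuda_graph, disable_radix_cache):
    """Mirror of the server-args assert: with LoRA enabled, both
    CUDA graph and the radix cache must currently be disabled."""
    assert max_loras_per_batch > 0
    assert lora_paths is None or disable_cuda_graph, \
        "LoRA requires disabling CUDA graph"
    assert lora_paths is None or disable_radix_cache, \
        "LoRA requires disabling the radix cache"

# Passes: no LoRA adapters configured.
check_lora_compat(8, None, False, False)
# Passes: LoRA with both features disabled.
check_lora_compat(8, ["my-adapter"], True, True)
# Raises: LoRA with the radix cache still enabled.
try:
    check_lora_compat(8, ["my-adapter"], True, False)
except AssertionError as e:
    print(e)  # → LoRA requires disabling the radix cache
```

So with RadixAttention incorporated and LoRA active, this check would trip, which matches the symptom you describe.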
I think you should disable sglang/test/srt/models/test_lora.py Lines 85 to 94 in f8b0326 |
Update (8/22): this is already supported as of #7216 by @Fridge003. Please give it a try and let us know if you run into any issues.

Hi, we are currently working on supporting compatibility between RadixCache and LoRA (cc @Fridge003). It's not a trivial feature, because we essentially need to keep a separate KV cache per LoRA adapter. You can refer to #2929 for all LoRA-related work planned for H2.
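To illustrate why a separate KV cache per adapter is needed: the same token prefix produces different KV states under different LoRA adapters, so a prefix-cache lookup must include the adapter identity in its key, not just the tokens. A toy sketch (a plain dict standing in for the radix tree; all names here are mine, not sglang's):

```python
class PerLoraPrefixCache:
    """Toy prefix cache: entries are keyed by (adapter_id, token_prefix),
    so cached KV from one adapter is never reused for another."""

    def __init__(self):
        self._store = {}  # (adapter_id, token_tuple) -> KV placeholder

    def insert(self, adapter_id, tokens, kv):
        self._store[(adapter_id, tuple(tokens))] = kv

    def match_prefix(self, adapter_id, tokens):
        """Return (length, kv) for the longest cached prefix of `tokens`
        that was produced under the same adapter."""
        best_len, best_kv = 0, None
        for (aid, prefix), kv in self._store.items():
            if (aid == adapter_id
                    and tuple(tokens[:len(prefix)]) == prefix
                    and len(prefix) > best_len):
                best_len, best_kv = len(prefix), kv
        return best_len, best_kv

cache = PerLoraPrefixCache()
cache.insert("lora_A", [1, 2, 3], "kv_from_A")
# Same tokens under a different adapter: no hit, as required.
print(cache.match_prefix("lora_B", [1, 2, 3, 4]))  # → (0, None)
# Same adapter: the 3-token prefix is reusable.
print(cache.match_prefix("lora_A", [1, 2, 3, 4]))  # → (3, 'kv_from_A')
```

A cache keyed on tokens alone would wrongly return `kv_from_A` for the `lora_B` request, which is exactly the correctness hazard the compatibility work has to avoid.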