Commit 3f52738

[Doc] Add max_lora_rank configuration guide (#22782)
Signed-off-by: chiliu <[email protected]>
1 parent a01e001 commit 3f52738

1 file changed: +19, −0 lines


docs/features/lora.md

Note: Default multimodal LoRAs are currently only available for `.generate` and chat completions.

## Usage Tips

### Configuring `max_lora_rank`

The `--max-lora-rank` parameter sets the maximum rank allowed for LoRA adapters. It affects memory allocation and performance:

- **Set it to the maximum rank** among all the LoRA adapters you plan to use.
- **Avoid setting it too high**: a value much larger than needed wastes memory and can hurt performance.

For example, if your LoRA adapters have ranks [16, 32, 64], use `--max-lora-rank 64` rather than 256:

```bash
# Good: matches the actual maximum rank
vllm serve model --enable-lora --max-lora-rank 64

# Bad: unnecessarily high, wastes memory
vllm serve model --enable-lora --max-lora-rank 256
```
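To see why over-provisioning wastes memory, note that a LoRA pair for a layer consists of an `A` matrix of shape `rank × in_features` and a `B` matrix of shape `out_features × rank`, so buffer size grows linearly with the configured rank. The sketch below uses a hypothetical 4096×4096 projection layer in fp16 to compare the per-layer footprint at rank 64 versus 256; the layer size and dtype are illustrative assumptions, not values taken from any particular model.

```python
def lora_layer_bytes(in_features: int, out_features: int,
                     rank: int, dtype_bytes: int = 2) -> int:
    """Bytes for one LoRA pair (A: rank x in_features, B: out_features x rank)."""
    return (rank * in_features + out_features * rank) * dtype_bytes

# Hypothetical 4096x4096 projection layer, fp16 (2 bytes per element)
needed = lora_layer_bytes(4096, 4096, rank=64)
oversized = lora_layer_bytes(4096, 4096, rank=256)
print(needed // 1024**2, "MiB per layer at rank 64")     # 1 MiB
print(oversized // 1024**2, "MiB per layer at rank 256")  # 4 MiB
```

Because buffers are sized for the configured maximum rank rather than each adapter's actual rank, setting `--max-lora-rank 256` here pays 4× the memory of rank 64 for every LoRA-enabled layer, even if no loaded adapter exceeds rank 64.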
