- In addition to serving LoRA adapters at server startup, the vLLM server now supports dynamically loading and unloading
- LoRA adapters at runtime through dedicated API endpoints. This feature can be particularly useful when the flexibility
- to change models on-the-fly is needed.
+ In addition to serving LoRA adapters at server startup, the vLLM server supports dynamically configuring LoRA adapters at runtime through dedicated API endpoints and plugins. This feature can be particularly useful when the flexibility to change models on-the-fly is needed.

Note: Enabling this feature in production environments is risky as users may participate in model adapter management.

- To enable dynamic LoRA loading and unloading, ensure that the environment variable `VLLM_ALLOW_RUNTIME_LORA_UPDATING`
- is set to `True`. When this option is enabled, the API server will log a warning to indicate that dynamic loading is active.
+ To enable dynamic LoRA configuration, ensure that the environment variable `VLLM_ALLOW_RUNTIME_LORA_UPDATING`
+ is set to `True`.

```bash
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
```

+ ### Using API Endpoints
Loading a LoRA Adapter:

To dynamically load a LoRA adapter, send a POST request to the `/v1/load_lora_adapter` endpoint with the necessary
@@ -153,6 +152,58 @@ curl -X POST http://localhost:8000/v1/unload_lora_adapter \
}'
```
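
For readers who prefer a script over `curl`, the two endpoints named in this diff can be exercised from Python. The following is a minimal sketch, not part of the PR: the JSON field names `lora_name` and `lora_path`, the adapter name, and the path are assumptions for illustration, and the helper `build_adapter_request` is hypothetical. The requests are only built here, not sent.

```python
import json
import urllib.request

# Host and port as used by the curl example in this diff.
BASE_URL = "http://localhost:8000"


def build_adapter_request(action: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a LoRA adapter endpoint."""
    if action not in ("load", "unload"):
        raise ValueError(f"unknown action: {action}")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/{action}_lora_adapter",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed field names and placeholder values; check them against the server docs.
load_req = build_adapter_request(
    "load", {"lora_name": "sql_adapter", "lora_path": "/path/to/sql-lora-adapter"}
)
unload_req = build_adapter_request("unload", {"lora_name": "sql_adapter"})

# To send a request against a running server:
# with urllib.request.urlopen(load_req) as resp:
#     print(resp.status, resp.read().decode())
```

The server is expected to accept these requests only when `VLLM_ALLOW_RUNTIME_LORA_UPDATING` is set as described above.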

+ ### Using Plugins
+ Alternatively, you can use the LoRAResolver plugin to dynamically load LoRA adapters. LoRAResolver plugins enable you to load LoRA adapters from both local and remote sources such as local file system and S3. On every request, when there's a new model name that hasn't been loaded yet, the LoRAResolver will try to resolve and load the corresponding LoRA adapter.
+
+ You can set up multiple LoRAResolver plugins if you want to load LoRA adapters from different sources. For example, you might have one resolver for local files and another for S3 storage. vLLM will load the first LoRA adapter that it finds.
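
The "first adapter found" behaviour with several resolvers can be pictured with a toy model. This sketch does not use vLLM at all: `Resolver`, `make_resolver`, and `resolve_first` are hypothetical names, and trying resolvers in list order is an assumption about how "first" is decided.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# Hypothetical stand-in: a resolver maps an adapter name to a location, or None.
Resolver = Callable[[str], Awaitable[Optional[str]]]


def make_resolver(source: str, known: Dict[str, str]) -> Resolver:
    """Return a toy resolver that only knows the adapters listed in `known`."""

    async def resolve(lora_name: str) -> Optional[str]:
        path = known.get(lora_name)
        return f"{source}:{path}" if path is not None else None

    return resolve


async def resolve_first(resolvers: List[Resolver], lora_name: str) -> Optional[str]:
    """Try each resolver in order and return the first non-None result."""
    for resolver in resolvers:
        result = await resolver(lora_name)
        if result is not None:
            return result
    return None


resolvers = [
    make_resolver("local", {"sql_adapter": "/adapters/sql"}),
    make_resolver("s3", {"sql_adapter": "bucket/sql", "chat_adapter": "bucket/chat"}),
]

first = asyncio.run(resolve_first(resolvers, "sql_adapter"))    # both sources know it
fallback = asyncio.run(resolve_first(resolvers, "chat_adapter"))  # only the second does
unknown = asyncio.run(resolve_first(resolvers, "no_such_adapter"))
```

Under this model an adapter available in both sources is served from the earlier resolver, and a name that no resolver knows stays unresolved.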
+
+ You can either install existing plugins or implement your own.
+
+ Steps to implement your own LoRAResolver plugin:
+ 1. Implement the LoRAResolver interface.
+
+ Example of a simple S3 LoRAResolver implementation:
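
The PR's own S3 example is cut off in this view. As an illustration of the general shape of step 1 only, here is an offline sketch that swaps S3 for a plain directory so it runs without credentials. `ResolverBase`, `AdapterRequest`, and `DirectoryResolver` are hypothetical stand-ins defined locally, and the `resolve_lora(base_model_name, lora_name)` signature is an assumption; check the real LoRAResolver interface in vLLM before adapting it.

```python
import asyncio
import shutil
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AdapterRequest:
    """Hypothetical stand-in for the object a resolver hands back to the server."""
    lora_name: str
    lora_path: str


class ResolverBase(ABC):
    """Hypothetical stand-in for the LoRAResolver interface."""

    @abstractmethod
    async def resolve_lora(
        self, base_model_name: str, lora_name: str
    ) -> Optional[AdapterRequest]:
        ...


class DirectoryResolver(ResolverBase):
    """Copies an adapter from a 'remote' directory into a local cache.

    The remote directory stands in for an S3 bucket so the sketch runs offline.
    """

    def __init__(self, remote_root: Path, cache_root: Path) -> None:
        self.remote_root = remote_root
        self.cache_root = cache_root

    async def resolve_lora(
        self, base_model_name: str, lora_name: str
    ) -> Optional[AdapterRequest]:
        source = self.remote_root / base_model_name / lora_name
        if not source.is_dir():
            return None  # unknown adapter: leave it to another resolver
        target = self.cache_root / base_model_name / lora_name
        if not target.exists():
            # Blocking copy moved off the event loop; an S3 download would go here.
            await asyncio.to_thread(shutil.copytree, source, target)
        return AdapterRequest(lora_name=lora_name, lora_path=str(target))


def demo() -> tuple:
    """Create a fake remote adapter, resolve it, and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        remote, cache = Path(tmp) / "remote", Path(tmp) / "cache"
        adapter_dir = remote / "base-model" / "sql_adapter"
        adapter_dir.mkdir(parents=True)
        (adapter_dir / "adapter_config.json").write_text("{}")
        resolver = DirectoryResolver(remote, cache)
        found = asyncio.run(resolver.resolve_lora("base-model", "sql_adapter"))
        missing = asyncio.run(resolver.resolve_lora("base-model", "nope"))
        copied = (cache / "base-model" / "sql_adapter" / "adapter_config.json").exists()
        return found, missing, copied


found, missing, copied = demo()
```

Returning `None` for an unknown name, rather than raising, is a design choice made here so that other resolvers can still be consulted.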