Commit 04aa2b1

[Feat] Improve ModelAdapter reliability with retry and pod switching (vllm-project#1472)

[Feat] Improve ModelAdapter reliability with retry and pod switching mechanisms

- Add retry mechanism with exponential backoff (5 retries, 5s intervals)
- Add connection error handling for startup scenarios ("connection refused")
- Only add pods to instances list after successful adapter loading
- Support automatic pod switching when loading fails repeatedly
- Maintain backward compatibility with existing deployments

Fixes: vllm-project#256, vllm-project#258, vllm-project#363

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
1 parent 118b7f7 commit 04aa2b1

13 files changed, +2177 -89 lines changed
development/tutorials/lora/deployment.yaml

Lines changed: 1 addition & 2 deletions

@@ -22,7 +22,6 @@ spec:
       serviceAccountName: mocked-app-sa
       containers:
         - name: llm-engine
-          # TODO: update
           image: aibrix/vllm-mock:nightly
           ports:
             - containerPort: 8000
@@ -51,4 +50,4 @@ spec:
                 path: /ready
                 port: 8080
               initialDelaySeconds: 5
-              periodSeconds: 10
+              periodSeconds: 10

docs/source/features/lora-dynamic-loading.rst

Lines changed: 322 additions & 4 deletions

@@ -24,6 +24,53 @@ User can submit a lora `Custom Resource <https://kubernetes.io/docs/concepts/ext
    :width: 70%
    :align: center

+ModelAdapter Lifecycle and Phase Transitions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ModelAdapter goes through several distinct phases during its lifecycle. Understanding these phases helps users monitor and troubleshoot their LoRA deployments:
+
+**Phase Transition Flow:**
+
+::
+
+    Pending   →  Scheduled  →  Loading  →  Bound    →  Running
+       ↓            ↓            ↓           ↓           ↓
+    Starting       Pod        Adapter     Service    Ready for
+    reconcile    Selected     Loading     Created    Inference
+
+**Phase Details:**
+
+1. **Pending**: Initial state when the ModelAdapter is first created. The controller starts reconciliation and validates the configuration.
+
+2. **Scheduled**: The controller has successfully identified and selected suitable pods that match the ``podSelector`` criteria. Pods are validated for readiness and stability before selection.
+
+3. **Loading**: The controller is actively loading the LoRA adapter onto the selected pods. This includes:
+
+   - Downloading the adapter from the specified ``artifactURL``
+   - Registering the adapter with the vLLM engine
+   - Handling retry mechanisms with exponential backoff if loading fails
+
+4. **Bound**: The LoRA adapter has been successfully loaded onto the pods and the controller is creating the associated Kubernetes Service and EndpointSlice resources for service discovery.
+
+5. **Running**: The ModelAdapter is fully operational and ready to serve inference requests. The LoRA model is accessible through the gateway using the adapter name.
+
+**Error Handling and Reliability Features:**
+
+- **Retry Mechanism**: Up to 5 retry attempts per pod with exponential backoff (starting at 5 seconds)
+- **Pod Switching**: Automatic selection of alternative pods if loading fails on initial pods
+- **Pod Health Validation**: Ensures pods are ready and stable before scheduling adapters
+- **Connection Error Handling**: Graceful handling of common startup errors like "connection refused"
+
+**Monitoring Phase Transitions:**
+
+You can monitor the current phase and detailed status using:
+
+.. code-block:: bash
+
+    kubectl describe modeladapter <adapter-name>
+
+The status section will show the current phase and transition history with timestamps and reasons for each state change.
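Reviewer note: the happy-path phase order documented in this hunk can be sketched as a small lookup. This is an illustration only; the `Phase` type, `phaseOrder`, and `nextPhase` are hypothetical names and are not taken from the controller source, which is not shown in this excerpt.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the ModelAdapter phase string.
type Phase string

const (
	Pending   Phase = "Pending"
	Scheduled Phase = "Scheduled"
	Loading   Phase = "Loading"
	Bound     Phase = "Bound"
	Running   Phase = "Running"
)

// phaseOrder lists the phases in the order given by the documentation.
var phaseOrder = []Phase{Pending, Scheduled, Loading, Bound, Running}

// nextPhase returns the phase that follows p on the happy path.
// ok is false for Running (the last phase) and for unknown phases.
func nextPhase(p Phase) (next Phase, ok bool) {
	for i, cur := range phaseOrder {
		if cur == p && i+1 < len(phaseOrder) {
			return phaseOrder[i+1], true
		}
	}
	return "", false
}

func main() {
	// Walk the documented flow from Pending to Running.
	for p, ok := Pending, true; ok; p, ok = nextPhase(p) {
		fmt.Println(p)
	}
}
```

The sketch covers only forward transitions; retries and pod switching, which can send an adapter back to pod selection, are outside its scope.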
+
 Model Adapter Service Discovery
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -81,21 +128,55 @@ Create lora model adapter
 .. literalinclude:: ../../../samples/adapter/adapter.yaml
    :language: yaml

-If you run ```kubectl describe modeladapter qwen-code-lora``, you will see the status of the lora adapter.
+If you run ```kubectl describe modeladapter qwen-code-lora``, you will see the status of the lora adapter progressing through different phases:
+
+**Phase 1: Pending**

 .. code-block:: bash

     $ kubectl describe modeladapter qwen-code-lora
-    .....
     Status:
       Conditions:
         Last Transition Time: 2025-02-16T19:14:50Z
         Message: Starting reconciliation
         Reason: ModelAdapterPending
         Status: Unknown
         Type: Initialized
+      Phase: Pending
+
+**Phase 2: Scheduled**
+
+.. code-block:: bash
+
+    $ kubectl describe modeladapter qwen-code-lora
+    Status:
+      Conditions:
+        Last Transition Time: 2025-02-16T19:14:50Z
+        Message: Starting reconciliation
+        Reason: ModelAdapterPending
+        Status: Unknown
+        Type: Initialized
+        Last Transition Time: 2025-02-16T19:14:52Z
+        Message: ModelAdapter default/qwen-code-lora has selected 1 pods for scheduling: [qwen-coder-1-5b-instruct-5587f4c57d-kml6s]
+        Reason: Scheduled
+        Status: True
+        Type: Scheduled
+      Phase: Scheduled
+
+**Phase 3-5: Loading → Bound → Running**
+
+.. code-block:: bash
+
+    $ kubectl describe modeladapter qwen-code-lora
+    Status:
+      Conditions:
         Last Transition Time: 2025-02-16T19:14:50Z
-        Message: ModelAdapter default/qwen-code-lora has been allocated to pod default/qwen-coder-1-5b-instruct-5587f4c57d-kml6s
+        Message: Starting reconciliation
+        Reason: ModelAdapterPending
+        Status: Unknown
+        Type: Initialized
+        Last Transition Time: 2025-02-16T19:14:52Z
+        Message: ModelAdapter default/qwen-code-lora has selected 1 pods for scheduling: [qwen-coder-1-5b-instruct-5587f4c57d-kml6s]
         Reason: Scheduled
         Status: True
         Type: Scheduled
@@ -107,7 +188,6 @@ If you run ```kubectl describe modeladapter qwen-code-lora``, you will see the s
       Instances:
         qwen-coder-1-5b-instruct-5587f4c57d-kml6s
       Phase: Running
-    Events: <none>

 Send request using lora model name to the gateway.

@@ -142,6 +222,244 @@ This ensures that the LoRA adapter is correctly associated with the right pods.
 Note: this is only working with vLLM engine. If you use other engine, feel free to open an issue.


+Troubleshooting Phase Transitions
+----------------------------------
+
+If your ModelAdapter gets stuck in a particular phase, here are common issues and solutions:
+
+**Stuck in Pending Phase:**
+
+- Check if pods matching ``podSelector`` exist and are running
+- Verify that pods have the correct labels (``model.aibrix.ai/name`` and ``adapter.model.aibrix.ai/enabled=true``)
+- Ensure pods are in Ready state
+
+**Stuck in Scheduled Phase:**
+
+- This was a known issue in earlier versions that has been fixed
+- Check controller logs for any loading errors: ``kubectl logs -n aibrix-system deployment/aibrix-controller-manager``
+- Verify vLLM pods have ``VLLM_ALLOW_RUNTIME_LORA_UPDATING`` enabled
+
+**Stuck in Loading Phase:**
+
+- Check if the ``artifactURL`` is accessible and valid
+- For Hugging Face models, ensure the model path exists
+- Check authentication if using private repositories
+- Review controller logs for specific loading errors
+- The controller will retry up to 5 times with exponential backoff before switching to alternative pods
+
+**Failed to Reach Running Phase:**
+
+- Check if Kubernetes Service creation succeeded: ``kubectl get svc <adapter-name>``
+- Verify EndpointSlice creation: ``kubectl get endpointslices``
+- Check pod readiness and health status
+
+**Useful Commands for Debugging:**
+
+.. code-block:: bash
+
+    # Check ModelAdapter status
+    kubectl describe modeladapter <adapter-name>
+
+    # Check controller logs
+    kubectl logs -n aibrix-system deployment/aibrix-controller-manager
+
+    # Check pod logs
+    kubectl logs <pod-name>
+
+    # Check services and endpoints
+    kubectl get svc,endpointslices -l model.aibrix.ai/name=<adapter-name>
+
+Reliability Features
+--------------------
+
+AIBrix ModelAdapter includes several advanced reliability features to ensure robust LoRA adapter deployments in production environments:
+
+Automatic Retry Mechanism
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The controller implements an intelligent retry system for adapter loading:
+
+- **Up to 5 retry attempts** per pod with exponential backoff
+- **Base interval of 5 seconds**, doubling on each retry (5s, 10s, 20s, 40s, 80s)
+- **Intelligent error detection** for retriable vs. non-retriable errors
+- **Graceful handling** of common startup errors like "connection refused"
+
+.. code-block:: yaml
+
+    # Retry behavior is automatic - no configuration needed
+    # Controller tracks retry state via annotations:
+    # - adapter.model.aibrix.ai/retry-count.<pod-name>
+    # - adapter.model.aibrix.ai/last-retry-time.<pod-name>
+
+Pod Switching and Failover
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When adapter loading fails on initial pods after maximum retries:
+
+- **Automatic pod selection** of alternative healthy pods
+- **Maintains desired replica count** by switching to available pods
+- **Preserves scheduler preferences** (least-adapters, default) for new selections
+- **Ensures eventual consistency** - adapters will load on healthy pods
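Reviewer note: the switching rule described here can be pictured as "skip any pod whose retry budget is spent". The sketch below is an assumption-laden illustration: `pickReplacement` and its arguments are hypothetical, and it simply takes the first eligible candidate rather than applying the scheduler preferences (least-adapters, default) that the docs say the controller honors.

```go
package main

import "fmt"

// pickReplacement returns the first candidate pod whose retry count is still
// below maxRetries. ok is false when every candidate has been exhausted.
// Candidate order stands in for whatever ranking the scheduler would produce.
func pickReplacement(candidates []string, retryCount map[string]int, maxRetries int) (pod string, ok bool) {
	for _, name := range candidates {
		if retryCount[name] < maxRetries {
			return name, true
		}
	}
	return "", false
}

func main() {
	// pod-1 has used all 5 retries; pod-2 has never been tried.
	retries := map[string]int{"pod-1": 5}
	if pod, ok := pickReplacement([]string{"pod-1", "pod-2"}, retries, 5); ok {
		fmt.Println("switching to", pod)
	} else {
		fmt.Println("no healthy pod available")
	}
}
```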
+
+Pod Health Validation
+^^^^^^^^^^^^^^^^^^^^^^
+
+The controller implements comprehensive pod health checks:
+
+- **Readiness validation**: Pods must be in Ready state before selection
+- **Stability requirements**: Pods must be stable (ready for 5+ seconds) to avoid flapping
+- **Continuous monitoring**: Unhealthy pods are automatically removed from instances list
+- **Startup tolerance**: Handles pod startup delays gracefully
+
+Connection Error Handling
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Robust error handling for various network and startup conditions:
+
+.. code-block:: text
+
+    Retriable Errors (will trigger retry):
+    - "connection refused"
+    - "connection reset"
+    - "timeout"
+    - "no route to host"
+    - "network is unreachable"
+    - "temporary failure"
+    - "service unavailable"
+    - "bad gateway"
+
+    Non-Retriable Errors (will fail immediately):
+    - Authentication errors
+    - Invalid model format
+    - Configuration errors
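Reviewer note: one plausible way to implement the retriable/non-retriable split listed above is substring matching on the error message. The sketch below assumes that approach and assumes case-insensitive matching; neither is confirmed by this excerpt, and `isRetriable` and `retriableSubstrings` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// retriableSubstrings mirrors the retriable error list in the documentation.
var retriableSubstrings = []string{
	"connection refused",
	"connection reset",
	"timeout",
	"no route to host",
	"network is unreachable",
	"temporary failure",
	"service unavailable",
	"bad gateway",
}

// isRetriable reports whether msg contains any documented retriable marker.
// Anything else (authentication, model format, configuration errors) is
// treated as non-retriable and would fail immediately.
func isRetriable(msg string) bool {
	lower := strings.ToLower(msg)
	for _, marker := range retriableSubstrings {
		if strings.Contains(lower, marker) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isRetriable("dial tcp 10.0.0.5:8000: connect: connection refused"))
	fmt.Println(isRetriable("401 Unauthorized: invalid token"))
}
```

Substring matching is brittle if upstream error text changes, so a typed-error check would be preferable wherever the underlying client exposes one.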
+
+Advanced Configuration Examples
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+**High-Availability Setup:**
+
+Here's a production-ready configuration demonstrating reliability features:
+
+.. literalinclude:: ../../../samples/adapter/adapter-reliability-demo.yaml
+   :language: yaml
+
+**Multi-Replica Distribution:**
+
+For load balancing across multiple pods:
+
+.. literalinclude:: ../../../samples/adapter/adapter-multi-replica.yaml
+   :language: yaml
+   :lines: 1-31
+
+**Testing Commands:**
+
+.. code-block:: bash
+
+    # 1. Apply the reliability demo adapter:
+    kubectl apply -f samples/adapter/adapter-reliability-demo.yaml
+
+    # 2. Monitor phase transitions:
+    watch kubectl describe modeladapter reliability-demo-lora
+
+    # 3. Check controller logs for retry behavior:
+    kubectl logs -n aibrix-system deployment/aibrix-controller-manager -f
+
+    # 4. Test pod failure scenario (delete a pod to trigger switching):
+    kubectl delete pod <pod-name>
+    # Watch how controller handles pod loss and reschedules
+
+    # 5. Verify final state:
+    kubectl get modeladapter reliability-demo-lora -o yaml
+
+Monitoring and Observability
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+**Phase Monitoring:**
+
+The reliability features are transparent through the standard ModelAdapter status:
+
+.. code-block:: bash
+
+    # Monitor real-time phase transitions
+    watch kubectl describe modeladapter <adapter-name>
+
+    # Check for retry annotations (during loading failures)
+    kubectl get modeladapter <adapter-name> -o yaml | grep -A 5 annotations
+
+**Controller Logs:**
+
+Monitor retry behavior and pod switching in controller logs:
+
+.. code-block:: bash
+
+    # View retry attempts and pod switching
+    kubectl logs -n aibrix-system deployment/aibrix-controller-manager -f | grep -E "(retry|switching|pod)"
+
+    # Example log entries:
+    # I0830 03:55:42.221143 1 modeladapter_controller.go:557] "Selected pods for adapter scheduling" ModelAdapter="default/my-lora" selectedPods=["pod-1"]
+    # I0830 03:55:45.331456 1 modeladapter_controller.go:987] "Retriable error loading adapter" pod="pod-1" error="connection refused"
+    # I0830 03:55:50.445789 1 modeladapter_controller.go:1120] "Successfully loaded adapter on pod" pod="pod-2" ModelAdapter="default/my-lora"
+
+**Events and Conditions:**
+
+Reliability events are recorded in the ModelAdapter status conditions:
+
+.. code-block:: bash
+
+    kubectl describe modeladapter <adapter-name>
+
+    # Example events showing retry and recovery:
+    Events:
+      Type     Reason              Message
+      ----     ------              -------
+      Normal   Scheduled           ModelAdapter has selected 1 pods for scheduling: [pod-1]
+      Warning  MaxRetriesExceeded  Max retries exceeded for pod pod-1: connection refused
+      Normal   Scheduled           ModelAdapter has selected 1 pods for scheduling: [pod-2]
+      Normal   Loaded              Successfully loaded adapter on pod pod-2
+
+Best Practices for Production
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+1. **Deploy Multiple Base Model Replicas:**
+
+   .. code-block:: yaml
+
+       # Ensure multiple pods are available for failover
+       spec:
+         replicas: 3 # At least 3 pods for reliability
+
+2. **Use Multiple Adapter Replicas:**
+
+   .. code-block:: yaml
+
+       # Distribute adapter across multiple pods
+       spec:
+         replicas: 2 # Load adapter on 2 different pods
+
+3. **Configure Appropriate Health Checks:**
+
+   .. code-block:: yaml
+
+       # In base model deployment
+       livenessProbe:
+         httpGet:
+           path: /health
+           port: 8000
+         initialDelaySeconds: 60
+         periodSeconds: 30
+         failureThreshold: 3
+
+4. **Monitor Adapter Health:**
+
+   .. code-block:: bash
+
+       # Regular health monitoring
+       kubectl get modeladapters -o wide
+       kubectl describe modeladapter <name>
+
+       # Set up alerts on phase transitions
+       kubectl get events --field-selector involvedObject.kind=ModelAdapter
+
 More configurations
 -------------------

pkg/constants/model.go

Lines changed: 4 additions & 0 deletions

@@ -35,4 +35,8 @@ const (
 	// ModelLabelPort is the label for specifying the service port
 	// Example: "model.aibrix.ai/port": "8080"
 	ModelLabelPort = "model.aibrix.ai/port"
+
+	// ModelLabelAdapterEnabled is the label for enabling or disabling adapter dynamic registration
+	// Example: "adapter.model.aibrix.ai/enabled": "true"
+	ModelLabelAdapterEnabled = "adapter.model.aibrix.ai/enabled"
 )
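Reviewer note: a label constant like this is typically consumed by a small predicate over pod labels. The sketch below redeclares the constant locally so it runs on its own; `adapterEnabled` is a hypothetical helper, and the controller's real check is not shown in this excerpt.

```go
package main

import "fmt"

// ModelLabelAdapterEnabled is redeclared here so the example is self-contained;
// in the repository it is defined in pkg/constants/model.go.
const ModelLabelAdapterEnabled = "adapter.model.aibrix.ai/enabled"

// adapterEnabled reports whether a pod's labels opt in to dynamic adapter
// registration. A missing label or any value other than "true" counts as
// disabled (an assumption about the intended semantics).
func adapterEnabled(labels map[string]string) bool {
	return labels[ModelLabelAdapterEnabled] == "true"
}

func main() {
	podLabels := map[string]string{
		"model.aibrix.ai/name":   "qwen-coder-1-5b-instruct",
		ModelLabelAdapterEnabled: "true",
	}
	fmt.Println(adapterEnabled(podLabels))
	fmt.Println(adapterEnabled(map[string]string{}))
}
```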
