@@ -24,6 +24,53 @@ User can submit a lora `Custom Resource <https://kubernetes.io/docs/concepts/ext
2424 :width: 70%
2525 :align: center
2626
27+ ModelAdapter Lifecycle and Phase Transitions
28+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
29+
30+ The ModelAdapter goes through several distinct phases during its lifecycle. Understanding these phases helps users monitor and troubleshoot their LoRA deployments:
31+
32+ **Phase Transition Flow: **
33+
34+ ::
35+
36+ Pending → Scheduled → Loading → Bound → Running
37+ ↓ ↓ ↓ ↓ ↓
38+ Starting Pod Adapter Service Ready for
39+ reconcile Selected Loading Created Inference
40+
41+ **Phase Details: **
42+
43+ 1. **Pending **: Initial state when the ModelAdapter is first created. The controller starts reconciliation and validates the configuration.
44+
45+ 2. **Scheduled **: The controller has successfully identified and selected suitable pods that match the ``podSelector `` criteria. Pods are validated for readiness and stability before selection.
46+
47+ 3. **Loading **: The controller is actively loading the LoRA adapter onto the selected pods. This includes:
48+
49+ - Downloading the adapter from the specified ``artifactURL ``
50+ - Registering the adapter with the vLLM engine
51+ - Handling retry mechanisms with exponential backoff if loading fails
52+
53+ 4. **Bound **: The LoRA adapter has been successfully loaded onto the pods and the controller is creating the associated Kubernetes Service and EndpointSlice resources for service discovery.
54+
55+ 5. **Running **: The ModelAdapter is fully operational and ready to serve inference requests. The LoRA model is accessible through the gateway using the adapter name.
56+
57+ **Error Handling and Reliability Features: **
58+
59+ - **Retry Mechanism **: Up to 5 retry attempts per pod with exponential backoff (starting at 5 seconds)
60+ - **Pod Switching **: Automatic selection of alternative pods if loading fails on initial pods
61+ - **Pod Health Validation **: Ensures pods are ready and stable before scheduling adapters
62+ - **Connection Error Handling **: Graceful handling of common startup errors like "connection refused"
63+
64+ **Monitoring Phase Transitions: **
65+
66+ You can monitor the current phase and detailed status using:
67+
68+ .. code-block :: bash
69+
70+ kubectl describe modeladapter < adapter-name>
71+
72+ The status section will show the current phase and transition history with timestamps and reasons for each state change.
73+
2774Model Adapter Service Discovery
2875^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2976
@@ -81,21 +128,55 @@ Create lora model adapter
81128.. literalinclude :: ../../../samples/adapter/adapter.yaml
82129 :language: yaml
83130
84- If you run ```kubectl describe modeladapter qwen-code-lora ``, you will see the status of the lora adapter.
131+ If you run ```kubectl describe modeladapter qwen-code-lora ``, you will see the status of the lora adapter progressing through different phases:
132+
133+ **Phase 1: Pending **
85134
86135.. code-block :: bash
87136
88137 $ kubectl describe modeladapter qwen-code-lora
89- .....
90138 Status:
91139 Conditions:
92140 Last Transition Time: 2025-02-16T19:14:50Z
93141 Message: Starting reconciliation
94142 Reason: ModelAdapterPending
95143 Status: Unknown
96144 Type: Initialized
145+ Phase: Pending
146+
147+ **Phase 2: Scheduled **
148+
149+ .. code-block :: bash
150+
151+ $ kubectl describe modeladapter qwen-code-lora
152+ Status:
153+ Conditions:
154+ Last Transition Time: 2025-02-16T19:14:50Z
155+ Message: Starting reconciliation
156+ Reason: ModelAdapterPending
157+ Status: Unknown
158+ Type: Initialized
159+ Last Transition Time: 2025-02-16T19:14:52Z
160+ Message: ModelAdapter default/qwen-code-lora has selected 1 pods for scheduling: [qwen-coder-1-5b-instruct-5587f4c57d-kml6s]
161+ Reason: Scheduled
162+ Status: True
163+ Type: Scheduled
164+ Phase: Scheduled
165+
166+ **Phase 3-5: Loading → Bound → Running **
167+
168+ .. code-block :: bash
169+
170+ $ kubectl describe modeladapter qwen-code-lora
171+ Status:
172+ Conditions:
97173 Last Transition Time: 2025-02-16T19:14:50Z
98- Message: ModelAdapter default/qwen-code-lora has been allocated to pod default/qwen-coder-1-5b-instruct-5587f4c57d-kml6s
174+ Message: Starting reconciliation
175+ Reason: ModelAdapterPending
176+ Status: Unknown
177+ Type: Initialized
178+ Last Transition Time: 2025-02-16T19:14:52Z
179+ Message: ModelAdapter default/qwen-code-lora has selected 1 pods for scheduling: [qwen-coder-1-5b-instruct-5587f4c57d-kml6s]
99180 Reason: Scheduled
100181 Status: True
101182 Type: Scheduled
@@ -107,7 +188,6 @@ If you run ```kubectl describe modeladapter qwen-code-lora``, you will see the s
107188 Instances:
108189 qwen-coder-1-5b-instruct-5587f4c57d-kml6s
109190 Phase: Running
110- Events: < none>
111191
112192 Send request using lora model name to the gateway.
113193
@@ -142,6 +222,244 @@ This ensures that the LoRA adapter is correctly associated with the right pods.
142222 Note: this is only working with vLLM engine. If you use other engine, feel free to open an issue.
143223
144224
225+ Troubleshooting Phase Transitions
226+ ----------------------------------
227+
228+ If your ModelAdapter gets stuck in a particular phase, here are common issues and solutions:
229+
230+ **Stuck in Pending Phase: **
231+
232+ - Check if pods matching ``podSelector `` exist and are running
233+ - Verify that pods have the correct labels (``model.aibrix.ai/name `` and ``adapter.model.aibrix.ai/enabled=true ``)
234+ - Ensure pods are in Ready state
235+
236+ **Stuck in Scheduled Phase: **
237+
238+ - This was a known issue in earlier versions that has been fixed
239+ - Check controller logs for any loading errors: ``kubectl logs -n aibrix-system deployment/aibrix-controller-manager ``
240+ - Verify vLLM pods have ``VLLM_ALLOW_RUNTIME_LORA_UPDATING `` enabled
241+
242+ **Stuck in Loading Phase: **
243+
244+ - Check if the ``artifactURL `` is accessible and valid
245+ - For Hugging Face models, ensure the model path exists
246+ - Check authentication if using private repositories
247+ - Review controller logs for specific loading errors
248+ - The controller will retry up to 5 times with exponential backoff before switching to alternative pods
249+
250+ **Failed to Reach Running Phase: **
251+
252+ - Check if Kubernetes Service creation succeeded: ``kubectl get svc <adapter-name> ``
253+ - Verify EndpointSlice creation: ``kubectl get endpointslices ``
254+ - Check pod readiness and health status
255+
256+ **Useful Commands for Debugging: **
257+
258+ .. code-block :: bash
259+
260+ # Check ModelAdapter status
261+ kubectl describe modeladapter < adapter-name>
262+
263+ # Check controller logs
264+ kubectl logs -n aibrix-system deployment/aibrix-controller-manager
265+
266+ # Check pod logs
267+ kubectl logs < pod-name>
268+
269+ # Check services and endpoints
270+ kubectl get svc,endpointslices -l model.aibrix.ai/name=< adapter-name>
271+
272+ Reliability Features
273+ --------------------
274+
275+ AIBrix ModelAdapter includes several advanced reliability features to ensure robust LoRA adapter deployments in production environments:
276+
277+ Automatic Retry Mechanism
278+ ^^^^^^^^^^^^^^^^^^^^^^^^^^
279+
280+ The controller implements an intelligent retry system for adapter loading:
281+
282+ - **Up to 5 retry attempts ** per pod with exponential backoff
283+ - **Base interval of 5 seconds **, doubling on each retry (5s, 10s, 20s, 40s, 80s)
284+ - **Intelligent error detection ** for retriable vs. non-retriable errors
285+ - **Graceful handling ** of common startup errors like "connection refused"
286+
287+ .. code-block :: yaml
288+
289+ # Retry behavior is automatic - no configuration needed
290+ # Controller tracks retry state via annotations:
291+ # - adapter.model.aibrix.ai/retry-count.<pod-name>
292+ # - adapter.model.aibrix.ai/last-retry-time.<pod-name>
293+
294+ Pod Switching and Failover
295+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^
296+
297+ When adapter loading fails on initial pods after maximum retries:
298+
299+ - **Automatic pod selection ** of alternative healthy pods
300+ - **Maintains desired replica count ** by switching to available pods
301+ - **Preserves scheduler preferences ** (least-adapters, default) for new selections
302+ - **Ensures eventual consistency ** - adapters will load on healthy pods
303+
304+ Pod Health Validation
305+ ^^^^^^^^^^^^^^^^^^^^^^
306+
307+ The controller implements comprehensive pod health checks:
308+
309+ - **Readiness validation **: Pods must be in Ready state before selection
310+ - **Stability requirements **: Pods must be stable (ready for 5+ seconds) to avoid flapping
311+ - **Continuous monitoring **: Unhealthy pods are automatically removed from instances list
312+ - **Startup tolerance **: Handles pod startup delays gracefully
313+
314+ Connection Error Handling
315+ ^^^^^^^^^^^^^^^^^^^^^^^^^^
316+
317+ Robust error handling for various network and startup conditions:
318+
319+ .. code-block :: text
320+
321+ Retriable Errors (will trigger retry):
322+ - "connection refused"
323+ - "connection reset"
324+ - "timeout"
325+ - "no route to host"
326+ - "network is unreachable"
327+ - "temporary failure"
328+ - "service unavailable"
329+ - "bad gateway"
330+
331+ Non-Retriable Errors (will fail immediately):
332+ - Authentication errors
333+ - Invalid model format
334+ - Configuration errors
335+
336+ Advanced Configuration Examples
337+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
338+
339+ **High-Availability Setup: **
340+
341+ Here's a production-ready configuration demonstrating reliability features:
342+
343+ .. literalinclude :: ../../../samples/adapter/adapter-reliability-demo.yaml
344+ :language: yaml
345+
346+ **Multi-Replica Distribution: **
347+
348+ For load balancing across multiple pods:
349+
350+ .. literalinclude :: ../../../samples/adapter/adapter-multi-replica.yaml
351+ :language: yaml
352+ :lines: 1-31
353+
354+ **Testing Commands: **
355+
356+ .. code-block :: bash
357+
358+ # 1. Apply the reliability demo adapter:
359+ kubectl apply -f samples/adapter/adapter-reliability-demo.yaml
360+
361+ # 2. Monitor phase transitions:
362+ watch kubectl describe modeladapter reliability-demo-lora
363+
364+ # 3. Check controller logs for retry behavior:
365+ kubectl logs -n aibrix-system deployment/aibrix-controller-manager -f
366+
367+ # 4. Test pod failure scenario (delete a pod to trigger switching):
368+ kubectl delete pod < pod-name>
369+ # Watch how controller handles pod loss and reschedules
370+
371+ # 5. Verify final state:
372+ kubectl get modeladapter reliability-demo-lora -o yaml
373+
374+ Monitoring and Observability
375+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
376+
377+ **Phase Monitoring: **
378+
379+ The reliability features are transparent through the standard ModelAdapter status:
380+
381+ .. code-block :: bash
382+
383+ # Monitor real-time phase transitions
384+ watch kubectl describe modeladapter < adapter-name>
385+
386+ # Check for retry annotations (during loading failures)
387+ kubectl get modeladapter < adapter-name> -o yaml | grep -A 5 annotations
388+
389+ **Controller Logs: **
390+
391+ Monitor retry behavior and pod switching in controller logs:
392+
393+ .. code-block :: bash
394+
395+ # View retry attempts and pod switching
396+ kubectl logs -n aibrix-system deployment/aibrix-controller-manager -f | grep -E " (retry|switching|pod)"
397+
398+ # Example log entries:
399+ # I0830 03:55:42.221143 1 modeladapter_controller.go:557] "Selected pods for adapter scheduling" ModelAdapter="default/my-lora" selectedPods=["pod-1"]
400+ # I0830 03:55:45.331456 1 modeladapter_controller.go:987] "Retriable error loading adapter" pod="pod-1" error="connection refused"
401+ # I0830 03:55:50.445789 1 modeladapter_controller.go:1120] "Successfully loaded adapter on pod" pod="pod-2" ModelAdapter="default/my-lora"
402+
403+ **Events and Conditions: **
404+
405+ Reliability events are recorded in the ModelAdapter status conditions:
406+
407+ .. code-block :: bash
408+
409+ kubectl describe modeladapter < adapter-name>
410+
411+ # Example events showing retry and recovery:
412+ Events:
413+ Type Reason Message
414+ ---- ------ -------
415+ Normal Scheduled ModelAdapter has selected 1 pods for scheduling: [pod-1]
416+ Warning MaxRetriesExceeded Max retries exceeded for pod pod-1: connection refused
417+ Normal Scheduled ModelAdapter has selected 1 pods for scheduling: [pod-2]
418+ Normal Loaded Successfully loaded adapter on pod pod-2
419+
420+ Best Practices for Production
421+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
422+
423+ 1. **Deploy Multiple Base Model Replicas: **
424+
425+ .. code-block :: yaml
426+
427+ # Ensure multiple pods are available for failover
428+ spec :
429+ replicas : 3 # At least 3 pods for reliability
430+
431+ 2. **Use Multiple Adapter Replicas: **
432+
433+ .. code-block :: yaml
434+
435+ # Distribute adapter across multiple pods
436+ spec :
437+ replicas : 2 # Load adapter on 2 different pods
438+
439+ 3. **Configure Appropriate Health Checks: **
440+
441+ .. code-block :: yaml
442+
443+ # In base model deployment
444+ livenessProbe :
445+ httpGet :
446+ path : /health
447+ port : 8000
448+ initialDelaySeconds : 60
449+ periodSeconds : 30
450+ failureThreshold : 3
451+
452+ 4. **Monitor Adapter Health: **
453+
454+ .. code-block :: bash
455+
456+ # Regular health monitoring
457+ kubectl get modeladapters -o wide
458+ kubectl describe modeladapter < name>
459+
460+ # Set up alerts on phase transitions
461+ kubectl get events --field-selector involvedObject.kind=ModelAdapter
462+
145463 More configurations
146464-------------------
147465
0 commit comments