From 4890d93ffe42c79466f09d59cf91298969b223a9 Mon Sep 17 00:00:00 2001
From: X1aoZEOuO
Date: Tue, 23 Sep 2025 11:10:00 +0800
Subject: [PATCH] doc: add serverless doc with keda and activator.

Signed-off-by: X1aoZEOuO
---
 site/content/en/docs/features/serverless.md | 140 ++++++++++++++++++++
 1 file changed, 140 insertions(+)
 create mode 100644 site/content/en/docs/features/serverless.md

diff --git a/site/content/en/docs/features/serverless.md b/site/content/en/docs/features/serverless.md
new file mode 100644
index 00000000..1eafacac
--- /dev/null
+++ b/site/content/en/docs/features/serverless.md
@@ -0,0 +1,140 @@
---
title: Serverless
weight: 4
---

## Overview

This guide describes how to run serverless workloads on Kubernetes by combining Prometheus monitoring with KEDA event-driven autoscaling. The architecture scales applications on demand, including to and from zero replicas, while preserving observability and resilience for AI/ML workloads and other latency-sensitive applications.

## Concepts

### Prometheus Configuration

Prometheus provides monitoring and alerting. To enable cross-namespace ServiceMonitor discovery, configure the `namespaceSelector` in the ServiceMonitor, and make sure the `serviceMonitorSelector` in the Prometheus resource matches the ServiceMonitor's labels.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qwen2-0--5b-lb-monitor
  namespace: llmaz-system
  labels:
    control-plane: controller-manager
    app.kubernetes.io/name: servicemonitor
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      llmaz.io/model-name: qwen2-0--5b
  endpoints:
  - port: http
    path: /metrics
    scheme: http
```

- Ensure the `namespaceSelector` allows cross-namespace monitoring.
- Label your Services appropriately so that Prometheus can discover them.
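
The ServiceMonitor above only discovers Services that carry the matching label and expose a metrics port with the expected name. A minimal sketch of such a Service follows; the pod selector and port numbers here are assumptions for illustration, not the generated llmaz manifest:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: qwen2-0--5b-lb
  namespace: default
  labels:
    llmaz.io/model-name: qwen2-0--5b  # matched by spec.selector.matchLabels above
spec:
  selector:
    app: qwen2-0--5b                  # assumed pod label
  ports:
  - name: http                        # must match the ServiceMonitor endpoint port name
    port: 8080
    targetPort: 8080
```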

### KEDA Configuration

KEDA (Kubernetes Event-driven Autoscaling) scales applications based on external metrics. Here it queries Prometheus to trigger scaling actions.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: qwen2-0--5b-scaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: inference.llmaz.io/v1alpha1
    kind: Playground
    name: qwen2-0--5b
  pollingInterval: 30
  cooldownPeriod: 50
  minReplicaCount: 0
  maxReplicaCount: 3
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-operated.llmaz-system.svc.cluster.local:9090
      metricName: llamacpp:requests_processing
      query: sum(llamacpp:requests_processing)
      threshold: "0.2"
```

- Ensure the `serverAddress` correctly points to the Prometheus service.
- Adjust `pollingInterval` and `cooldownPeriod` to tune scaling behavior and prevent conflicts with other scaling mechanisms.

### Integration with Activator

Consider pairing the serverless configuration with an activator for scale-from-zero scenarios. The activator can be implemented using a controller pattern or as a standalone goroutine.

Key Architecture Components:
- Request Interception: Capture incoming requests to scaled-to-zero services
- Pre-Scale Trigger: Initiate scale-up before forwarding requests
- Request Buffering: Queue requests during the cold start period
- Event-Driven Scaling: Integrate with KEDA, for example via CloudEvents

### Controller Runtime Framework

The Controller Runtime framework simplifies the development of Kubernetes controllers by providing abstractions for managing resources and handling events.

#### Key Components

1. **Controller**: Monitors resource states and triggers actions to align actual and desired states.
2. **Reconcile Function**: Contains the core logic for transitioning resource states.
3. **Manager**: Manages the lifecycle of controllers and shared resources.
4. **Client**: Interface for interacting with the Kubernetes API.
5. **Scheme**: Registry for resource types.
6. **Event Source and Handler**: Define event sources and handling logic.

## Quick Start Guide

1. Install llmaz, then KEDA and Prometheus, following the official documentation [Install Guide](https://llmaz.inftyai.com/docs/getting-started/installation/).

```bash
helm install llmaz oci://registry-1.docker.io/inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.10
make install-keda
make install-prometheus
```

2. Create a ServiceMonitor so Prometheus can discover your services.

```bash
kubectl apply -f service-monitor.yaml
```

3. Create a ScaledObject so KEDA can manage scaling.

```bash
kubectl apply -f scaled-object.yaml
```

4. Send a request to a scaled-to-zero application to exercise a cold start.

```bash
kubectl exec -it -n kube-system deploy/activator -- wget -O- qwen2-0--5b-lb.default.svc:8080
```

5. Use the Prometheus and KEDA dashboards to monitor metrics and scaling activity; for example, port-forward the Prometheus UI:

```bash
kubectl port-forward services/prometheus-operated 9090:9090 --address 0.0.0.0 -n llmaz-system
```

## Benchmark

Cold start latency is a critical metric for user experience in llmaz serverless environments. To assess stability and efficiency, we measured cold start latency under different instance scaling scenarios:

| Scaling Pattern | Avg. Latency (s) | P90 Latency (s) | Resource Initialization | Optimization Potential |
|-----------------|------------------|-----------------|-------------------------|------------------------|
| **0 -> 1** | 29 | 31 | Full pod creation<br>Image pull<br>Engine initialization | Pre-fetching<br>Snapshot restore |
| **1 -> 2** | 15 | 16 | Partial image reuse<br>Network reuse<br>Pod creation | Warm pool<br>Priority scheduling |
| **2 -> 3** | 11 | 12 | Cached dependencies<br>Parallel scheduling<br>Shared resources | Predictive scaling<br>Node affinity |

## Conclusion

This guide describes how to set up a serverless environment with Kubernetes, Prometheus, and KEDA. Following it gives you event-driven scaling, including scale from zero, together with monitoring of your applications.
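
## Appendix: Activator Sketch

To make the activator architecture described in the Integration with Activator section concrete, here is a minimal sketch in Go using only the standard library. It is not the llmaz activator implementation, and every name in it is illustrative: it only demonstrates the core idea of intercepting a request to a scaled-to-zero backend, firing a pre-scale trigger, buffering the request until the backend is ready, and then forwarding it.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync"
	"time"
)

// Activator buffers requests while the backend is scaled to zero and
// releases them once the backend reports ready.
type Activator struct {
	mu      sync.Mutex
	ready   chan struct{} // closed when the backend is up
	scaleUp func()        // pre-scale trigger, e.g. patch replicas or push a KEDA event
	proxy   *httputil.ReverseProxy
}

func NewActivator(backend *url.URL, scaleUp func()) *Activator {
	return &Activator{
		ready:   make(chan struct{}),
		scaleUp: scaleUp,
		proxy:   httputil.NewSingleHostReverseProxy(backend),
	}
}

// MarkReady is called once the backend has at least one ready replica.
func (a *Activator) MarkReady() {
	a.mu.Lock()
	defer a.mu.Unlock()
	select {
	case <-a.ready: // already marked ready
	default:
		close(a.ready)
	}
}

// ServeHTTP intercepts the request, triggers scale-up, and buffers the
// request until the backend is ready or the cold start times out.
func (a *Activator) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	a.scaleUp()
	select {
	case <-a.ready:
		a.proxy.ServeHTTP(w, r) // backend is up: forward the buffered request
	case <-time.After(30 * time.Second):
		http.Error(w, "cold start timed out", http.StatusGatewayTimeout)
	}
}

func main() {
	// Stand-in backend that "comes up" after a simulated cold start.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "model ready")
	}))
	defer backend.Close()

	u, _ := url.Parse(backend.URL)
	act := NewActivator(u, func() { fmt.Println("scale-up requested") })

	// Stand-in for a readiness watch: signal readiness after a short delay.
	go func() { time.Sleep(100 * time.Millisecond); act.MarkReady() }()

	front := httptest.NewServer(act)
	defer front.Close()

	resp, _ := http.Get(front.URL) // this request is buffered, then forwarded
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```

In a real deployment the readiness goroutine would be replaced by a watch on the scaled workload (for example via controller-runtime), and `scaleUp` would drive the actual scaling mechanism.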