Commit 089667b

workflows: Add vLLM workflow for LLM serving
Add a vLLM inference server workflow to kdevops, enabling deployment and testing of large language models on both CPU and GPU infrastructure.

The workflow provides three deployment methods to match different use cases: Docker for simple deployments, Kubernetes with Helm for production environments, and bare metal with systemd for direct hardware access. Each method shares common configuration through Kconfig while maintaining deployment-specific optimizations.

CPU inference requires openeuler/vllm-cpu:latest, as upstream vLLM releases have broken CPU support (NotImplementedError in v0.6.5+, device detection failures in v0.10.x). The production stack needs a 16GB RAM minimum due to Kubernetes overhead, while simple Docker deployments work with 8GB for small models like facebook/opt-125m.

The implementation integrates with existing kdevops patterns, including A/B testing support, data partition management for model storage, and the standard workflow makefile targets for deployment and testing.

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <[email protected]>
1 parent eabf034 commit 089667b
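The memory guidance in the commit message (8GB for small models via Docker, 16GB for the production stack) can be sanity-checked with a rough weight-footprint estimate. A minimal Python sketch; the helper name is mine, and the only inputs are the published ~125M parameter count of facebook/opt-125m and standard dtype widths:

```python
def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights alone, ignoring
    KV cache, activations, and runtime overhead."""
    bytes_per_param = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}
    return num_params * bytes_per_param[dtype] / 2**30

# facebook/opt-125m in float32, the dtype used by the CPU defconfigs
opt_125m = weight_memory_gib(125e6, "float32")
print(f"opt-125m float32 weights: ~{opt_125m:.2f} GiB")  # ~0.47 GiB
```

Even with several-fold runtime overhead this stays well under the 8GB floor quoted for simple Docker deployments; the 16GB minimum for the production stack is dominated by Kubernetes itself rather than by the model.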

35 files changed: +4310 −2 lines

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -91,6 +91,7 @@ playbooks/roles/linux-mirror/linux-mirror-systemd/mirrors.yaml
 workflows/selftests/results/

 workflows/minio/results/
+workflows/vllm/results/

 workflows/linux/refs/default/Kconfig.linus
 workflows/linux/refs/default/Kconfig.next

PROMPTS.md

Lines changed: 31 additions & 0 deletions
@@ -5,6 +5,37 @@ and example commits and their outcomes, and notes by users of the AI agent
 grading. It is also instructive for humans to learn how to use generative
 AI to easily extend kdevops for their own needs.

+## Adding new AI/ML workflows
+
+### Adding vLLM Production Stack workflow
+
+**Prompt:**
+I have placed in ../production-stack/ the https://github.com/vllm-project/production-stack.git
+project. Familiarize yourself with it and then add support for it as a new
+workflow on kdevops, separate from the Milvus AI workflow.
+
+**AI:** Claude Code
+**Commit:** TBD
+**Result:** Tough
+**Grading:** 50%
+
+**Notes:**
+
+Adding just vLLM was fairly trivial. However, the production stack project
+lacked any clear documentation about which docker container image could be
+used for CPU support, and all of the docker container images had one obscure
+issue or another.
+
+So while getting vLLM and the production stack generally supported was
+fairly trivial, the lack of proper docs made it hard to figure out exactly
+what to do.
+
+Fortunately the implementation correctly identified the need for Kubernetes
+orchestration, included support for various deployment options (Minikube vs
+existing clusters), and integrated monitoring with Prometheus/Grafana. The
+workflow supports A/B testing, multiple routing algorithms, and performance
+benchmarking capabilities.
+
 ## Extending existing Linux kernel selftests

 Below are a set of example prompts / result commits of extending existing

README.md

Lines changed: 24 additions & 2 deletions
@@ -285,10 +285,30 @@ For detailed documentation and demo results, see the
 ### AI workflow

-kdevops now supports AI/ML system benchmarking, starting with vector databases
-like Milvus. Similar to fstests, you can quickly set up and benchmark AI
+kdevops now supports AI/ML system benchmarking, including vector databases
+and LLM serving infrastructure. Similar to fstests, you can quickly set up and benchmark AI
 infrastructure with just a few commands:

+#### vLLM Production Stack
+Deploy and benchmark large language models using the vLLM Production Stack:
+
+```bash
+make defconfig-vllm
+make bringup
+make vllm
+make vllm-benchmark
+```
+
+The vLLM workflow provides:
+- **Production LLM Deployment**: Kubernetes-based vLLM serving with Helm
+- **Request Routing**: Multiple algorithms (round-robin, session affinity, prefix-aware)
+- **Observability**: Integrated Prometheus and Grafana monitoring
+- **Performance Features**: Prefix caching, chunked prefill, KV cache offloading
+- **A/B Testing**: Compare different model configurations
+
+#### Milvus Vector Database
+Benchmark vector database performance for AI applications:
+
 ```bash
 make defconfig-ai-milvus-docker
 make bringup

@@ -303,6 +323,7 @@ The AI workflow supports:
 - **Demo Results**: View actual benchmark HTML reports and performance visualizations

 For details and demo results, see:
+- [kdevops vLLM workflow documentation](workflows/vllm/)
 - [kdevops AI workflow documentation](docs/ai/README.md)
 - [Milvus performance demo results](docs/ai/vector-databases/milvus.md#demo-results)

@@ -358,6 +379,7 @@ want to just use the kernel that comes with your Linux distribution.
 * [kdevops selftests docs](docs/selftests.md)
 * [kdevops reboot-limit docs](docs/reboot-limit.md)
 * [kdevops AI workflow docs](docs/ai/README.md)
+* [kdevops vLLM workflow docs](workflows/vllm/)

 # kdevops general documentation
defconfigs/vllm

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+# vLLM configuration with Latest Docker deployment
+CONFIG_KDEVOPS_FIRST_RUN=n
+CONFIG_LIBVIRT=y
+CONFIG_LIBVIRT_URI="qemu:///system"
+CONFIG_LIBVIRT_HOST_PASSTHROUGH=y
+CONFIG_LIBVIRT_MACHINE_TYPE_DEFAULT=y
+CONFIG_LIBVIRT_CPU_MODEL_PASSTHROUGH=y
+CONFIG_LIBVIRT_VCPUS=8
+CONFIG_LIBVIRT_RAM=32768
+CONFIG_LIBVIRT_OS_VARIANT="generic"
+CONFIG_LIBVIRT_STORAGE_POOL_PATH_CUSTOM=n
+CONFIG_LIBVIRT_STORAGE_POOL_CREATE=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
+# vLLM requires substantial storage for:
+# - Kubernetes/Minikube installation (~20GB)
+# - Docker images for vLLM stack (~30GB)
+# - Model weights storage (varies by model, 10-100GB+)
+# - Benchmark results and logs (~10GB)
+# - Container runtime data and caches (~40GB)
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="250"
+
+# Network configuration
+CONFIG_KDEVOPS_NETWORK_TYPE_NATUAL_BRIDGE=y
+
+# Workflow configuration
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# vLLM specific configuration
+CONFIG_VLLM_LATEST_DOCKER=y
+CONFIG_VLLM_K8S_MINIKUBE=y
+CONFIG_VLLM_HELM_RELEASE_NAME="vllm"
+CONFIG_VLLM_HELM_NAMESPACE="vllm-system"
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+CONFIG_VLLM_REPLICA_COUNT=1
+CONFIG_VLLM_USE_CPU_INFERENCE=y
+CONFIG_VLLM_REQUEST_CPU=8
+CONFIG_VLLM_REQUEST_MEMORY="32Gi"
+CONFIG_VLLM_REQUEST_GPU=0
+CONFIG_VLLM_MAX_MODEL_LEN=2048
+CONFIG_VLLM_DTYPE="float32"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+CONFIG_VLLM_ROUTER_ENABLED=y
+CONFIG_VLLM_ROUTER_ROUND_ROBIN=y
+CONFIG_VLLM_OBSERVABILITY_ENABLED=y
+CONFIG_VLLM_GRAFANA_PORT=3000
+CONFIG_VLLM_PROMETHEUS_PORT=9090
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_API_KEY=""
+CONFIG_VLLM_HF_TOKEN=""
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=60
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=10
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"
+
+# A/B testing support
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
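Once a defconfig like this is deployed, vLLM serves an OpenAI-compatible HTTP API on CONFIG_VLLM_API_PORT (8000 here). A minimal client sketch under stated assumptions: the localhost address is hypothetical (substitute the guest address from `make bringup`), and the actual request is left commented out since no server runs here:

```python
import json
import urllib.request

# Hypothetical endpoint; in practice this is the guest provisioned by kdevops.
url = "http://localhost:8000/v1/completions"

payload = {
    "model": "facebook/opt-125m",  # must match CONFIG_VLLM_MODEL_URL
    "prompt": "kdevops is",
    "max_tokens": 32,              # stay well under CONFIG_VLLM_MAX_MODEL_LEN
    "temperature": 0.0,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would return the completion JSON;
# with CONFIG_VLLM_API_KEY set, add an "Authorization: Bearer ..." header.
```

With CONFIG_VLLM_ROUTER_ENABLED=y, requests like this land on the router, which distributes them across replicas per the configured algorithm (round-robin here).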

defconfigs/vllm-production-stack

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+# vLLM Production Stack configuration with official Helm chart
+CONFIG_KDEVOPS_FIRST_RUN=n
+CONFIG_LIBVIRT=y
+CONFIG_LIBVIRT_URI="qemu:///system"
+CONFIG_LIBVIRT_HOST_PASSTHROUGH=y
+CONFIG_LIBVIRT_MACHINE_TYPE_DEFAULT=y
+CONFIG_LIBVIRT_CPU_MODEL_PASSTHROUGH=y
+CONFIG_LIBVIRT_VCPUS=64
+CONFIG_LIBVIRT_MEM_64G=y
+CONFIG_LIBVIRT_OS_VARIANT="generic"
+CONFIG_LIBVIRT_IMAGE_SIZE="100G"
+CONFIG_LIBVIRT_LVM_GROUP=""
+CONFIG_TARGET_ARCH_X86_64=y
+CONFIG_KDEVOPS_LOCAL_QCOW2_DEVELOPMENT=y
+CONFIG_KDEVOPS_SETUP_WORKFLOWS=y
+
+# Target kernel configuration
+CONFIG_TARGET_LINUX_UPSTREAM=y
+CONFIG_TARGET_LINUX_UPSTREAM_LINUS=y
+CONFIG_TARGET_LINUX_VERSION="linus"
+
+# Network configuration
+CONFIG_GUESTFS_DEFAULT_BRIDGE=y
+CONFIG_GUESTFS_NETWORK="default"
+CONFIG_KDEVOPS_HOSTS="hosts"
+CONFIG_KDEVOPS_NODES="nodes"
+CONFIG_SSH_NO_VERIFY_KNOWNHOSTS=y
+
+# Workflow configuration
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# vLLM Production Stack specific configuration
+CONFIG_VLLM_PRODUCTION_STACK=y
+CONFIG_VLLM_K8S_MINIKUBE=y
+CONFIG_VLLM_VERSION_LATEST=y
+CONFIG_VLLM_HELM_RELEASE_NAME="vllm-prod"
+CONFIG_VLLM_HELM_NAMESPACE="vllm-system"
+CONFIG_VLLM_PROD_STACK_REPO="https://vllm-project.github.io/production-stack"
+CONFIG_VLLM_PROD_STACK_CHART_VERSION="latest"
+CONFIG_VLLM_PROD_STACK_ROUTER_IMAGE="ghcr.io/vllm-project/production-stack/router"
+CONFIG_VLLM_PROD_STACK_ROUTER_TAG="latest"
+CONFIG_VLLM_PROD_STACK_ENABLE_MONITORING=y
+CONFIG_VLLM_PROD_STACK_ENABLE_AUTOSCALING=n
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+CONFIG_VLLM_REPLICA_COUNT=2
+CONFIG_VLLM_USE_CPU_INFERENCE=y
+CONFIG_VLLM_REQUEST_CPU=8
+CONFIG_VLLM_REQUEST_MEMORY="20Gi"
+CONFIG_VLLM_REQUEST_GPU=0
+CONFIG_VLLM_MAX_MODEL_LEN=2048
+CONFIG_VLLM_DTYPE="float32"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+CONFIG_VLLM_ROUTER_ENABLED=y
+CONFIG_VLLM_ROUTER_ROUND_ROBIN=y
+CONFIG_VLLM_OBSERVABILITY_ENABLED=y
+CONFIG_VLLM_GRAFANA_PORT=3000
+CONFIG_VLLM_PROMETHEUS_PORT=9090
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=60
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=10
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"

defconfigs/vllm-quick-test

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+# vLLM Production Stack quick test configuration (CI/demo)
+CONFIG_KDEVOPS_FIRST_RUN=n
+CONFIG_LIBVIRT=y
+CONFIG_LIBVIRT_URI="qemu:///system"
+CONFIG_LIBVIRT_HOST_PASSTHROUGH=y
+CONFIG_LIBVIRT_MACHINE_TYPE_DEFAULT=y
+CONFIG_LIBVIRT_CPU_MODEL_PASSTHROUGH=y
+CONFIG_LIBVIRT_VCPUS=4
+CONFIG_LIBVIRT_RAM=16384
+CONFIG_LIBVIRT_OS_VARIANT="generic"
+CONFIG_LIBVIRT_STORAGE_POOL_PATH_CUSTOM=n
+CONFIG_LIBVIRT_STORAGE_POOL_CREATE=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
+# Quick test uses smaller storage (100GB) as it:
+# - Uses lightweight opt-125m model (~500MB)
+# - Runs shorter benchmarks with less data
+# - Still needs space for Kubernetes/Docker infrastructure
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="100"
+
+# Network configuration
+CONFIG_KDEVOPS_NETWORK_TYPE_NATUAL_BRIDGE=y
+
+# Workflow configuration
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# vLLM specific configuration - Quick test mode
+CONFIG_VLLM_PRODUCTION_STACK=y
+CONFIG_VLLM_K8S_MINIKUBE=y
+CONFIG_VLLM_HELM_RELEASE_NAME="vllm"
+CONFIG_VLLM_HELM_NAMESPACE="vllm-system"
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+CONFIG_VLLM_REPLICA_COUNT=1
+CONFIG_VLLM_REQUEST_CPU=2
+CONFIG_VLLM_REQUEST_MEMORY="8Gi"
+CONFIG_VLLM_REQUEST_GPU=0
+CONFIG_VLLM_GPU_TYPE=""
+CONFIG_VLLM_MAX_MODEL_LEN=512
+CONFIG_VLLM_DTYPE="auto"
+CONFIG_VLLM_GPU_MEMORY_UTILIZATION="0.9"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+CONFIG_VLLM_ROUTER_ENABLED=y
+CONFIG_VLLM_ROUTER_ROUND_ROBIN=y
+CONFIG_VLLM_OBSERVABILITY_ENABLED=y
+CONFIG_VLLM_GRAFANA_PORT=3000
+CONFIG_VLLM_PROMETHEUS_PORT=9090
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_API_KEY=""
+CONFIG_VLLM_HF_TOKEN=""
+CONFIG_VLLM_QUICK_TEST=y
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=30
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=5
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"
+
+# A/B testing support
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
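The quick-test defconfig caps CONFIG_VLLM_MAX_MODEL_LEN at 512, which also bounds KV-cache memory per sequence. A rough sizing sketch, assuming opt-125m's published architecture (12 layers, hidden size 768) and the standard per-token footprint of one key plus one value vector of size `hidden` per layer:

```python
def kv_cache_mib(seq_len: int, num_layers: int, hidden: int, dtype_bytes: int) -> float:
    """KV cache for one sequence: 2 (K and V) x layers x hidden
    x dtype width x tokens."""
    return 2 * num_layers * hidden * dtype_bytes * seq_len / 2**20

# opt-125m in float32, at the quick-test max_model_len of 512
print(f"~{kv_cache_mib(512, 12, 768, 4):.0f} MiB per full-length sequence")  # ~36 MiB
```

The cost scales linearly with max_model_len, which is why the full defconfigs raise it to 2048 only alongside larger memory requests (20Gi/32Gi instead of 8Gi).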

kconfigs/Kconfig.libvirt

Lines changed: 3 additions & 0 deletions
@@ -335,6 +335,7 @@ config LIBVIRT_LARGE_CPU

 choice
 	prompt "Guest vCPUs"
+	default LIBVIRT_VCPUS_64 if KDEVOPS_WORKFLOW_DEDICATE_VLLM
 	default LIBVIRT_VCPUS_8

 config LIBVIRT_VCPUS_2

@@ -408,6 +409,7 @@ config LIBVIRT_VCPUS_COUNT

 choice
 	prompt "How much GiB memory to use per guest"
+	default LIBVIRT_MEM_64G if KDEVOPS_WORKFLOW_DEDICATE_VLLM
 	default LIBVIRT_MEM_4G

 config LIBVIRT_MEM_2G

@@ -478,6 +480,7 @@ config LIBVIRT_MEM_MB
 config LIBVIRT_IMAGE_SIZE
 	string "VM image size"
 	output yaml
+	default "100G" if KDEVOPS_WORKFLOW_DEDICATE_VLLM
 	default "20G"
 	depends on GUESTFS
 	help

kconfigs/workflows/Kconfig

Lines changed: 28 additions & 0 deletions
@@ -233,6 +233,14 @@ config KDEVOPS_WORKFLOW_DEDICATE_AI
 	  This will dedicate your configuration to running only the
 	  AI workflow for vector database performance testing.

+config KDEVOPS_WORKFLOW_DEDICATE_VLLM
+	bool "vllm"
+	select KDEVOPS_WORKFLOW_ENABLE_VLLM
+	help
+	  This will dedicate your configuration to running only the
+	  vLLM Production Stack workflow for deploying and benchmarking
+	  large language models with Kubernetes.
+
 config KDEVOPS_WORKFLOW_DEDICATE_MINIO
 	bool "minio"
 	select KDEVOPS_WORKFLOW_ENABLE_MINIO

@@ -265,6 +273,7 @@ config KDEVOPS_WORKFLOW_NAME
 	default "mmtests" if KDEVOPS_WORKFLOW_DEDICATE_MMTESTS
 	default "fio-tests" if KDEVOPS_WORKFLOW_DEDICATE_FIO_TESTS
 	default "ai" if KDEVOPS_WORKFLOW_DEDICATE_AI
+	default "vllm" if KDEVOPS_WORKFLOW_DEDICATE_VLLM
 	default "minio" if KDEVOPS_WORKFLOW_DEDICATE_MINIO
 	default "build-linux" if KDEVOPS_WORKFLOW_DEDICATE_BUILD_LINUX

@@ -395,6 +404,14 @@ config KDEVOPS_WORKFLOW_NOT_DEDICATED_ENABLE_AI
 	  Select this option if you want to provision AI benchmarks on a
 	  single target node for by-hand testing.

+config KDEVOPS_WORKFLOW_NOT_DEDICATED_ENABLE_VLLM
+	bool "vllm"
+	select KDEVOPS_WORKFLOW_ENABLE_VLLM
+	depends on LIBVIRT || TERRAFORM_PRIVATE_NET
+	help
+	  Select this option if you want to provision vLLM Production Stack
+	  on a single target node for by-hand testing and development.
+
 endif # !WORKFLOWS_DEDICATED_WORKFLOW

 config KDEVOPS_WORKFLOW_ENABLE_FSTESTS

@@ -530,6 +547,17 @@ source "workflows/ai/Kconfig"
 endmenu
 endif # KDEVOPS_WORKFLOW_ENABLE_AI

+config KDEVOPS_WORKFLOW_ENABLE_VLLM
+	bool
+	output yaml
+	default y if KDEVOPS_WORKFLOW_NOT_DEDICATED_ENABLE_VLLM || KDEVOPS_WORKFLOW_DEDICATE_VLLM
+
+if KDEVOPS_WORKFLOW_ENABLE_VLLM
+menu "Configure and run vLLM Production Stack"
+source "workflows/vllm/Kconfig"
+endmenu
+endif # KDEVOPS_WORKFLOW_ENABLE_VLLM
+
 config KDEVOPS_WORKFLOW_ENABLE_MINIO
 	bool
 	output yaml

playbooks/roles/gen_hosts/tasks/main.yml

Lines changed: 15 additions & 0 deletions
@@ -270,6 +270,21 @@
     - ansible_hosts_template.stat.exists
     - not kdevops_use_declared_hosts|default(false)|bool

+- name: Generate the Ansible hosts file for a dedicated vLLM setup
+  tags: ['hosts']
+  ansible.builtin.template:
+    src: "{{ kdevops_hosts_template }}"
+    dest: "{{ ansible_cfg_inventory }}"
+    force: true
+    trim_blocks: True
+    lstrip_blocks: True
+    mode: '0644'
+  when:
+    - kdevops_workflows_dedicated_workflow
+    - kdevops_workflow_enable_vllm|default(false)|bool
+    - ansible_hosts_template.stat.exists
+    - not kdevops_use_declared_hosts|default(false)|bool
+
 - name: Verify if final host file exists
   ansible.builtin.stat:
     path: "{{ ansible_cfg_inventory }}"
