
Commit 7bd0ab8

docs: Update vLLM installation documentation and improve CPU configuration
1 parent 721accc commit 7bd0ab8


2 files changed (+83 -26 lines)


benchmark/llm_bench/README.md

Lines changed: 78 additions & 24 deletions
@@ -6,7 +6,7 @@ LLM benchmark module for heimdall. This module allows you to measure and compare
 
 - **PyTorch**: CPU inference using Meta's official Llama3 implementation
 - **Llama.cpp**: Efficient CPU inference with quantized models
-- **vLLM**: High-performance inference serving for both CPU and GPU
+- **vLLM**: High-performance inference serving for both CPU and GPU (v0.9.1)
 
 ## Prerequisites
 
@@ -20,13 +20,19 @@ LLM benchmark module for heimdall. This module allows you to measure and compare
 # Ubuntu/Debian
 sudo apt-get update
 sudo apt-get install -y numactl git-lfs
+
+# For vLLM CPU (additional requirements)
+sudo apt-get install -y gcc-12 g++-12 libnuma-dev python3-dev
+sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
 ```
 
+
+
 ## Quick Start
 
 ### 1. Installation
 
-Install each framework separately:
+Install each framework separately (automatically manages virtual environments):
 
 ```bash
 # Install PyTorch (includes Llama3-8B model)
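Since the vLLM CPU backend added above is compiled from source, a quick compiler sanity check before building can save a long failed build. A minimal sketch (editor's illustration, not part of this commit), assuming gcc 12 is the minimum the build accepts:

```shell
# Check that the compiler on PATH is new enough for the vLLM CPU
# source build (the steps above install and select gcc-12).
# check_gcc_major "12.3.0" -> "ok"; "11.4.0" -> "too old"
check_gcc_major() {
    major=${1%%.*}    # "12.3.0" -> "12"
    if [ "$major" -ge 12 ]; then
        echo "ok"
    else
        echo "too old"
    fi
}

if command -v gcc >/dev/null 2>&1; then
    echo "gcc $(gcc -dumpversion): $(check_gcc_major "$(gcc -dumpversion)")"
else
    echo "gcc not found; install it via the apt-get step above"
fi
```

If the check prints "too old", rerun the `update-alternatives` command from the prerequisites so `gcc` points at gcc-12.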
@@ -35,7 +41,7 @@ uv run heimdall bench install llm pytorch
 # Install Llama.cpp (includes quantized model)
 uv run heimdall bench install llm llamacpp
 
-# Install vLLM CPU
+# Install vLLM CPU (builds from source with v0.9.1)
 uv run heimdall bench install llm vllm_cpu
 
 # Install vLLM GPU
@@ -64,6 +70,24 @@ uv run heimdall bench run llm all
 uv run heimdall bench plot llm all
 ```
 
+## Virtual Environment Management
+
+The benchmark system automatically manages Python virtual environments:
+
+- **Automatic Creation**: Creates `.venv` in the project root if it doesn't exist
+- **Automatic Activation**: All commands run within the virtual environment
+- **Python 3.12**: Uses Python 3.12 with `uv` for optimal performance
+- **Isolation**: Each framework installation is isolated and doesn't conflict
+
+Manual virtual environment usage:
+```bash
+# Activate virtual environment manually
+source .venv/bin/activate
+
+# Check virtual environment status
+which python
+```
+
 ## Detailed Usage
 
 ### PyTorch
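The `which python` status check the commit documents can be made explicit. A sketch (editor's illustration, not part of this commit) that reports whether the interpreter on PATH actually lives inside the project's `.venv`:

```shell
# Confirm the interpreter on PATH lives inside the project's .venv.
# in_venv PYTHON_PATH PROJECT_ROOT -> "yes" or "no"
in_venv() {
    case "$1" in
        "$2"/.venv/*) echo "yes" ;;
        *)            echo "no"  ;;
    esac
}

# Example: check the current shell's python3 against the current directory.
echo "inside .venv: $(in_venv "$(command -v python3)" "$PWD")"
```

A "no" answer usually means the activation step was skipped and packages would install into the system interpreter instead.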
@@ -98,7 +122,7 @@ Features:
 
 ### vLLM
 
-#### CPU Mode
+#### CPU Mode (v0.9.1)
 ```bash
 # CPU-based inference benchmark
 bash benchmark/llm_bench/scripts/vllm_run_test.sh
@@ -107,6 +131,8 @@ bash benchmark/llm_bench/scripts/vllm_run_test.sh
 Environment variables:
 - `VLLM_CPU_KVCACHE_SPACE=30`: KV cache space configuration
 - `LD_PRELOAD`: Memory optimization using tcmalloc
+- `VLLM_TARGET_DEVICE=cpu`: Force CPU-only mode
+- `CUDA_VISIBLE_DEVICES=""`: Disable CUDA
 
 #### GPU Mode
 ```bash
@@ -118,25 +144,19 @@ Features:
 - Support for large models (70B)
 - Memory efficiency through CPU offloading
 
-## vLLM Independent Installation
+## vLLM Installation Details
 
-vLLM installation through heimdall may fail depending on individual environments. In such cases, you can install it independently:
+### CPU Mode (Source Build - v0.9.1)
 
-### CPU Mode
-```bash
-pip install vllm[cpu]
-# Or build from source
-git clone https://github.com/vllm-project/vllm.git
-cd vllm
-pip install -e .
-```
+vLLM CPU installation automatically builds from source with the following process:
+
+### Independent Installation (Alternative)
+
+If heimdall installation fails, you can install independently:
+
+> 🔗 **Reference**: For the latest installation methods, check the [Official vLLM CPU Installation Documentation](https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html).
 
-### GPU Mode
-```bash
-pip install vllm
-```
 
-After independent installation, you can run benchmarks by executing the scripts directly.
 
 ## NUMA Configuration
 
@@ -198,7 +218,7 @@ Used datasets:
 - **Format**: 4-bit quantized GGUF
 
 ### vLLM
-- **CPU**: Meta-Llama-3-8B
+- **CPU**: Meta-Llama-3-8B (v0.9.1)
 - **GPU**: Meta-Llama-3-70B
 - **Source**: Automatic download from Hugging Face Hub
 
@@ -212,7 +232,7 @@ Used datasets:
 ```
 
 2. **Memory Shortage**
-- Ensure sufficient RAM for model size
+- Ensure sufficient RAM for model size (32GB+ for 8B models)
 - Sufficient VRAM required for vLLM GPU mode
 
 3. **Check NUMA Configuration**
@@ -223,12 +243,35 @@ Used datasets:
 4. **Permission Issues**
 - Permission settings required for perf commands
 
-### vLLM Special Configuration
+5. **uv Command Not Found**
+```bash
+# uv should be available as part of heimdall setup
+# If missing, check heimdall installation
+export PATH="$HOME/.local/bin:$PATH"
+```
+
+### vLLM Specific Issues
+
+#### CPU Mode Build Errors
+```bash
+# Compiler issues
+sudo apt-get install -y gcc-12 g++-12 libnuma-dev python3-dev
+sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
+
+# CMake version issues
+pip install "cmake>=3.26.1"
+
+# Memory issues during build
+export MAX_JOBS=4  # Limit parallel build jobs
+```
 
-For optimal performance in CPU mode, set the following environment:
+#### Environment Configuration
+For optimal performance in CPU mode:
 ```bash
 export VLLM_CPU_KVCACHE_SPACE=30
 export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
+export VLLM_TARGET_DEVICE=cpu
+export CUDA_VISIBLE_DEVICES=""
 ```
 
 ## Additional Scripts
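The `MAX_JOBS=4` workaround the commit adds for out-of-memory builds can be sized rather than hard-coded. A sketch (editor's illustration; the ~4 GB-per-compile-job heuristic is an assumption, not from the commit):

```shell
# Choose MAX_JOBS for the vLLM CPU source build.
# Heuristic (an assumption): roughly 4 GB of free RAM per compile
# job, capped at the core count, and at least 1 job.
# max_jobs CORES FREE_GB -> job count
max_jobs() {
    jobs=$(( $2 / 4 ))
    if [ "$jobs" -gt "$1" ]; then jobs=$1; fi
    if [ "$jobs" -lt 1 ]; then jobs=1; fi
    echo "$jobs"
}

# Example: 16 cores with 24 GB free picks 6 jobs.
MAX_JOBS=$(max_jobs "$(nproc 2>/dev/null || echo 4)" 32)
export MAX_JOBS
```

Exporting the result before `uv run heimdall bench install llm vllm_cpu` keeps the build within the machine's memory budget.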
@@ -243,4 +286,15 @@ export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
 # Direct PyTorch test execution
 cd benchmark/llm_bench/llama
 python pytorch_run_test.py --cpu_bind 0 --mem_bind 0 --description "Local DIMM"
-```
+
+# Manual virtual environment usage
+source .venv/bin/activate
+python benchmark/llm_bench/src/vllm_run_test.py
+```
+
+## Version Information
+
+- **vLLM**: v0.9.1 (CPU source build)
+- **PyTorch**: 2.7.0+cpu (for vLLM CPU)
+- **Python**: 3.12 (recommended)
+- **uv**: Latest (for package management)

benchmark/llm_bench/scripts/vllm_run_test.sh

Lines changed: 5 additions & 2 deletions
@@ -7,6 +7,8 @@
 # Set environment variables for CPU KV cache space and LD_PRELOAD
 export VLLM_CPU_KVCACHE_SPACE=30
 export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4:$LD_PRELOAD
+export CUDA_VISIBLE_DEVICES=""
+export VLLM_DEVICE=cpu
 
 HUGGING_FACE_HUB_TOKEN=$(cat ~/.cache/huggingface/token)
 export HUGGING_FACE_HUB_TOKEN
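Note that this hunk exports `VLLM_DEVICE=cpu` while the README side of the commit documents `VLLM_TARGET_DEVICE=cpu`. A defensive sketch for reproducing the script's environment in an interactive shell (editor's illustration; setting both variables is an assumption, not something the commit does):

```shell
# Mirror the script's CPU-only environment in an interactive shell.
# The script exports VLLM_DEVICE; the README documents
# VLLM_TARGET_DEVICE, so both are set here defensively.
export VLLM_CPU_KVCACHE_SPACE=30      # GB reserved for the KV cache
export CUDA_VISIBLE_DEVICES=""        # hide all CUDA devices
export VLLM_DEVICE=cpu
export VLLM_TARGET_DEVICE=cpu

# Preload tcmalloc only when it is actually present on this distro.
TCMALLOC=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
if [ -f "$TCMALLOC" ]; then
    export LD_PRELOAD="$TCMALLOC${LD_PRELOAD:+:$LD_PRELOAD}"
fi
```

Guarding the `LD_PRELOAD` line avoids loader warnings on systems where the tcmalloc library path differs.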
@@ -26,7 +28,7 @@ descriptions=(
   "CPU 1, Node 2 (Remote CXL)"
 )
 
-VLLM_PATH="benchmark/llm_bench/vllm"
+VLLM_PATH="benchmark/llm_bench/vllm_cpu"
 MODEL="meta-llama/Meta-Llama-3-8B"
 DATASET="benchmark/llm_bench/datasets/ShareGPT_V3_unfiltered_cleaned_split.json"
 LOG_DIR="benchmark/llm_bench/logs/vllm"
@@ -68,7 +70,8 @@ vllm() {
   "${numa_cmd[@]}" python "$VLLM_PATH/benchmarks/benchmark_throughput.py" \
     --model "$MODEL" \
     --dataset "$DATASET" \
-    --output-json "$output_json"
+    --output-json "$output_json" \
+    --device cpu
 
   sleep 2
 }
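The `--output-json` files this function writes can be summarized after a run. A sketch (editor's illustration, not part of this commit) that simply dumps whatever metrics the result file contains, without assuming particular field names:

```shell
# Summarize one benchmark_throughput.py --output-json result file.
# Usage: summarize_result result.json
summarize_result() {
    python3 - "$1" <<'EOF'
import json, sys

with open(sys.argv[1]) as f:
    data = json.load(f)
# Print every top-level metric the file happens to contain.
for key, value in sorted(data.items()):
    print(f"{key}: {value}")
EOF
}
```

Looping `summarize_result` over the files in `LOG_DIR` gives a quick side-by-side view of the NUMA configurations the script benchmarks.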
