# llama.cpp Development Container

This dev container provides a complete Ubuntu 24.04 environment for building and testing llama.cpp with NUMA support and optional GPU acceleration.

## Quick Start

1. Install the "Dev Containers" extension and open the llama.cpp folder in VS Code
2. When prompted, click "Reopen in Container", or press `Ctrl+Shift+P` (`Cmd+Shift+P` on Mac) → "Dev Containers: Reopen in Container"
3. The container builds with the basic development tools (no GPU support by default)

## Optional Components

By default, the container includes only the essential build tools. You can enable additional components by editing `.devcontainer/devcontainer.json`:

### CUDA Support (NVIDIA GPUs)
```json
"INSTALL_CUDA": "true"
```
Installs the CUDA toolkit for NVIDIA GPU acceleration.

### ROCm Support (AMD GPUs)
```json
"INSTALL_ROCM": "true"
```
Installs ROCm 6.4 for AMD GPU acceleration.

### Python Dependencies
```json
"INSTALL_PYTHON_DEPS": "true"
```
Installs Python packages for the model conversion tools:
- numpy, torch, transformers, sentencepiece, protobuf, gguf
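
These packages back the conversion scripts shipped in the llama.cpp checkout. A minimal sketch of a typical run (the model directory and output names here are placeholders; check the script's `--help` for the full option list):

```bash
# Convert a local Hugging Face model directory to a GGUF file usable by llama.cpp
python convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf --outtype f16
```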

## Example Configurations

### Full GPU Development (NVIDIA + Python)
```json
"build": {
    "args": {
        "INSTALL_CUDA": "true",
        "INSTALL_ROCM": "false",
        "INSTALL_PYTHON_DEPS": "true"
    }
}
```

### AMD GPU Development
```json
"build": {
    "args": {
        "INSTALL_CUDA": "false",
        "INSTALL_ROCM": "true",
        "INSTALL_PYTHON_DEPS": "true"
    }
}
```

### CPU-only with Python tools
```json
"build": {
    "args": {
        "INSTALL_CUDA": "false",
        "INSTALL_ROCM": "false",
        "INSTALL_PYTHON_DEPS": "true"
    }
}
```

## Making Changes

### Method 1: Interactive Configuration Script (Recommended)
```bash
# Run the configuration helper
chmod +x .devcontainer/configure.sh
./.devcontainer/configure.sh
```

### Method 2: Manual Configuration
1. Edit `.devcontainer/devcontainer.json`
2. Set the desired components to `"true"` or `"false"`
3. Rebuild the container: `Ctrl+Shift+P` → "Dev Containers: Rebuild Container"

## Features

- **Ubuntu 24.04 LTS** base image
- **Complete build toolchain**: gcc, cmake, ninja, ccache
- **NUMA support**: libnuma-dev, numactl, hwloc for CPU topology detection
- **Optional GPU acceleration**: CUDA 12.9 and/or ROCm 6.4 support
- **Optional Python environment**: packages for GGUF conversion tools
- **VS Code integration**: C/C++, CMake, and Python extensions
- **Development tools**: gdb, valgrind for debugging
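
For interactive debugging or memory checking, gdb and valgrind come preinstalled. A minimal sketch, assuming the default `build/` tree produced by the commands in the next section and a model file of your own:

```bash
# Start the server under gdb so breakpoints can be set before the model loads
gdb --args ./build/bin/llama-server --model path/to/model.gguf

# Run a short benchmark under valgrind to surface memory errors (expect a large slowdown)
valgrind --leak-check=summary ./build/bin/llama-bench -m path/to/model.gguf
```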

## Building and Testing

Once the container is running (see Quick Start above):

1. **Build the project**:
   ```bash
   cmake -B build -DCMAKE_BUILD_TYPE=Release
   cmake --build build --parallel
   ```

2. **Test NUMA functionality**:
   ```bash
   # Check NUMA topology
   numactl --hardware

   # Run with specific NUMA settings
   numactl --cpunodebind=0 --membind=0 ./build/bin/llama-server --model path/to/model.gguf
   ```

## Available Tools

### System Tools
- `numactl`: NUMA policy control
- `hwloc-info`: Hardware locality information
- `lscpu`: CPU information
- `ccache`: Compiler cache for faster rebuilds
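
ccache only helps when CMake routes compilations through it. A minimal sketch using the standard CMake launcher variables (the project may also pick up ccache automatically; verify hits with `ccache -s`):

```bash
# Configure once with ccache as the compiler launcher, then rebuild as usual
cmake -B build -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
cmake --build build --parallel
```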

### Build Configurations

#### Debug Build (default post-create)
```bash
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build --parallel
```

#### Release Build (optimized)
```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
```

#### With Additional Options
```bash
# Enable OpenBLAS
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS

# Static build
cmake -B build -DBUILD_SHARED_LIBS=OFF

# Disable CURL if not needed
cmake -B build -DLLAMA_CURL=OFF
```
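
These options can be combined in a single configure step. For example, an optimized build with OpenBLAS and static linking, sketched from the flags above:

```bash
# Release build with OpenBLAS enabled and shared libraries disabled
cmake -B build -DCMAKE_BUILD_TYPE=Release \
      -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS \
      -DBUILD_SHARED_LIBS=OFF
cmake --build build --parallel
```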

## Testing NUMA Improvements

The container includes tools to test the NUMA improvements:

### NUMA Topology Detection
```bash
# Check current NUMA configuration
numactl --show

# Display NUMA hardware topology
numactl --hardware
```
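
If `numactl --hardware` reports only a single node, the host (or the container's resource limits) is not exposing multiple NUMA nodes. As a quick cross-check against the kernel's own view, independent of numactl:

```bash
# Count NUMA nodes as seen by the kernel via sysfs
ls -d /sys/devices/system/node/node* | wc -l
```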

### Performance Testing
```bash
# Test with default settings (hyperthreading disabled)
./build/bin/llama-bench -m model.gguf

# Test with hyperthreading
./build/bin/llama-bench -m model.gguf --cpu-use-hyperthreading

# Test with specific thread count
./build/bin/llama-bench -m model.gguf --threads 8

# Test with NUMA binding
numactl --cpunodebind=0 --membind=0 ./build/bin/llama-bench -m model.gguf

# Test with NUMA mirroring of model weights
./build/bin/llama-bench -m model.gguf --numa mirror
```
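
To compare settings systematically, a simple sweep over thread counts reuses the `--threads` flag above (adjust the list to the core count reported by `lscpu`):

```bash
# Benchmark the same model at several thread counts and compare throughput
for t in 4 8 16 32; do
    echo "=== threads: $t ==="
    ./build/bin/llama-bench -m model.gguf --threads $t
done
```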

### Environment Variables
```bash
# Disable hyperthreading via environment
LLAMA_CPU_NO_HYPERTHREADING=1 ./build/bin/llama-server --model model.gguf

# Disable efficiency cores
LLAMA_CPU_NO_EFFICIENCY_CORES=1 ./build/bin/llama-server --model model.gguf
```
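
The environment variables compose with the NUMA binding shown earlier, so both controls can be applied to a single run:

```bash
# Pin to NUMA node 0 and disable hyperthreading for the same server process
LLAMA_CPU_NO_HYPERTHREADING=1 numactl --cpunodebind=0 --membind=0 \
    ./build/bin/llama-server --model model.gguf
```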

## Development Workflow

1. **Code changes**: Edit files in VS Code with full IntelliSense support
2. **Build**: Use `Ctrl+Shift+P` → "CMake: Build" or terminal commands
3. **Debug**: Set breakpoints and use the integrated debugger
4. **Test**: Run executables directly or through the testing framework
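
llama.cpp's tests are registered with CTest, so the "testing framework" step can be as simple as the following (a sketch assuming tests were enabled in the CMake configuration):

```bash
# Run the project's registered tests against the existing build tree
ctest --test-dir build --output-on-failure
```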

## Troubleshooting

### Container Build Issues
- Ensure Docker Desktop is running
- Try rebuilding: `Ctrl+Shift+P` → "Dev Containers: Rebuild Container"

### NUMA Issues
- Check if running on a NUMA system: `numactl --hardware`
- Verify CPU topology detection: `lscpu` and `hwloc-info`
- Test CPU affinity: `taskset -c 0-3 ./your-program`

### Build Issues
- Clear build cache: `rm -rf build && cmake -B build`
- Check ccache stats: `ccache -s`
- Use verbose build: `cmake --build build --verbose`