# Building llama.cpp for AMD Instinct MI50 (GFX906)

This guide provides specific instructions for building llama.cpp with optimizations for AMD Instinct MI50 GPUs (gfx906 architecture).

## Prerequisites

1. **ROCm Installation** (5.7+ recommended, 6.x supported)
   ```bash
   # Ubuntu/Debian (note: apt-key is deprecated on newer releases; see AMD's ROCm install docs for the keyring-based setup)
   wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
   echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
   sudo apt update
   sudo apt install rocm-dev hipblas rocblas

   # Add your user to the video and render groups
   sudo usermod -a -G video,render $USER
   # Log out and back in for the group changes to take effect
   ```

2. **Build Tools**
   ```bash
   sudo apt install cmake build-essential git
   ```
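If `rocm-smi` or `hipconfig` are not found after installation, the ROCm binaries may simply not be on your `PATH`. A minimal environment sketch, assuming the default `/opt/rocm` install prefix (versioned installs such as `/opt/rocm-6.0` need the path adjusted):

```bash
# Assumes the default ROCm prefix; adjust for versioned installs (e.g. /opt/rocm-6.0)
export ROCM_PATH=/opt/rocm
export PATH="$ROCM_PATH/bin:$PATH"
export LD_LIBRARY_PATH="$ROCM_PATH/lib:${LD_LIBRARY_PATH:-}"
```

To make this persistent, place the exports in `~/.profile` or a file such as `/etc/profile.d/rocm.sh`.
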
## Quick Build Instructions

```bash
# Clone the repository
git clone https://github.com/skyne98/llama.cpp-gfx906.git
cd llama.cpp-gfx906

# CRITICAL: Initialize the ggml-gfx906 submodule
git submodule update --init --recursive

# Build with GFX906 optimizations
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx906 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j $(nproc)
```
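
As a quick sanity check that the HIP backend was actually compiled in, you can inspect the binary's shared-library dependencies. This is a sketch; the exact library names vary by ROCm version:

```bash
# The binary should link against HIP/ROCm runtime libraries when -DGGML_HIP=ON took effect
ldd build/bin/llama-cli | grep -Ei 'hip|rocblas'
```

If the grep prints nothing, the build likely fell back to a CPU-only configuration.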
| 39 | + |
## Verify GPU Detection

After building, verify that your MI50 is properly detected:

```bash
# Check ROCm detection
rocm-smi

# Test with a model
./build/bin/llama-cli -m path/to/model.gguf -p "Hello" -n 10
```
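
The same build also produces `llama-server` for serving over HTTP. A hedged usage sketch (the model path is a placeholder; `-ngl 99` offloads all layers to the MI50):

```bash
./build/bin/llama-server -m path/to/model.gguf -ngl 99 --host 0.0.0.0 --port 8080
```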
| 51 | + |
## Performance Optimizations

The ggml-gfx906 fork includes optimizations specific to the MI50:

### Hardware Instructions Used
- **V_DOT4_I32_I8**: 4x INT8 dot product operations
- **V_DOT2_F32_F16**: 2x FP16 dot product operations
- **V_PK_FMA_F16**: Dual FP16 FMA operations
- **DS_PERMUTE/BPERMUTE**: Hardware lane shuffling

### Expected Performance Improvements
- Q8_0 quantization: ~40% improvement over baseline
- Q4_0 quantization: ~55% improvement over baseline
- Flash Attention: ~35% improvement
- Memory bandwidth: up to 900 GB/s (HBM2)
| 67 | + |
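To check the improvement figures above on your own hardware, llama.cpp ships a benchmarking tool; a minimal sketch (the model path is a placeholder):

```bash
# Benchmarks prompt processing and token generation; -ngl 99 keeps all layers on the GPU
./build/bin/llama-bench -m path/to/model.gguf -ngl 99
```
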
## Docker Build

For reproducible builds, use the provided Docker configuration:

```bash
# Build the Docker image
docker build -f Dockerfile.gfx906 -t llama-gfx906 .

# Run with GPU support (docker run requires an absolute host path for bind mounts)
docker run --rm -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --group-add video \
  -v "$(pwd)/models:/models" \
  llama-gfx906 \
  ./bin/llama-cli -m /models/your-model.gguf -p "Test" -n 20
```
| 86 | + |
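If you prefer compose over a long `docker run` invocation, the same device and group settings can be captured declaratively. A sketch of a hypothetical `docker-compose.yml` (the image and volume names mirror the commands above; the repository may not ship this file):

```yaml
services:
  llama:
    image: llama-gfx906
    devices:
      - /dev/kfd
      - /dev/dri
    security_opt:
      - seccomp=unconfined
    group_add:
      - video
    volumes:
      - ./models:/models
```
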
## Troubleshooting

### Missing ggml files during build
```bash
# Ensure the submodule is initialized
git submodule update --init --recursive
```

### GPU not detected
```bash
# Check GPU visibility
rocm-smi
export HIP_VISIBLE_DEVICES=0  # Use the first GPU
```

### Build errors with HIP
```bash
# Set explicit paths if needed
export HIPCXX="$(hipconfig -l)/clang"
export HIP_PATH="$(hipconfig -R)"
```
| 108 | + |
## Development

The GFX906 optimizations are implemented in the [ggml-gfx906 fork](https://github.com/skyne98/ggml-gfx906). To contribute:

1. Work on optimizations in the ggml fork
2. Test changes locally
3. Update the submodule reference in llama.cpp

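Step 3 above can be sketched as follows; the submodule path `ggml` is an assumption, so confirm the actual path against `.gitmodules`:

```bash
# Hypothetical submodule path; confirm against .gitmodules
cd ggml
git fetch origin && git checkout main   # or a specific tested commit
cd ..
git add ggml
git commit -m "Update ggml-gfx906 submodule"
```
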
See the [ggml-gfx906 issues](https://github.com/skyne98/ggml-gfx906/issues) for ongoing optimization work.

## Related Documentation

- [Main build documentation](./build.md)
- [Docker documentation](./docker.md)
- [GFX906 optimization plan](../docs/gfx906/optimization_plan.md)
- [Implementation guide](../docs/gfx906/implementation_guide.md)