
Commit d65185a

Merge pull request #17 from skyne98/feat/1
feat: Add GFX906 optimization infrastructure and tooling + remove local ggml
2 parents: df36bce + 18de1d0

File tree: 625 files changed, +36971 −201937 lines


.claude/commands/0-fix-issue.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
Please analyze and fix the GitHub issue: $ARGUMENTS.

Follow these steps:

0. Create a new branch for the issue
1. Use `gh issue view` to get the issue details
2. Understand the problem described in the issue
3. Search the codebase for relevant files
4. Implement the necessary changes to fix the issue
5. Write and run tests to verify the fix
6. Ensure code passes linting and type checking
7. Create a descriptive commit message

Remember to use the GitHub CLI (`gh`) for all GitHub-related tasks.
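The steps above can be sketched as shell commands. This is a hedged sketch, not part of the command file itself: the issue number and commit message are hypothetical placeholders.

```shell
# Hypothetical issue number; in practice this comes from $ARGUMENTS
ISSUE=42

# 0. Create a new branch for the issue
git checkout -b "fix/issue-${ISSUE}"

# 1. Use the GitHub CLI to get the issue details
gh issue view "${ISSUE}"

# 5.-6. After implementing the fix: build with tests enabled and run them
cmake -B build -DLLAMA_BUILD_TESTS=ON
cmake --build build --config Release
ctest --test-dir build

# 7. Commit with a descriptive message (placeholder text)
git commit -am "fix: resolve issue #${ISSUE}"
```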

.claude/commands/1-create-pr.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
# Create Pull Request Command

Ensure the current branch is pushed; if not, commit and push the changes, then submit a pull request using `gh pr create`.

Do NOT add a Claude co-authorship footer to commits or "🤖 Generated with Claude Code" to the content of pull requests.
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
Currently this branch is failing the pipeline.

Please review the PR and associated pipeline and fix the issues.

Use the following commands to review the pipeline:

### How to get the PR number for the current branch
```
gh pr status
```

### How to get the run ID of the failed job (will need to filter by branch)
```
gh run list --branch <branch-name>
```

### How to get logs of the failed job in the pipeline
```
gh run view <run-id> --log-failed
```
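The three steps above can be chained into a single sketch. This assumes a `gh` version with JSON output support (`--json`/`--jq`); the branch name is a placeholder.

```shell
# Hypothetical branch name; substitute the branch under review
BRANCH="feat/1"

# Pick the ID of the most recent failed run on that branch
RUN_ID=$(gh run list --branch "$BRANCH" --status failure \
         --json databaseId --jq '.[0].databaseId')

# Fetch only the logs of the failed steps
gh run view "$RUN_ID" --log-failed
```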

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -147,3 +147,9 @@ poetry.toml
 # Local scripts
 /run-vim.sh
 /run-chat.sh
+
+.specstory
+
+# Model files
+models/
+*.gguf

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
[submodule "ggml"]
    path = ggml
    url = https://github.com/skyne98/ggml-gfx906
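With the submodule entry above, an existing checkout pulls in the ggml sources with the standard git invocation (the same command the README note below recommends):

```shell
# Fetch the ggml submodule declared in .gitmodules
git submodule update --init --recursive
```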

CLAUDE.md

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

llama.cpp-gfx906 is a high-performance C/C++ implementation for LLM inference with AMD GFX906 GPU support. This is a specialized fork focusing on the AMD GPU architecture.

## Build Commands

### Standard CPU Build
```bash
cmake -B build
cmake --build build --config Release
```

### AMD GPU Build (GFX906)
```bash
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx906
cmake --build build --config Release
```
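Before the GPU build, it can help to confirm that ROCm actually sees a GFX906 device; this is a hedged sanity check using tools that ship with ROCm, not part of the build itself:

```shell
# List ROCm agents and look for the gfx906 ISA (MI50/MI60)
rocminfo | grep -i gfx906

# Confirm the HIP compiler is on PATH before configuring
hipcc --version
```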

## Testing

### Run All Tests
```bash
cmake -B build -DLLAMA_BUILD_TESTS=ON
cmake --build build --config Release
cd build && ctest
```

### Run Specific Test Categories
```bash
ctest -L main   # Main functionality
ctest -L model  # Model loading
```

### Run Individual Tests
```bash
./build/bin/test-backend-ops
./build/bin/test-quantize-fns
./build/bin/test-tokenizer-0 ./models/ggml-vocab-llama-bpe.gguf
```

## Code Formatting

Use clang-format for all C/C++ code. The repository follows 4-space indentation (configured in .ecrc).

## Architecture

### Layer Structure

1. **GGML Layer** (`ggml/`): Low-level tensor operations and backend implementations
   - `ggml/src/ggml.c`: Core tensor library
   - `ggml/src/ggml-cuda/`: NVIDIA GPU kernels
   - `ggml/src/ggml-hip/`: AMD GPU kernels
   - `ggml/src/ggml-backend.c`: Backend abstraction layer

2. **LLaMA Layer** (`src/`): Model implementation and inference engine
   - `src/llama.cpp`: Main inference engine; coordinates model loading, context management, and inference
   - `src/llama-model.*`: Model format handling and weight loading
   - `src/llama-vocab.*`: Tokenization across different vocab types (BPE, SPM, etc.)
   - `src/llama-sampling.*`: Sampling strategies (greedy, top-k, top-p, etc.)

3. **Tools Layer** (`tools/`): User-facing applications
   - `tools/main/`: CLI tool for model inference
   - `tools/server/`: HTTP server with OpenAI API compatibility
   - `tools/quantize/`: Model quantization utilities

### Key Design Patterns

- **Backend Abstraction**: All compute operations go through the ggml-backend interface, allowing seamless switching between CPU/CUDA/HIP/Vulkan
- **Model Format**: Uses GGUF (GGML Universal Format) for model storage, with metadata and tensor data
- **Memory Management**: Custom allocators with mmap support for efficient large-model loading
- **Quantization**: Supports multiple quantization levels (Q4_0, Q5_K_M, etc.) defined in `ggml/include/ggml.h`
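The quantization levels listed above are applied with the quantize tool built from `tools/quantize/`. A hedged usage sketch; the input and output file names are hypothetical:

```shell
# Convert an f16 GGUF model to 4-bit Q4_0 (paths are placeholders)
./build/bin/llama-quantize ./models/model-f16.gguf ./models/model-Q4_0.gguf Q4_0
```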

## Development Guidelines

### Adding New Features

- Model architecture additions go in `src/llama.cpp` (search for `llm_load_arch`)
- New sampling methods belong in `src/llama-sampling.cpp`
- Backend kernels should be added to the respective backend directories under `ggml/src/`

### Before Committing

1. Run clang-format on modified files
2. Build with tests enabled and run ctest
3. Test with both CPU and GPU builds if modifying backend code
4. Check the performance impact with the perplexity tool

### Common Development Tasks

- **Add a new model architecture**: Modify the `llm_load_arch()` and `llm_build_*()` functions in `src/llama.cpp`
- **Implement a new operator**: Add it to `ggml/src/ggml.c` and implement it in the relevant backends
- **Add a sampling method**: Extend `src/llama-sampling.cpp` with the new sampling strategy
- **Debug tokenization**: Use the `tools/test-tokenizer-*.cpp` utilities

## Important Configuration

- C++17 required
- CMake 3.14+ required
- For AMD GPUs: the ROCm toolkit and HIP compiler are required
- Environment variables:
  - `HIP_VISIBLE_DEVICES`: Control AMD GPU visibility
  - `CUDA_VISIBLE_DEVICES`: Control NVIDIA GPU visibility
  - `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1`: Enable unified memory for CUDA
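As a hedged usage sketch, the AMD-side variable above might be combined with the CLI like this; the device index, model path, and prompt are placeholders:

```shell
# Restrict the run to the first AMD GPU (placeholder index)
export HIP_VISIBLE_DEVICES=0

# Hypothetical model path; -ngl offloads layers to the GPU
./build/bin/llama-cli -m ./models/model.gguf -ngl 99 -p "Hello"
```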

Dockerfile.gfx906

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# Optimized Docker image for GFX906 (AMD Instinct MI50) development
ARG ROCM_VERSION=6.2
ARG UBUNTU_VERSION=22.04

# Development base with all ROCm tools
FROM rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION} AS dev-base

# Set GFX906-specific environment
ENV AMDGPU_TARGETS=gfx906 \
    HSA_OVERRIDE_GFX_VERSION=9.0.6 \
    ROCM_PATH=/opt/rocm \
    HIP_PLATFORM=amd \
    PATH=${ROCM_PATH}/bin:${ROCM_PATH}/llvm/bin:$PATH \
    LD_LIBRARY_PATH=${ROCM_PATH}/lib:${ROCM_PATH}/lib64:$LD_LIBRARY_PATH \
    HIPCC_COMPILE_FLAGS="-O3 -ffast-math -march=native" \
    HIPCC_LINK_FLAGS="-O3" \
    HSA_ENABLE_SDMA=0 \
    GPU_MAX_HW_QUEUES=8 \
    GPU_NUM_COMPUTE_RINGS=8 \
    AMD_LOG_LEVEL=3 \
    HSA_ENABLE_LARGE_BAR=1

# Install development dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    ninja-build \
    git \
    vim \
    gdb \
    ccache \
    python3-pip \
    python3-dev \
    rocm-dev \
    rocm-libs \
    rocm-utils \
    roctracer-dev \
    rocprofiler-dev \
    && pip3 install --upgrade pip numpy scipy \
    && rm -rf /var/lib/apt/lists/*

# Set up ccache
ENV CCACHE_DIR=/workspace/.ccache \
    CCACHE_MAXSIZE=10G \
    CMAKE_CXX_COMPILER_LAUNCHER=ccache \
    CMAKE_C_COMPILER_LAUNCHER=ccache

# Create workspace
WORKDIR /workspace
RUN mkdir -p /workspace/llama.cpp-gfx906 /workspace/models /workspace/benchmarks

# Development stage with extra tools
FROM dev-base AS development

RUN apt-get update && apt-get install -y \
    clang-format \
    clang-tidy \
    tmux \
    htop \
    && rm -rf /var/lib/apt/lists/*

VOLUME ["/workspace"]
CMD ["/bin/bash"]

# Builder stage
FROM dev-base AS builder

COPY . /workspace/llama.cpp-gfx906/
WORKDIR /workspace/llama.cpp-gfx906

# Initialize ggml submodule (required for build)
RUN git submodule update --init --recursive || \
    (echo "Note: Submodule initialization failed (expected in Docker build)" && \
     echo "Ensure submodules are initialized before building Docker image")

RUN cmake -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_HIP=ON \
    -DAMDGPU_TARGETS=gfx906 \
    -G Ninja \
    && cmake --build build --config Release -j$(nproc)

# Runtime stage
FROM rocm/runtime-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION} AS runtime

ENV HSA_OVERRIDE_GFX_VERSION=9.0.6 \
    LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

COPY --from=builder /workspace/llama.cpp-gfx906/build/bin/* /usr/local/bin/
COPY --from=builder /workspace/llama.cpp-gfx906/build/lib/*.so /usr/local/lib/

WORKDIR /models
VOLUME ["/models"]
ENTRYPOINT ["/usr/local/bin/llama-cli"]
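A hedged usage sketch for the multi-stage Dockerfile above; the image tag, mounted models directory, and GGUF file name are placeholders:

```shell
# Build the runtime image from the final stage
docker build -f Dockerfile.gfx906 --target runtime -t llama-gfx906:runtime .

# Run it with the AMD kernel driver devices passed through;
# arguments after the image name go to the llama-cli entrypoint
docker run --rm -it \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -v "$PWD/models:/models:ro" \
  llama-gfx906:runtime -m /models/model.gguf -p "Hello"
```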

Dockerfile.gfx906-test

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Quick test Docker image for GFX906
FROM rocm/dev-ubuntu-22.04:6.2

# Set GFX906 environment
ENV AMDGPU_TARGETS=gfx906 \
    HSA_OVERRIDE_GFX_VERSION=9.0.6 \
    ROCM_PATH=/opt/rocm \
    PATH=${ROCM_PATH}/bin:$PATH \
    LD_LIBRARY_PATH=${ROCM_PATH}/lib:${ROCM_PATH}/lib64:$LD_LIBRARY_PATH

# Install minimal dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    git \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /workspace

# Copy the project
COPY . /workspace/llama.cpp-gfx906/

# Build the project
WORKDIR /workspace/llama.cpp-gfx906
RUN cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx906 && \
    cmake --build build --config Release -j$(nproc)

CMD ["/bin/bash"]

README.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ Getting started with llama.cpp is straightforward. Here are several ways to inst
 - Run with Docker - see our [Docker documentation](docs/docker.md)
 - Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
 - Build from source by cloning this repository - check out [our build guide](docs/build.md)
+- **Note:** When building from source, remember to initialize submodules with `git submodule update --init --recursive`

 Once installed, you'll need a model to work with. Head to the [Obtaining and quantizing models](#obtaining-and-quantizing-models) section to learn more.

docker-compose.yml

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
version: '3.8'

services:
  gfx906-dev:
    build:
      context: .
      dockerfile: Dockerfile.gfx906
      target: development
    image: llama-gfx906:dev
    container_name: llama-gfx906-dev
    hostname: gfx906-dev

    # GPU configuration
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri

    group_add:
      - video
      - render

    security_opt:
      - seccomp:unconfined

    ipc: host
    network_mode: host
    shm_size: 16gb

    volumes:
      - ./:/workspace/llama.cpp-gfx906:rw
      - models:/workspace/models:rw
      - benchmarks:/workspace/benchmarks:rw
      - ccache:/workspace/.ccache:rw

    environment:
      - HSA_OVERRIDE_GFX_VERSION=9.0.6
      - ROCR_VISIBLE_DEVICES=0
      - HIP_VISIBLE_DEVICES=0
      - HSA_ENABLE_LARGE_BAR=1
      - HSA_FORCE_FINE_GRAIN_PCIE=1

    stdin_open: true
    tty: true
    command: /bin/bash

  gfx906-runtime:
    build:
      context: .
      dockerfile: Dockerfile.gfx906
      target: runtime
    image: llama-gfx906:runtime

    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri

    group_add:
      - video
      - render

    security_opt:
      - seccomp:unconfined

    volumes:
      - models:/models:ro

    environment:
      - HSA_OVERRIDE_GFX_VERSION=9.0.6
      - ROCR_VISIBLE_DEVICES=0

volumes:
  models:
    driver: local
  benchmarks:
    driver: local
  ccache:
    driver: local
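A short usage sketch for the services above, assuming Docker Compose v2's `docker compose` syntax; the model path inside the container is a placeholder:

```shell
# Start an interactive development shell with the GPU passed through
docker compose run --rm gfx906-dev

# Or run the runtime service; extra arguments go to the llama-cli entrypoint
docker compose run --rm gfx906-runtime -m /models/model.gguf -p "Hello"
```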
