# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview
llama.cpp-gfx906 is a high-performance C/C++ implementation for LLM inference with AMD GFX906 GPU support. It is a specialized fork of llama.cpp focused on the AMD GFX906 (Vega 20) GPU architecture.
| 7 | + |
## Build Commands

### Standard CPU Build
```bash
cmake -B build
cmake --build build --config Release
```

### AMD GPU Build (GFX906)
```bash
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx906
cmake --build build --config Release
```
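
After a successful GPU build, inference can be exercised with the CLI tool under `build/bin`. The model path and flag values below are placeholders (`-ngl 99` asks for all layers to be offloaded to the GPU); this is a sketch, and the exact binary name may differ in this fork:

```bash
# Placeholder model path; adjust -ngl to control GPU layer offload.
CLI=./build/bin/llama-cli
if [ -x "$CLI" ]; then
  "$CLI" -m ./models/model.gguf -ngl 99 -p "Hello" -n 32
else
  echo "llama-cli not found; run the build steps above first"
fi
```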
| 21 | + |
## Testing

### Run All Tests
```bash
cmake -B build -DLLAMA_BUILD_TESTS=ON
cmake --build build --config Release
cd build && ctest
```

### Run Specific Test Categories
```bash
ctest -L main   # Main functionality
ctest -L model  # Model loading
```

### Run Individual Tests
```bash
./build/bin/test-backend-ops
./build/bin/test-quantize-fns
./build/bin/test-tokenizer-0 ./models/ggml-vocab-llama-bpe.gguf
```
| 43 | + |
## Code Formatting
Use clang-format for all C/C++ code. The repository follows 4-space indentation (configured in .ecrc).
| 46 | + |
## Architecture

### Layer Structure
1. **GGML Layer** (`ggml/`): Low-level tensor operations and backend implementations
   - `ggml/src/ggml.c`: Core tensor library
   - `ggml/src/ggml-cuda/`: NVIDIA GPU kernels
   - `ggml/src/ggml-hip/`: AMD GPU kernels
   - `ggml/src/ggml-backend.c`: Backend abstraction layer

2. **LLaMA Layer** (`src/`): Model implementation and inference engine
   - `src/llama.cpp`: Main inference engine; coordinates model loading, context management, and inference
   - `src/llama-model.*`: Model format handling and weight loading
   - `src/llama-vocab.*`: Tokenization across different vocab types (BPE, SPM, etc.)
   - `src/llama-sampling.*`: Sampling strategies (greedy, top-k, top-p, etc.)

3. **Tools Layer** (`tools/`): User-facing applications
   - `tools/main/`: CLI tool for model inference
   - `tools/server/`: HTTP server with OpenAI API compatibility
   - `tools/quantize/`: Model quantization utilities

### Key Design Patterns
- **Backend Abstraction**: All compute operations go through the ggml-backend interface, allowing seamless switching between CPU/CUDA/HIP/Vulkan
- **Model Format**: Uses GGUF (GGML Universal Format) for model storage with metadata and tensor data
- **Memory Management**: Custom allocators with mmap support for efficient large model loading
- **Quantization**: Supports multiple quantization levels (Q4_0, Q5_K_M, etc.) defined in `ggml/include/ggml.h`
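
As an illustration of the quantization pattern above, a GGUF model can be converted to one of the listed quantization types with the quantize tool. The filenames here are placeholders, and the binary name is an assumption based on upstream llama.cpp (which builds it as `llama-quantize`):

```bash
# Placeholder filenames; Q4_0 is one of the quantization types listed above.
QUANTIZE=./build/bin/llama-quantize
if [ -x "$QUANTIZE" ]; then
  "$QUANTIZE" ./models/model-f16.gguf ./models/model-q4_0.gguf Q4_0
else
  echo "quantize tool not found; build the project first"
fi
```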
| 72 | + |
## Development Guidelines

### Adding New Features
- Model architecture additions go in `src/llama.cpp` (search for `llm_load_arch`)
- New sampling methods belong in `src/llama-sampling.cpp`
- Backend kernels should be added to the respective backend directories under `ggml/src/`

### Before Committing
1. Run clang-format on modified files
2. Build with tests enabled and run ctest
3. Test with both CPU and GPU builds if modifying backend code
4. Check performance impact with the perplexity tool
| 85 | + |
### Common Development Tasks
- **Add new model architecture**: Modify `llm_load_arch()` and the `llm_build_*()` functions in `src/llama.cpp`
- **Implement new operator**: Add to `ggml/src/ggml.c` and implement in the relevant backends
- **Add sampling method**: Extend `src/llama-sampling.cpp` with the new sampling strategy
- **Debug tokenization**: Use the `test-tokenizer-*` utilities (see Run Individual Tests above)
| 91 | + |
## Important Configuration
- C++17 required
- CMake 3.14+ required
- For AMD GPU builds: ROCm toolkit and HIP compiler required
- Environment variables:
  - `HIP_VISIBLE_DEVICES`: Control AMD GPU visibility
  - `CUDA_VISIBLE_DEVICES`: Control NVIDIA GPU visibility
  - `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1`: Enable unified memory for CUDA
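
For example, to pin a run to the first AMD GPU (the device index is an assumption; verify the actual enumeration with `rocminfo` on the target machine):

```bash
# Make only device 0 visible to HIP-based backends.
export HIP_VISIBLE_DEVICES=0
echo "HIP_VISIBLE_DEVICES=$HIP_VISIBLE_DEVICES"
```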