Commit f5e7734

ggml-virtgpu: add backend documentation (ggml-org#19354)
* ggml-virtgpu: add backend documentation

  Assisted-by-AI: Claude Code
* CODEOWNERS: add /docs/backend/GGML-VirtGPU/ -> kpouget
* README: add the link to docs/backend/GGML-VirtGPU/ggml-virt.md
* docs/ggml-virt: add link to testing + configuration
* Revert "CODEOWNERS: add /docs/backend/GGML-VirtGPU/ -> kpouget"

  This reverts commit 8ece8e7.
* drop the ggml- prefix
* s/ggerganov/ggml-org
* Relocate VirtGPU.md
* reorganize the text
* turn turn the ascii diagram into a mermaid
* README.md: update the link to the main doc
1 parent 1e8924f commit f5e7734

4 files changed

+575
-0
lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -288,6 +288,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 | [WebGPU [In Progress]](docs/build.md#webgpu) | All |
 | [RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc) | All |
 | [Hexagon [In Progress]](docs/backend/hexagon/README.md) | Snapdragon |
+| [VirtGPU](docs/backend/VirtGPU.md) | VirtGPU APIR |

 ## Obtaining and quantizing models

docs/backend/VirtGPU.md

Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# GGML-VirtGPU Backend

The GGML-VirtGPU backend enables GGML applications to run machine
learning computations on host hardware while the application itself
runs inside a virtual machine. It uses host-guest shared memory to
efficiently share data buffers between the two sides.

This backend relies on virtio-gpu and the VirglRenderer API Remoting
(APIR) component. The backend is split into two libraries:
- a GGML implementation (the "remoting frontend"), running in the
  guest and interacting with the virtgpu device
- a VirglRenderer APIR compatible library (the "remoting backend"),
  running in the host and interacting with VirglRenderer and an actual
  GGML device backend.

## OS support

| OS       | Status            | Backend     | CI testing  | Notes
| -------- | ----------------- | ----------- | ----------- | -----
| macOS 14 | Supported         | ggml-metal  | X           | Working when compiled on macOS 14
| macOS 15 | Supported         | ggml-metal  | X           | Working when compiled on macOS 14 or macOS 15
| macOS 26 | Not tested        |             |             |
| Linux    | Under development | ggml-vulkan | not working | Working locally, CI running into deadlocks

## Architecture Overview

The GGML-VirtGPU backend consists of three main components:

```mermaid
graph TD
    %% Nodes

    subgraph GuestVM ["Guest VM - Frontend"]
        App([GGML Application<br/>llama.cpp, etc.])

        direction TB
        Interface[GGML Backend Interface]
        Comm["GGML-VirtGPU<br/>(hypercalls + shared mem)"]

        App --> Interface
        Interface --> Comm
    end

    API[virtio-gpu / virglrenderer API]

    subgraph HostSystem [Host System - Backend]
        direction TB
        Dispatcher[GGML-VirtGPU-Backend]
        BackendLib[GGML Backend library<br/>Metal / Vulkan / CPU / ...]

        Dispatcher --> BackendLib
    end

    %% Connections
    Comm --> API
    API --> HostSystem
```

### Key Components

1. **Guest-side Frontend** (`ggml-virtgpu/`): Implements the GGML backend interface and forwards operations to the host
2. **Host-side Backend** (`ggml-virtgpu/backend/`): Receives forwarded operations and executes them on actual hardware backends
3. **Communication Layer**: Uses virtio-gpu hypercalls and shared memory for efficient data transfer

## Features

- **Dynamic backend loading** on the host side (CPU, CUDA, Metal, etc.)
- **Zero-copy data transfer** via host-guest shared memory pages

## Communication Protocol

### Hypercalls and Shared Memory

The backend uses two primary communication mechanisms:

1. **Hypercalls (`DRM_IOCTL_VIRTGPU_EXECBUFFER`)**: Trigger remote execution from guest to host
2. **Shared Memory Pages**: Zero-copy data transfer for tensors and parameters

#### Shared Memory Layout

Each connection uses two fixed-size shared memory buffers, plus
dynamically allocated ones:

- **Data Buffer** (24 MiB): For command/response data and tensor transfers
- **Reply Buffer** (16 KiB): For command replies and status information
- **Data Buffers**: Dynamically allocated host-guest shared buffers
  that serve as GGML buffers.
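
To put these sizes in perspective, the sketch below converts the two fixed buffer sizes into bytes and page counts. The helper name `pages_needed` is hypothetical, and the 4 KiB page size is only an assumption for illustration; the real page size depends on the platform.

```bash
# Sizes of the two fixed buffers, taken from the list above.
DATA_BUFFER_BYTES=$((24 * 1024 * 1024))   # 24 MiB
REPLY_BUFFER_BYTES=$((16 * 1024))         # 16 KiB

# Hypothetical helper: number of pages needed to hold <bytes>,
# rounded up to a whole page.
pages_needed() {
    bytes=$1
    page_size=$2
    echo $(( (bytes + page_size - 1) / page_size ))
}

# 4096 is an assumed page size, used for illustration only.
echo "data buffer:  $DATA_BUFFER_BYTES bytes, $(pages_needed "$DATA_BUFFER_BYTES" 4096) pages"
echo "reply buffer: $REPLY_BUFFER_BYTES bytes, $(pages_needed "$REPLY_BUFFER_BYTES" 4096) pages"
```

On a real system, `getconf PAGESIZE` reports the page size to pass as the second argument.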

### APIR Protocol

The VirglRenderer API Remoting protocol defines three command types:

- `HANDSHAKE`: Protocol version negotiation and capability discovery
- `LOADLIBRARY`: Dynamic loading of backend libraries on the host
- `FORWARD`: API function call forwarding

### Binary Serialization

Commands and data are serialized using a custom binary protocol with:

- Fixed-size encoding for basic types
- Variable-length arrays with size prefixes
- Buffer bounds checking
- Error recovery mechanisms
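
The actual wire format is not specified here, so the following is only a toy illustration of three of the ideas listed above: fixed-size integers, size-prefixed arrays, and bounds checks before every read. It works on hex strings for readability. The function names, the 32-bit length prefix, and the little-endian byte order are all assumptions of this sketch, not the backend's real encoding.

```bash
# Encode an unsigned 32-bit integer as 8 hex digits.
# Little-endian byte order is an assumption of this sketch.
enc_u32() {
    printf '%02x%02x%02x%02x' \
        $(( $1 & 0xff )) $(( ($1 >> 8) & 0xff )) \
        $(( ($1 >> 16) & 0xff )) $(( ($1 >> 24) & 0xff ))
}

# Encode a byte array (given as a hex string):
# a u32 length prefix followed by the payload.
enc_bytes() {
    enc_u32 $(( ${#1} / 2 ))
    printf '%s' "$1"
}

# Decode a size-prefixed byte array, checking bounds before each read.
# Prints the payload as hex; returns 1 if the buffer is too short.
dec_bytes() {
    local buf=$1 len
    [ "${#buf}" -ge 8 ] || return 1                    # room for the prefix?
    len=$(( 0x${buf:6:2}${buf:4:2}${buf:2:2}${buf:0:2} ))
    [ "${#buf}" -ge $(( 8 + 2 * len )) ] || return 1   # room for the payload?
    printf '%s' "${buf:8:2*len}"
}
```

A truncated message is rejected instead of being read past its end: `dec_bytes "$(enc_bytes deadbeef)"` prints the payload back, while `dec_bytes 04000000dead` returns a non-zero status.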

## Supported Operations

### Device Operations
- Device enumeration and capability queries
- Memory information (total/free)
- Backend type detection

### Buffer Operations
- Buffer allocation and deallocation
- Tensor data transfer (host ↔ guest)
- Memory copying and clearing

### Computation Operations
- Graph execution forwarding

## Build Requirements

### Guest-side Dependencies
- `libdrm` for DRM/virtio-gpu communication
- C++20 compatible compiler
- CMake 3.14+

### Host-side Dependencies
- virglrenderer with APIR support (pending upstream review)
- Target backend libraries (libggml-metal, libggml-vulkan, etc.)

## Configuration

### Environment Variables

- `GGML_VIRTGPU_BACKEND_LIBRARY`: Path to the host-side backend library
- `GGML_VIRTGPU_DEBUG`: Enable debug logging

### Build Options

- `GGML_VIRTGPU`: Enable the VirtGPU backend (`ON` or `OFF`, default: `OFF`)
- `GGML_VIRTGPU_BACKEND`: Build the host-side backend component (`ON`, `OFF` or `ONLY`, default: `OFF`)

### System Requirements

- VM with virtio-gpu support
- VirglRenderer with APIR patches
- Compatible backend libraries on host

## Limitations

- **VM-specific**: Only works in virtual machines with virtio-gpu support
- **Host dependency**: Requires a properly configured host-side backend
- **Latency**: Small overhead from VM escaping for each operation


* This work is pending upstream changes in the VirglRenderer
  project.
* The backend can be tested with VirglRenderer compiled from source
  using this PR:
  https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1590
* This work is pending changes in the VMM/hypervisor running the
  virtual machine, which needs to know how to route the newly
  introduced APIR capset.
* The environment variable `VIRGL_ROUTE_VENUS_TO_APIR=1` allows
  using the Venus capset until the relevant hypervisors have been
  patched. However, setting this flag breaks the normal Vulkan/Venus
  behavior.
* The environment variable `GGML_REMOTING_USE_APIR_CAPSET` tells the
  `ggml-virtgpu` backend to use the APIR capset. This will become
  the default when the relevant hypervisors have been patched.

* This work focused on improving the performance of llama.cpp running
  in macOS containers, and is mainly tested on this platform. Linux
  support (via `krun`) is in progress.

## See Also

- [Development and Testing](VirtGPU/development.md)
- [Backend configuration](VirtGPU/configuration.md)
Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
# GGML-VirtGPU Backend Configuration

This document describes the environment variables used by the ggml-virtgpu backend system, covering both the frontend (guest-side) and backend (host-side) components.

## Environment Variables Overview

The ggml-virtgpu backend uses environment variables for configuration across three main components:
- **Frontend (Guest)**: GGML applications running in VMs
- **Hypervisor**: VirglRenderer/APIR system
- **Backend (Host)**: Host-side GGML backend integration

## Frontend (Guest-side) Configuration

### GGML_REMOTING_USE_APIR_CAPSET
- **Location**: `ggml/src/ggml-virtgpu/virtgpu.cpp`
- **Type**: Boolean flag (presence-based)
- **Purpose**: Controls which virtio-gpu capability set to use for communication
- **Values**:
  - Set (any value): Use the APIR capset (long-term setup)
  - Unset: Use the Venus capset (easier for testing with an unmodified hypervisor)
- **Default**: Unset (Venus capset)
- **Usage**:
  ```bash
  export GGML_REMOTING_USE_APIR_CAPSET=1  # Use APIR capset
  # or leave unset for Venus capset
  ```
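
Because the flag is presence-based, its value is not interpreted: setting it to `0` still selects the APIR capset. The sketch below mimics that behavior in the shell. The helper name `selected_capset` is hypothetical, and the real check lives in the C++ frontend, which may differ in details such as how an empty value is treated.

```bash
# Hypothetical helper: reports which capset a presence-based check selects.
selected_capset() {
    # ${VAR+x} expands to "x" whenever VAR is set, even to "0" or "".
    if [ -n "${GGML_REMOTING_USE_APIR_CAPSET+x}" ]; then
        echo "apir"
    else
        echo "venus"
    fi
}

unset GGML_REMOTING_USE_APIR_CAPSET
selected_capset    # prints "venus"

export GGML_REMOTING_USE_APIR_CAPSET=0
selected_capset    # prints "apir": only presence counts, not the value
```

To return to the Venus capset, the variable has to be removed with `unset` rather than set to a false-looking value.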

## Hypervisor (VirglRenderer/APIR) Configuration

These environment variables are used during the transition phase for
running with an unmodified hypervisor (not supporting the
VirglRenderer APIR component). They will be removed in the future, and
the hypervisor will instead configure VirglRenderer with the APIR
_Configuration Key_.

### VIRGL_APIR_BACKEND_LIBRARY
- **Location**: `virglrenderer/src/apir/apir-context.c`
- **Configuration Key**: `apir.load_library.path`
- **Type**: File path string
- **Purpose**: Path to the APIR backend library that virglrenderer should dynamically load
- **Required**: Yes
- **Example**:
  ```bash
  export VIRGL_APIR_BACKEND_LIBRARY="/path/to/libggml-remotingbackend.so"
  ```

### VIRGL_ROUTE_VENUS_TO_APIR
- **Location**: `virglrenderer/src/apir/apir-renderer.h`
- **Type**: Boolean flag (presence-based)
- **Purpose**: Temporary workaround to route Venus capset calls to APIR during the hypervisor transition period
- **Status**: Will be removed once hypervisors support APIR natively
- **Warning**: Breaks normal Vulkan/Venus functionality
- **Usage**:
  ```bash
  export VIRGL_ROUTE_VENUS_TO_APIR=1  # For testing with an unmodified hypervisor
  ```

### VIRGL_APIR_LOG_TO_FILE
- **Location**: `virglrenderer/src/apir/apir-renderer.c`
- **Environment Variable**: `VIRGL_APIR_LOG_TO_FILE`
- **Type**: File path string
- **Purpose**: Enable debug logging from the VirglRenderer APIR component to the specified file
- **Required**: No (optional debugging)
- **Default**: Logging to `stderr`
- **Usage**:
  ```bash
  export VIRGL_APIR_LOG_TO_FILE="/tmp/apir-debug.log"
  ```

## Backend (Host-side) Configuration

These environment variables are used during the transition phase for
running with an unmodified hypervisor (not supporting the
VirglRenderer APIR component). They will be removed in the future, and
the hypervisor will instead configure VirglRenderer with the APIR
_Configuration Key_.

### APIR_LLAMA_CPP_GGML_LIBRARY_PATH
- **Location**: `ggml/src/ggml-virtgpu/backend/backend.cpp`
- **Environment Variable**: `APIR_LLAMA_CPP_GGML_LIBRARY_PATH`
- **Configuration Key**: `ggml.library.path`
- **Type**: File path string
- **Purpose**: Path to the actual GGML backend library (Metal, CUDA, Vulkan, etc.)
- **Required**: **Yes** - backend initialization fails without this
- **Examples**:
  ```bash
  # macOS with Metal backend
  export APIR_LLAMA_CPP_GGML_LIBRARY_PATH="/opt/llama.cpp/lib/libggml-metal.dylib"

  # Linux with CUDA backend
  export APIR_LLAMA_CPP_GGML_LIBRARY_PATH="/opt/llama.cpp/lib/libggml-cuda.so"

  # macOS or Linux with Vulkan backend
  export APIR_LLAMA_CPP_GGML_LIBRARY_PATH="/opt/llama.cpp/lib/libggml-vulkan.so"
  ```

### APIR_LLAMA_CPP_GGML_LIBRARY_REG
- **Location**: `ggml/src/ggml-virtgpu/backend/backend.cpp`
- **Environment Variable**: `APIR_LLAMA_CPP_GGML_LIBRARY_REG`
- **Configuration Key**: `ggml.library.reg`
- **Type**: Function symbol name string
- **Purpose**: Name of the backend registration function to call after loading the library
- **Required**: No (defaults to `ggml_backend_init`)
- **Default**: `ggml_backend_init`
- **Examples**:
  ```bash
  # Metal backend
  export APIR_LLAMA_CPP_GGML_LIBRARY_REG="ggml_backend_metal_reg"

  # CUDA backend
  export APIR_LLAMA_CPP_GGML_LIBRARY_REG="ggml_backend_cuda_reg"

  # Vulkan backend
  export APIR_LLAMA_CPP_GGML_LIBRARY_REG="ggml_backend_vk_reg"

  # Generic fallback (default)
  # export APIR_LLAMA_CPP_GGML_LIBRARY_REG="ggml_backend_init"
  ```

### APIR_LLAMA_CPP_LOG_TO_FILE
- **Location**: `ggml/src/ggml-virtgpu/backend/backend.cpp:62`
- **Environment Variable**: `APIR_LLAMA_CPP_LOG_TO_FILE`
- **Type**: File path string
- **Purpose**: Enable debug logging from the GGML backend to the specified file
- **Required**: No (optional debugging)
- **Usage**:
  ```bash
  export APIR_LLAMA_CPP_LOG_TO_FILE="/tmp/ggml-backend-debug.log"
  ```

## Configuration Flow

The configuration system works as follows:

1. **Hypervisor Setup**: VirglRenderer loads the APIR backend library specified by `VIRGL_APIR_BACKEND_LIBRARY`

2. **Context Creation**: When an APIR context is created, it populates a configuration table with environment variables:
   - `apir.load_library.path` ← `VIRGL_APIR_BACKEND_LIBRARY`
   - `ggml.library.path` ← `APIR_LLAMA_CPP_GGML_LIBRARY_PATH`
   - `ggml.library.reg` ← `APIR_LLAMA_CPP_GGML_LIBRARY_REG`
   - this step will eventually be performed by the hypervisor itself, with command-line arguments instead of environment variables.

3. **Backend Initialization**: The backend queries the configuration via callbacks:
   - `virgl_cbs->get_config(ctx_id, "ggml.library.path")` returns the library path
   - `virgl_cbs->get_config(ctx_id, "ggml.library.reg")` returns the registration function

4. **Library Loading**: The backend dynamically loads and initializes the specified GGML library
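
The key-to-variable mapping from step 2 can be modeled in a few lines of shell, which is handy for checking what each configuration key would resolve to in the current environment. The function `apir_get_config` is a hypothetical stand-in for the configuration table; it is not the virglrenderer callback itself.

```bash
# Hypothetical stand-in for the configuration table: maps a configuration
# key to the environment variable that currently backs it.
apir_get_config() {
    case "$1" in
        apir.load_library.path) printf '%s\n' "${VIRGL_APIR_BACKEND_LIBRARY-}" ;;
        ggml.library.path)      printf '%s\n' "${APIR_LLAMA_CPP_GGML_LIBRARY_PATH-}" ;;
        ggml.library.reg)       printf '%s\n' "${APIR_LLAMA_CPP_GGML_LIBRARY_REG-}" ;;
        *)                      return 1 ;;   # unknown key
    esac
}

# Print the value each key resolves to in this shell's environment.
for key in apir.load_library.path ggml.library.path ggml.library.reg; do
    printf '%-24s = %s\n' "$key" "$(apir_get_config "$key")"
done
```

An empty value on the right-hand side means the corresponding environment variable is unset in the shell that would launch the hypervisor.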

## Error Messages

Common error scenarios and their messages:

- **Missing library path**: `"cannot open the GGML library: env var 'APIR_LLAMA_CPP_GGML_LIBRARY_PATH' not defined"`
- **Missing registration function**: `"cannot register the GGML library: env var 'APIR_LLAMA_CPP_GGML_LIBRARY_REG' not defined"`
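
Both errors can be caught before the hypervisor is started. The sketch below is a hypothetical pre-flight check (`apir_preflight` is not part of the backend) that verifies the two variables named in the messages above. The diagnostics it prints are its own wording, not the backend's.

```bash
# Hypothetical pre-flight check, meant to run in the environment that
# will launch the hypervisor. Returns non-zero if something is missing.
apir_preflight() {
    status=0
    if [ -z "${APIR_LLAMA_CPP_GGML_LIBRARY_PATH-}" ]; then
        echo "preflight: APIR_LLAMA_CPP_GGML_LIBRARY_PATH is not set" >&2
        status=1
    elif [ ! -r "$APIR_LLAMA_CPP_GGML_LIBRARY_PATH" ]; then
        echo "preflight: cannot read $APIR_LLAMA_CPP_GGML_LIBRARY_PATH" >&2
        status=1
    fi
    if [ -z "${APIR_LLAMA_CPP_GGML_LIBRARY_REG-}" ]; then
        echo "preflight: APIR_LLAMA_CPP_GGML_LIBRARY_REG is not set" >&2
        status=1
    fi
    return "$status"
}
```

A typical use would be `apir_preflight && echo "configuration looks complete"` right after exporting the variables.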

## Example Complete Configuration

Here's an example configuration for a macOS host with Metal backend:

```bash
# Hypervisor environment
export VIRGL_APIR_BACKEND_LIBRARY="/opt/llama.cpp/lib/libggml-virtgpu-backend.dylib"

# Backend configuration
export APIR_LLAMA_CPP_GGML_LIBRARY_PATH="/opt/llama.cpp/lib/libggml-metal.dylib"
export APIR_LLAMA_CPP_GGML_LIBRARY_REG="ggml_backend_metal_reg"

# Optional logging
export VIRGL_APIR_LOG_TO_FILE="/tmp/apir.log"
export APIR_LLAMA_CPP_LOG_TO_FILE="/tmp/ggml.log"

# Guest configuration
export GGML_REMOTING_USE_APIR_CAPSET=1
```
