This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
GGBond is a Python binding library for GGML (Georgi Gerganov's Machine Learning library), built using pybind11. It provides a Pythonic interface to GGML's tensor operations and neural network computation capabilities.
- Languages: Python 3.10+ with a C++17 backend
- Build system: CMake with scikit-build-core as the Python build backend
- Core dependency: `vendor/ggml` (GGML v0.9.4 submodule)
```bash
# Development install (rebuilds C++ extension on changes)
pip install -e .

# Standard install
pip install .

# Quick smoke test
python examples/simple.py

# GPT-2 inference (requires model file)
python examples/gpt2.py
```

GGBond provides three layers of abstraction:
**Layer 1:** A near 1:1 mapping of the GGML C API exposed via pybind11 (`src/ggmllib.cpp`).

- GGML opaque pointers are wrapped in the `TypedPtr<Tag>` template (e.g., `ContextPtr`, `TensorPtr`, `BackendPtr`) for type safety — each becomes a distinct Python type (`ggml.Context`, `ggml.Tensor`, etc.)
- Use the `as<T>(ptr)` helper to extract the underlying C pointer from a `TypedPtr`
- The `REGISTER_PTR` macro registers each pointer type with `__repr__` and `__bool__`
- Enums (`ggml_type`, `ggml_status`) are bound as Python enums
- Metal backend functions are behind `#ifdef __APPLE__` — not available on Linux
- Functions follow GGML naming conventions (e.g., `new_tensor_2d`, `mul_mat`)
- `ggbond/ggml.pyi`: Type stubs — must be updated whenever new bindings are added
**Layer 2:** Object-oriented wrappers (`ggbond/backend.py`, `context.py`, `graph.py`) around GGML primitives with lifecycle management (`close()` / context manager). They simplify common patterns but are not a complete 1:1 equivalent of the raw API:

- `Backend` (`backend.py`): Wraps CPU/Metal/HIP backend init; provides `alloc_ctx()`, `compute()`, `tensor_set()`, `tensor_get()`
- `Context` (`context.py`): Wraps `ggml_context` with auto-sized memory from `n_tensors`; provides `new_tensor()` dispatching to `new_tensor_{1..4}d`
- `Graph` (`graph.py`): Owns its own `Context` for graph ops; exposes convenience methods (`g.add(a, b)`) and direct access via `g.ctx`
- `GAllocr` (`graph.py`): Wraps `ggml_gallocr` for graph memory allocation
These are primarily building blocks for the higher-level API.
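The `close()` / context-manager lifecycle these wrappers share can be sketched with a minimal plain-Python pattern (illustrative only; the class name is hypothetical and this is not GGBond's actual implementation):

```python
# Minimal sketch of the close()/context-manager lifecycle pattern the
# Layer 2 wrappers follow. `ManagedResource` is illustrative, not GGBond code.
class ManagedResource:
    def __init__(self):
        self.closed = False  # stands in for holding a live GGML handle

    def close(self):
        if not self.closed:
            # a real wrapper would free the underlying GGML object here
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions

with ManagedResource() as r:
    assert not r.closed  # usable inside the block
# freed on scope exit, even if the block raised
```

The same shape lets callers choose `with Backend("cpu") as be: ...` or an explicit `be.close()` in a `finally` block.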
**Layer 3:** The recommended entry point for most use cases.

- `Session` (`session.py`): Owns a backend and manages all resource lifetimes (contexts, buffers). Creates tensors via `s.tensor(data)` and loads GGUF models via `s.load_gguf(path)`.
- `Tensor` (`tensor.py`): Lazy-evaluated tensor bound to a session. Operations (`+`, `-`, `*`, `/`, `@`, `.relu()`, `.softmax()`, etc.) build a computation graph, materialized on `.compute()` or `.numpy()`.
```python
import numpy as np
import ggbond

s = ggbond.Session("cpu")
a = s.tensor(np.array([[1, 2], [3, 4]], dtype=np.float32))
b = s.tensor(np.array([[5, 6], [7, 8]], dtype=np.float32))
print((a @ b).numpy())
s.close()
```

Key implementation details:
- `Tensor._from_ggml()`: Creates leaf nodes backed by backend-resident ggml tensors
- `Tensor._from_op()`: Creates lazy op nodes that record the operation and its inputs
- `Tensor.compute()`: Topologically sorts the DAG, builds a `Graph`, allocates via `GAllocr`, executes on the backend
- Shape is stored in GGML order `(ne0, ne1, ...)`; `.numpy()` reverses it automatically
- GGUF model weights are loaded directly onto the target backend without intermediate copies
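The lazy-DAG idea can be sketched in a few lines (illustrative only; `Node` and `topo_order` are hypothetical names, not GGBond internals): op nodes record their inputs, and compute-time does a post-order walk so every input is materialized before the op that consumes it. The snippet also shows the `(ne0, ne1, ...)` vs. NumPy shape reversal.

```python
# Illustrative sketch of a lazy op DAG and its topological ordering.
class Node:
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = tuple(inputs)

def topo_order(root):
    """Post-order DFS: inputs come before the ops that consume them."""
    seen, order = set(), []
    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for inp in n.inputs:
            visit(inp)
        order.append(n)
    visit(root)
    return order

a, b = Node("a"), Node("b")
mul = Node("mul_mat", (a, b))
out = Node("relu", (mul,))
print([n.name for n in topo_order(out)])  # -> ['a', 'b', 'mul_mat', 'relu']

# GGML stores shape as (ne0, ne1, ...), fastest-varying dimension first,
# so a NumPy shape is the reverse of the GGML ne order:
numpy_shape = (2, 3)                     # (rows, cols)
ggml_ne = tuple(reversed(numpy_shape))   # (3, 2)
```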
| File | Purpose |
|---|---|
| `src/ggmllib.cpp` | pybind11 bindings (C++ → Python) |
| `ggbond/__init__.py` | Package init; exports `Session`, `Tensor`, `ggml` |
| `ggbond/session.py` | High-level `Session` class |
| `ggbond/tensor.py` | Lazy `Tensor` with operator overloading |
| `ggbond/backend.py` | `Backend` wrapper (CPU/Metal/HIP) |
| `ggbond/context.py` | `Context` wrapper |
| `ggbond/graph.py` | `Graph` and `GAllocr` wrappers |
| `ggbond/gguf.py` | GGUF model file loading |
| `ggbond/ggml.pyi` | Type stubs for the C extension |
| `vendor/ggml/` | GGML library source (git submodule) |
GGML uses a two-phase computation model (relevant for Layer 1 & 2 usage):
- Metadata definition: Create contexts with `no_alloc=True`, define tensor shapes, and build the computation graph
- Allocation & execution: Init backend → allocate memory via `GAllocr` or `backend_alloc_ctx_tensors` → set data → compute → get results
The Tensor high-level API (Layer 3) handles both phases automatically.
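As a rough illustration of the two-phase split (a pure-Python sketch under assumed float32 tensors, not the GGML API): phase one only records shapes and sizes, and phase two turns those records into offsets in a single backing buffer before any data is written.

```python
# Hypothetical sketch of the two-phase model: metadata first, allocation
# second. Not GGML's actual allocator.
F32_SIZE = 4  # bytes per float32 element

# Phase 1: metadata only; no memory is touched (analogous to no_alloc=True).
specs = [("x", (2, 3)), ("w", (3, 4)), ("out", (2, 4))]

# Phase 2: compute offsets into one backing buffer, then allocate it once.
offsets, cursor = {}, 0
for name, shape in specs:
    offsets[name] = cursor
    nbytes = F32_SIZE
    for dim in shape:
        nbytes *= dim
    cursor += nbytes
buffer = bytearray(cursor)  # single allocation sized from the metadata

print(offsets, len(buffer))  # -> {'x': 0, 'w': 24, 'out': 72} 104
```

Separating the phases is what lets a real allocator plan one tight buffer (and reuse memory across graph nodes) instead of allocating per tensor.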
- Add the binding in `src/ggmllib.cpp`
- Update `ggbond/ggml.pyi` with the corresponding type stub
- Rebuild: `pip install -e .`
- If the op should be available on `Tensor`, add it to `tensor.py` (`_UNARY_OPS`, `_BINARY_OPS`, or as a method)
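For the stub-update step, a new entry might look like the following. The op name `rms_norm` and its signature are purely illustrative (mirror whatever the real binding takes); `Context` and `Tensor` already exist in the real `.pyi` and are repeated here only to keep the fragment self-contained.

```python
# Hypothetical fragment in the style of ggbond/ggml.pyi:
class Context: ...
class Tensor: ...

def rms_norm(ctx: Context, a: Tensor, eps: float) -> Tensor: ...
```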
- Add the ggml dispatch in `_materialize_op()` in `tensor.py`
- Add shape inference in `_infer_shape()` if the output shape differs from the input
- Add a method on the `Tensor` class (or add to the `_UNARY_OPS`/`_BINARY_OPS`/`_REDUCTION_OPS` tuples for simple ops)
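The op-name-tuple pattern can be sketched like this (illustrative pure Python; GGBond's actual `tensor.py` records lazy op nodes, whereas this toy version applies ops eagerly):

```python
# Hypothetical sketch of generating simple methods from an op-name tuple.
_UNARY_OPS = ("relu", "neg")

def _apply_unary(name, value):
    # Eager stand-in for recording a lazy op node and dispatching to ggml.
    if name == "relu":
        return max(value, 0.0)
    if name == "neg":
        return -value
    raise ValueError(name)

class Tensor:
    def __init__(self, value):
        self.value = value

# One loop wires up a method per op name instead of hand-writing each one.
for _name in _UNARY_OPS:
    def _method(self, _name=_name):  # default arg pins the loop variable
        return Tensor(_apply_unary(_name, self.value))
    setattr(Tensor, _name, _method)

print(Tensor(-2.0).relu().value)  # -> 0.0
print(Tensor(-2.0).neg().value)   # -> 2.0
```

The payoff of this design is that adding a simple elementwise op is one tuple entry plus a dispatch case, with no per-op boilerplate method.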
- CPU: `Backend("cpu")` or `Session("cpu")` — always available
- Metal: `Backend("metal")` or `Session("metal")` — macOS only, GPU acceleration
- HIP: `Backend("hip")` / `Backend("rocm")` or `Session("hip")` — requires building with `GGML_HIP=ON`
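A convenience pattern (not part of GGBond) for picking the first usable backend, under the assumption that constructing `Session(...)` raises when a backend is unavailable:

```python
# Hypothetical helper: try preferred backends in order, fall back to CPU.
def make_session(session_cls, preferred=("metal", "hip", "cpu")):
    for name in preferred:
        try:
            return session_cls(name)
        except Exception:
            continue  # backend not built in / not supported on this OS
    raise RuntimeError("no usable backend found")

# With the real library this would be: s = make_session(ggbond.Session)
```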