Fully working GPU-accelerated wheel for llama-cpp-python==0.3.16 on Python 3.14 (Windows amd64).
Built December 17, 2025 with:
- CUDA Toolkit 13.1 (latest at build time)
- Full CUDA graph support
- Tested: ~85 tokens/second generating with Llama 3 8B Q4_K_M on an RTX 3090
https://github.com/aivrar/llama-cpp-python-py314-cuda131-wheel/releases/tag/v0.3.16-cuda13.1-py3.14
pip install llama_cpp_python-0.3.16-cp314-cp314-win_amd64.whl
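After installing, a minimal sketch of loading a GGUF model with full GPU offload (the model path below is a placeholder; substitute your own download, e.g. the Q4_K_M quant mentioned above):

```python
from llama_cpp import Llama, llama_supports_gpu_offload

# Sanity check: confirms the wheel was built with CUDA support
print("GPU offload available:", llama_supports_gpu_offload())

llm = Llama(
    model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```

If `llama_supports_gpu_offload()` prints `False`, the CPU-only PyPI wheel is likely shadowing this one; uninstall and reinstall from the local `.whl` file.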