Fully working GPU-accelerated wheel for llama-cpp-python==0.3.16 on Python 3.14 (Windows amd64).
Built December 17, 2025 with:
- CUDA Toolkit 13.1 (latest at build time)
- Full CUDA graph support
- Tested: ~85 tokens/second generating with Llama 3 8B Q4_K_M on an RTX 3090
https://github.com/aivrar/llama-cpp-python-py314-cuda131-wheel/releases/tag/v0.3.16-cuda13.1-py3.14
pip install llama_cpp_python-0.3.16-cp314-cp314-win_amd64.whl
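After installing, a minimal sketch of loading a GGUF model with full GPU offload (the model path below is a placeholder; substitute your own download, e.g. the Q4_K_M quant mentioned above):

```python
from llama_cpp import Llama, llama_supports_gpu_offload

# Sanity check: confirms the wheel was built with CUDA support
print("GPU offload available:", llama_supports_gpu_offload())

llm = Llama(
    model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```

If `llama_supports_gpu_offload()` prints `False`, the CPU-only PyPI wheel is likely shadowing this one; uninstall and reinstall from the local `.whl` file.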