Releases: JamePeng/llama-cpp-python

v0.3.17-cu130-AVX2-win-20251209 (09 Dec 18:01)
v0.3.17-cu130-AVX2-linux-20251209 (09 Dec 17:32)
v0.3.17-cu128-AVX2-win-20251209 (09 Dec 18:16)
v0.3.17-cu128-AVX2-linux-20251209 (09 Dec 15:39)
v0.3.17-cu126-AVX2-win-20251209 (09 Dec 19:12)
v0.3.17-cu126-AVX2-linux-20251209 (09 Dec 15:11)
v0.3.17-cu124-AVX2-win-20251209 (09 Dec 19:03)
v0.3.17-cu124-AVX2-linux-20251209 (09 Dec 15:10)
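
Each release above ships prebuilt AVX2 wheels for the CUDA toolkit named in its tag (cu124, cu126, cu128, cu130), split into Windows and Linux builds. As a hypothetical usage example, a wheel can be installed directly from a release's asset URL; the asset filename below is illustrative only and must be copied from the actual asset list of the release:

    pip install https://github.com/JamePeng/llama-cpp-python/releases/download/v0.3.17-cu128-AVX2-linux-20251209/llama_cpp_python-0.3.17-cp312-cp312-linux_x86_64.whl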

v0.3.17-cu130-Basic-win-20251207 (07 Dec 19:28)

perf: optimize LlamaModel.metadata reading performance

  • Increase the initial buffer size to 16 KB to eliminate re-allocations for large chat templates.
  • Cache ctypes function references locally to reduce per-iteration loop overhead.
  • Together these give repeated model loads a cumulative speedup of roughly 1-3% (see the sketch below).

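A minimal sketch of the two optimizations, assuming llama.cpp's C metadata API (llama_model_meta_count, llama_model_meta_key_by_index, llama_model_meta_val_str_by_index); the helper name read_metadata and the wrapper details are hypothetical, not the fork's exact code:

    import ctypes

    def read_metadata(lib, model_ptr):
        # Cache function references as locals: repeated attribute lookups
        # inside the loop add measurable overhead for metadata-heavy models.
        meta_count = lib.llama_model_meta_count
        key_by_index = lib.llama_model_meta_key_by_index
        val_by_index = lib.llama_model_meta_val_str_by_index

        # Start at 16 KiB so large values (e.g. chat templates) fit on the
        # first attempt and the common path never re-allocates.
        buf_size = 16 * 1024
        buf = ctypes.create_string_buffer(buf_size)

        def read(getter, i):
            nonlocal buf, buf_size
            n = getter(model_ptr, i, buf, buf_size)
            if n >= buf_size:  # did not fit: grow once and retry
                buf_size = n + 1
                buf = ctypes.create_string_buffer(buf_size)
                getter(model_ptr, i, buf, buf_size)
            return buf.value.decode("utf-8")

        return {
            read(key_by_index, i): read(val_by_index, i)
            for i in range(meta_count(model_ptr))
        }
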
feat: Update Submodule vendor/llama.cpp d9e03db..0a540f9
feat: Sync ggml-zendnn: add ZenDNN backend for AMD CPUs
feat: workflow: Added workflows for compiling with CUDA 13.0.2 on Windows and Linux
feat: Added the scan path for CUDA 13.0+ dynamic-link libraries on Windows ($env:CUDA_PATH\bin\x64); a sketch follows
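
A minimal sketch of how such a scan path can be registered from Python on Windows, assuming the standard os.add_dll_directory mechanism; the candidate list and probe order are illustrative, not the fork's exact loader code:

    import os

    # CUDA 13.0+ on Windows places runtime DLLs under bin\x64 instead of
    # bin, so both locations are probed before loading the shared library.
    cuda_path = os.environ.get("CUDA_PATH")
    if os.name == "nt" and cuda_path:
        for sub in (os.path.join("bin", "x64"), "bin"):
            candidate = os.path.join(cuda_path, sub)
            if os.path.isdir(candidate):
                os.add_dll_directory(candidate)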

v0.3.17-cu130-Basic-linux-20251207 (07 Dec 16:54)

Release notes identical to v0.3.17-cu130-Basic-win-20251207 above.