Releases: JamePeng/llama-cpp-python

v0.3.16-cu124-AVX2-win-20250913

13 Sep 09:32

v0.3.16-cu124-AVX2-linux-20250913

13 Sep 06:13

v0.3.16-cu128-AVX2-win-20250831

31 Aug 18:35

feat: Update Submodule vendor/llama.cpp 6c442f4..bbbf5ec
feat: Sync llama : remove KV cache defragmentation logic
feat: Sync model : jina-embeddings-v3 support
feat: Sync llama : use FA + max. GPU layers by default; the flash_attn parameter in context_params has been removed and replaced by flash_attn_type, which is initialized to auto by default.
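The flash_attn → flash_attn_type change affects how a context is created from Python. A minimal migration sketch (the keyword name comes from the note above, but the "auto" value and the exact constructor signature are assumptions; check the fork's `Llama` class before relying on them):

```python
# Migration sketch for the flash_attn -> flash_attn_type change.
# flash_attn_type="auto" is an assumption based on the release note,
# not a confirmed API; verify against the fork's Llama signature.
try:
    from llama_cpp import Llama
except ImportError:  # llama-cpp-python not installed; keep this a dry sketch
    Llama = None

def make_llama(model_path: str):
    """Create a Llama context using the post-release parameter."""
    if Llama is None:
        raise RuntimeError("llama-cpp-python is not installed")
    # Before this release: Llama(model_path=model_path, flash_attn=True)
    # After: Flash Attention is auto-selected by default, so the argument
    # can be omitted entirely or passed as flash_attn_type="auto".
    return Llama(model_path=model_path, flash_attn_type="auto")
```

Since the new parameter defaults to auto, most callers can simply drop the old `flash_attn=True` argument rather than translating it.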

v0.3.16-cu128-AVX2-linux-20250831

31 Aug 14:27

feat: Update Submodule vendor/llama.cpp 6c442f4..bbbf5ec
feat: Sync llama : remove KV cache defragmentation logic
feat: Sync model : jina-embeddings-v3 support
feat: Sync llama : use FA + max. GPU layers by default; the flash_attn parameter in context_params has been removed and replaced by flash_attn_type, which is initialized to auto by default.

v0.3.16-cu126-AVX2-win-20250831

31 Aug 17:27

feat: Update Submodule vendor/llama.cpp 6c442f4..bbbf5ec
feat: Sync llama : remove KV cache defragmentation logic
feat: Sync model : jina-embeddings-v3 support
feat: Sync llama : use FA + max. GPU layers by default; the flash_attn parameter in context_params has been removed and replaced by flash_attn_type, which is initialized to auto by default.

v0.3.16-cu126-AVX2-linux-20250831

31 Aug 14:16

feat: Update Submodule vendor/llama.cpp 6c442f4..bbbf5ec
feat: Sync llama : remove KV cache defragmentation logic
feat: Sync model : jina-embeddings-v3 support
feat: Sync llama : use FA + max. GPU layers by default; the flash_attn parameter in context_params has been removed and replaced by flash_attn_type, which is initialized to auto by default.

v0.3.16-cu124-AVX2-win-20250831

31 Aug 17:20

feat: Update Submodule vendor/llama.cpp 6c442f4..bbbf5ec
feat: Sync llama : remove KV cache defragmentation logic
feat: Sync model : jina-embeddings-v3 support
feat: Sync llama : use FA + max. GPU layers by default; the flash_attn parameter in context_params has been removed and replaced by flash_attn_type, which is initialized to auto by default.

v0.3.16-cu124-AVX2-linux-20250831

31 Aug 14:15

feat: Update Submodule vendor/llama.cpp 6c442f4..bbbf5ec
feat: Sync llama : remove KV cache defragmentation logic
feat: Sync model : jina-embeddings-v3 support
feat: Sync llama : use FA + max. GPU layers by default; the flash_attn parameter in context_params has been removed and replaced by flash_attn_type, which is initialized to auto by default.

v0.3.16-cu128-AVX2-win-20250822

22 Aug 03:05

v0.3.16-cu128-AVX2-linux-20250821

21 Aug 22:57
