v0.3.16-cu128-AVX2-win-20250831
25 commits to main since this release
feat: Update Submodule vendor/llama.cpp 6c442f4..bbbf5ec
feat: Sync llama : remove KV cache defragmentation logic
feat: Sync model : jina-embeddings-v3 support
feat: Sync llama : use FA + max. GPU layers by default. The boolean flash_attn parameter in context_params has been removed and replaced by flash_attn_type, which defaults to auto.