v0.3.16-cu128-AVX2-win-20250831
25 commits to main since this release
feat: Update Submodule vendor/llama.cpp 6c442f4..bbbf5ec
feat: Sync llama : remove KV cache defragmentation logic
feat: Sync model : jina-embeddings-v3 support
feat: Sync llama : use FA + max. GPU layers by default. The boolean flash_attn parameter in context_params has been removed and replaced by flash_attn_type, which defaults to auto.