Releases · chaxu01/llama.cpp
b6298
CANN: refactor mask handling and improve performance in FA (#15561)

* CANN(flash-attn): refactor mask handling and improve performance
  1. Refactored the mask computation in Flash Attention, unifying the logic so prefill and decode no longer need separate paths.
  2. Improved performance in non-alibi scenarios by eliminating one repeat operation.
  3. Updated operator management to explicitly mark unsupported cases on 310P devices and when the head dimension is not divisible by 16.
* [CANN]: fix review
* [CANN]: Optimize FA BNSD to BSND

Signed-off-by: noemotiovon <[email protected]>
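The third point above is about operator gating: the backend reports up front which flash-attention shapes it can run, so unsupported cases fall back instead of failing at execution time. Below is a minimal, self-contained C++ sketch of that idea; the names `fa_params`, `is_310p_device`, and `cann_supports_flash_attn` are hypothetical illustrations, not the actual ggml-CANN API.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical description of a flash-attention call; field names are
// illustrative, not the real ggml tensor layout.
struct fa_params {
    int64_t head_dim;   // per-head embedding size
    bool    use_alibi;  // whether an ALiBi slope tensor is attached
};

// Assumed helper that reports whether we are running on an Ascend 310P.
static bool is_310p_device() {
    return false;  // placeholder; a real backend would query the runtime
}

// Sketch of "explicitly mark unsupported cases": reject flash attention
// on 310P devices and when head_dim is not a multiple of 16, so the
// graph falls back to another implementation.
static bool cann_supports_flash_attn(const fa_params & p) {
    if (is_310p_device())     return false;  // FA not offered on 310P
    if (p.head_dim % 16 != 0) return false;  // kernel needs 16-aligned dim
    return true;
}

int main() {
    const fa_params cases[] = { {128, false}, {80, false}, {72, true} };
    for (const auto & c : cases) {
        std::printf("head_dim=%lld supported=%d\n",
                    (long long) c.head_dim, (int) cann_supports_flash_attn(c));
    }
    return 0;
}
```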
b5891
llama : add jinja template for rwkv-world (#14665)

* llama : add jinja template for rwkv-world
* Update convert_hf_to_gguf.py

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
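For context, rwkv-world chat models use plain "User:"/"Assistant:" turns rather than special role tokens, and this release ships a Jinja template producing that format during conversion. The C++ sketch below only illustrates that style of prompt rendering; the exact format strings and the `render_rwkv_world_prompt` helper are assumptions for illustration, not the template added in #14665.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical message type for the sketch (role + content).
struct chat_msg {
    std::string role;
    std::string content;
};

// Illustrative rendering of an rwkv-world style prompt: plain
// "User:" / "Assistant:" turns separated by blank lines, ending with
// "Assistant:" so the model continues the conversation. The exact
// format is an assumption, not the shipped Jinja template.
static std::string render_rwkv_world_prompt(const std::vector<chat_msg> & msgs) {
    std::string out;
    for (const auto & m : msgs) {
        if (m.role == "user") {
            out += "User: " + m.content + "\n\n";
        } else if (m.role == "assistant") {
            out += "Assistant: " + m.content + "\n\n";
        } else { // e.g. "system": emitted as a bare leading block
            out += m.content + "\n\n";
        }
    }
    out += "Assistant:";
    return out;
}

int main() {
    std::vector<chat_msg> msgs = { {"user", "What is RWKV?"} };
    std::cout << render_rwkv_world_prompt(msgs) << "\n";
    return 0;
}
```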
b5695
llama-chat : fix multiple system messages for gemma, orion (#14246)
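Gemma-style templates have no dedicated system role, so system content has to be folded into the first user turn; when several system messages appear in a row they should be accumulated rather than overwritten. The sketch below shows that accumulate-then-prepend pattern for a gemma-style renderer; it is a minimal illustration under those assumptions, not the code from the PR.

```cpp
#include <iostream>
#include <string>
#include <vector>

struct chat_msg {
    std::string role;
    std::string content;
};

// Sketch of a gemma-style renderer: <start_of_turn>/<end_of_turn> markers,
// no system role. System messages are collected and prepended to the next
// user turn; appending with += keeps every system message when there are
// several in a row.
static std::string render_gemma_prompt(const std::vector<chat_msg> & msgs) {
    std::string out;
    std::string system_prompt;
    for (const auto & m : msgs) {
        if (m.role == "system") {
            if (!system_prompt.empty()) system_prompt += "\n\n";
            system_prompt += m.content;          // keep all system messages
            continue;
        }
        // gemma calls the assistant role "model"
        const std::string role = (m.role == "assistant") ? "model" : m.role;
        out += "<start_of_turn>" + role + "\n";
        if (!system_prompt.empty() && role != "model") {
            out += system_prompt + "\n\n";       // fold into the user turn
            system_prompt.clear();
        }
        out += m.content + "<end_of_turn>\n";
    }
    out += "<start_of_turn>model\n";             // prompt the model to reply
    return out;
}

int main() {
    std::vector<chat_msg> msgs = {
        {"system", "You are concise."},
        {"system", "Answer in English."},
        {"user",   "Hi!"},
    };
    std::cout << render_gemma_prompt(msgs);
    return 0;
}
```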