Releases · mingfeima/llama.cpp
b3542
b3441
llama : fix codeshell support (#8599)

* llama : fix codeshell support
* llama : move codeshell after smollm to respect the enum order
b3401
convert_hf : faster lazy safetensors (#8482)

* convert_hf : faster lazy safetensors

  This makes '--dry-run' much, much faster.

* convert_hf : fix memory leak in lazy MoE conversion

  The '_lazy' queue was sometimes self-referential, which caused reference cycles of objects old enough to avoid garbage collection until potential memory exhaustion.
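The cycle leak described above is generic CPython behavior, worth a quick sketch: an object that (indirectly) references itself is never freed by reference counting alone and has to wait for the cyclic garbage collector, which visits long-lived objects only rarely. The `Job` class and queue below are illustrative stand-ins, not the converter's actual `_lazy` machinery:

```python
import gc

class Job:
    """Holds a large payload plus a queue of follow-up jobs."""
    def __init__(self, payload: bytes):
        self.payload = payload
        self.queue = []

def make_cycle() -> None:
    job = Job(b"x" * (64 << 20))  # ~64 MiB payload
    job.queue.append(job)         # the queue refers back to its owner: a cycle
    # 'job' goes out of scope here, but its refcount never drops to zero,
    # so the payload survives until the cyclic collector happens to run.

gc.disable()                      # pretend the cyclic collector hasn't run yet
make_cycle()
print(gc.collect())               # collecting cycles is what finally frees it
```

Breaking the self-reference, as the commit message describes, lets plain reference counting reclaim the objects promptly instead of leaving them for the cycle collector.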
b2972
CUDA: fix FA out-of-bounds reads (#7479)
b2688
convert : fix autoawq gemma (#6704)

* fix autoawq quantized gemma model convert error

  Using autoawq to quantize a gemma model puts an lm_head.weight tensor in model-00001-of-00002.safetensors, which convert-hf-to-gguf.py cannot map. Skipping this tensor prevents the error.

* change code to full string match and print necessary message

  Use a full string match and print a short message to inform users that lm_head.weight has been skipped.

---------
Co-authored-by: Zheng.Deng <[email protected]>
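A minimal sketch of the skip logic described above, assuming an exact-name check (a full string match avoids accidentally dropping tensors whose names merely contain the substring). The `should_skip` helper and the `tensors` dict are hypothetical, not convert-hf-to-gguf.py's real structure:

```python
def should_skip(name: str) -> bool:
    # Full string match, not a substring test, so only the one
    # unmappable tensor is dropped.
    if name == "lm_head.weight":
        print(f"skipping tensor {name!r}: no GGUF mapping for this model")
        return True
    return False

# Hypothetical conversion loop over tensors loaded from a safetensors shard.
tensors = {"model.embed_tokens.weight": ..., "lm_head.weight": ...}
for name, data in tensors.items():
    if should_skip(name):
        continue
    # ... map the name and write the tensor to the GGUF file as usual
```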
b2586
[SYCL] Disable iqx on windows as WA (#6435)

* disable iqx on windows as a workaround
* use an array instead of global_memory
b2542
[SYCL] fix missing file in Windows release (#6314)