Releases · AD2605/llama.cpp
b5503
sampling : make sure samplers return at least 1 token (#13822)
* sampling : min-p should always return at least one token
* sampling : same for typical sampling
* tests : sampling tests use min_keep == 0
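For context, the guarantee described above can be pictured as a small sketch: filter candidates by a fraction of the top probability, but clamp the kept count to at least one even when `min_keep == 0`. The struct and function names below are illustrative, not the llama.cpp sampler API.

```cpp
// Minimal sketch of the min-p guarantee: drop candidates whose probability
// falls below min_p * max_prob, but always keep at least one token.
// Names (TokenProb, min_p_filter) are illustrative, not the llama.cpp API.
#include <algorithm>
#include <cstdio>
#include <vector>

struct TokenProb {
    int   id;
    float p;  // normalized probability
};

static void min_p_filter(std::vector<TokenProb> & cands, float min_p, size_t min_keep) {
    if (cands.empty()) return;

    // sort descending by probability so the best token is first
    std::sort(cands.begin(), cands.end(),
              [](const TokenProb & a, const TokenProb & b) { return a.p > b.p; });

    const float threshold = min_p * cands.front().p;

    size_t keep = 0;
    while (keep < cands.size() && cands[keep].p >= threshold) {
        keep++;
    }

    // the point of #13822: never return an empty set, even if min_keep == 0
    keep = std::max(keep, std::max<size_t>(min_keep, 1));
    cands.resize(std::min(keep, cands.size()));
}

int main() {
    std::vector<TokenProb> cands = {{0, 0.90f}, {1, 0.06f}, {2, 0.04f}};
    min_p_filter(cands, /*min_p=*/0.5f, /*min_keep=*/0);
    printf("kept %zu token(s), top id = %d\n", cands.size(), cands[0].id);
    return 0;
}
```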
b5467
llama : allow custom list of swa_layers (#13726)
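A rough sketch of what a custom SWA (sliding-window attention) layer list could look like in practice, assuming a comma-separated spec; the actual option name and format added in #13726 may differ.

```cpp
// Illustrative sketch: parse a user-supplied spec like "0,2,5" into per-layer
// SWA flags. The parsing format is an assumption, not the #13726 interface.
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

std::vector<bool> parse_swa_layers(const std::string & spec, int n_layer) {
    std::vector<bool> is_swa(n_layer, false);
    std::stringstream ss(spec);
    std::string tok;
    while (std::getline(ss, tok, ',')) {
        const int il = std::stoi(tok);
        if (il >= 0 && il < n_layer) is_swa[il] = true;
    }
    return is_swa;
}

int main() {
    auto swa = parse_swa_layers("0,2,5", 8);
    for (int il = 0; il < 8; il++)
        printf("layer %d: %s\n", il, swa[il] ? "SWA" : "full attention");
    return 0;
}
```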
b5432
sycl: disable reorder for sycl mulmat (#13536)
b5423
mtmd : add vision support for llama 4 (#13282)
* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still perceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
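One step worth illustrating from the list above is the pixel shuffle, which trades spatial resolution for channel depth when merging vision patches. The layout and shuffle factor below are assumptions for illustration, not the mtmd implementation.

```cpp
// Minimal sketch of a pixel-shuffle step for vision patches: an h x w grid of
// patch embeddings (dim d) is folded into an (h/r) x (w/r) grid of dim r*r*d,
// concatenating each r x r neighborhood. Row-major layout is assumed.
#include <cstdio>
#include <vector>

std::vector<float> pixel_shuffle(const std::vector<float> & in, int h, int w, int d, int r) {
    // in: h*w*d values, laid out [y][x][c]; r: shuffle factor (e.g. 2)
    std::vector<float> out(in.size());
    const int ho = h / r, wo = w / r;
    for (int y = 0; y < ho; y++)
        for (int x = 0; x < wo; x++)
            for (int dy = 0; dy < r; dy++)
                for (int dx = 0; dx < r; dx++)
                    for (int c = 0; c < d; c++) {
                        const int src = ((y*r + dy) * w + (x*r + dx)) * d + c;
                        const int dst = (y * wo + x) * (r*r*d) + (dy*r + dx) * d + c;
                        out[dst] = in[src];
                    }
    return out;
}

int main() {
    const int h = 4, w = 4, d = 8;
    std::vector<float> patches(h * w * d, 1.0f);
    auto merged = pixel_shuffle(patches, h, w, d, /*r=*/2);
    printf("tokens: %d -> %d, dim: %d -> %d\n", h*w, (h/2)*(w/2), d, 4*d);
    return 0;
}
```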
b5416
CANN: Support MOE Model MUL_MAT_ID (#13042)
Signed-off-by: noemotiovon <[email protected]>
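MUL_MAT_ID is the ggml op behind MoE routing: each input row selects one expert weight matrix by id and is multiplied by it. A scalar reference sketch of that computation, with dimensions and names chosen for illustration rather than taken from ggml:

```cpp
// Illustrative scalar version of a MUL_MAT_ID-style op: per-token expert
// selection followed by a matrix-vector product with the chosen expert.
#include <cstdio>
#include <vector>

// experts: n_expert matrices of shape [n_out][n_in], flattened contiguously
void mul_mat_id(const std::vector<float> & experts,
                const std::vector<float> & x,     // n_tok * n_in inputs
                const std::vector<int>   & ids,   // expert id per token
                std::vector<float>       & y,     // n_tok * n_out outputs
                int n_in, int n_out, int n_tok) {
    for (int t = 0; t < n_tok; t++) {
        const float * W = experts.data() + (size_t) ids[t] * n_out * n_in;
        for (int o = 0; o < n_out; o++) {
            float acc = 0.0f;
            for (int i = 0; i < n_in; i++)
                acc += W[o * n_in + i] * x[t * n_in + i];
            y[t * n_out + o] = acc;
        }
    }
}

int main() {
    const int n_in = 4, n_out = 2, n_expert = 3, n_tok = 2;
    std::vector<float> experts(n_expert * n_out * n_in, 0.5f);
    std::vector<float> x(n_tok * n_in, 1.0f);
    std::vector<int>   ids = {0, 2};  // token 0 -> expert 0, token 1 -> expert 2
    std::vector<float> y(n_tok * n_out);
    mul_mat_id(experts, x, ids, y, n_in, n_out, n_tok);
    printf("y[0] = %.1f\n", y[0]);  // 4 inputs * 0.5 weight * 1.0 input = 2.0
    return 0;
}
```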
b5392
server : proper error handling for missing elements in messages array…
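The kind of validation this entry refers to can be sketched as follows, using nlohmann::json (the JSON library the llama.cpp server builds on); the specific checks and error messages here are illustrative, not the merged code.

```cpp
// Sketch: validate a chat "messages" array up front and report a clear error
// instead of crashing on missing fields. Checks are illustrative assumptions.
#include <nlohmann/json.hpp>
#include <cstdio>
#include <stdexcept>

using json = nlohmann::json;

void validate_messages(const json & body) {
    if (!body.contains("messages") || !body["messages"].is_array()) {
        throw std::invalid_argument("\"messages\" must be a JSON array");
    }
    for (const auto & msg : body["messages"]) {
        if (!msg.is_object()) {
            throw std::invalid_argument("each message must be an object");
        }
        if (!msg.contains("role") || !msg["role"].is_string()) {
            throw std::invalid_argument("message is missing a string \"role\"");
        }
        if (!msg.contains("content")) {
            throw std::invalid_argument("message is missing \"content\"");
        }
    }
}

int main() {
    try {
        validate_messages(json::parse(R"({"messages":[{"role":"user"}]})"));
    } catch (const std::exception & e) {
        // prints: 400 Bad Request: message is missing "content"
        printf("400 Bad Request: %s\n", e.what());
    }
    return 0;
}
```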
b5359
clip : cap max image size 1024 for qwen vl model (#13478)
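The cap can be sketched as a simple longest-side clamp that preserves aspect ratio; the exact rounding and patch-alignment rules in clip.cpp may differ from this illustration.

```cpp
// Illustrative sketch: scale an image so its longest side is at most 1024,
// keeping the aspect ratio, as the clip change above does for Qwen-VL inputs.
#include <algorithm>
#include <cstdio>

void cap_image_size(int & w, int & h, int max_side = 1024) {
    const int longest = std::max(w, h);
    if (longest <= max_side) return;
    const float scale = (float) max_side / (float) longest;
    w = std::max(1, (int) (w * scale));
    h = std::max(1, (int) (h * scale));
}

int main() {
    int w = 4096, h = 1536;
    cap_image_size(w, h);
    printf("scaled to %dx%d\n", w, h);  // 1024x384
    return 0;
}
```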
b5329
metal : optimize MoE for large batches (#13388)
b5316
server : (webui) fix a very small misalignment (#13387)
* server : (webui) fix a very small misalignment
* restore font-bold
b5307
docker : disable arm64 and intel images (#13356)