Releases · ReinForce-II/llama.cpp
b2972
b2961
llama : add phi3 128K model support (#7225)

* add phi3 128k support in convert-hf-to-gguf
* add phi3 128k support in cuda
* address build warnings on llama.cpp
* adjust index value in cuda long rope freq factors
* add long rope support in ggml cpu backend
* make freq factors only depend on ctx size
* remove unused rope scaling type 'su' from gguf converter
* fix lint warnings on convert-hf-to-gguf.py
* set to the short freq factor when context size is smaller than trained context size
* add one line of comments
* metal : support rope freq_factors
* ggml : update ggml_rope_ext API to support freq. factors
* backends : add dev messages to support rope freq. factors
* minor : style
* tests : update to use new rope API
* backends : fix pragma semicolons
* minor : cleanup
* llama : move rope factors from KV header to tensors
* llama : remove tmp assert
* cuda : fix compile warning
* convert : read/write n_head_kv
* llama : fix uninitialized tensors

---------

Co-authored-by: Georgi Gerganov <[email protected]>
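The long-rope bullets above reduce to one selection rule: the GGUF ships two sets of RoPE frequency factors as tensors (rather than in the KV header), and the loader picks between them based only on the requested context size versus the model's trained context size. Below is a minimal C++ sketch of that rule; the struct and function names are hypothetical illustrations, not the actual llama.cpp API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container for the two factor sets a phi3-128k GGUF carries.
// Per the changelog, these live in dedicated tensors, not the KV header.
struct rope_freq_factors {
    std::vector<float> short_factors; // tuned for the trained context window
    std::vector<float> long_factors;  // tuned for the extended (128k) window
};

// Selection rule from the changelog: the choice depends only on the context
// size requested at load time, compared against the trained context size.
const std::vector<float> & select_rope_factors(
        const rope_freq_factors & f,
        uint32_t n_ctx,          // context size the user asked for
        uint32_t n_ctx_trained)  // context size the model was trained with
{
    // Use the short factors whenever the requested context fits within the
    // trained window; only the extended window needs the long factors.
    return n_ctx <= n_ctx_trained ? f.short_factors : f.long_factors;
}
```

The selected factors are then fed into the rope op as an extra tensor argument, which is why `ggml_rope_ext` gained freq-factor support across the CPU, CUDA, and Metal backends in this release.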
b2953
Tokenizer SPM fixes for phi-3 and llama-spm (#7375)

* Update brute force test: special tokens
* Fix added tokens
  - Try to read 'added_tokens.json'.
  - Try to read 'tokenizer_config.json'.
  - Try to read 'tokenizer.json'.
* Fix special tokens rtrim

Co-authored-by: Georgi Gerganov <[email protected]>

* server : fix test regexes
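The added-tokens fix is a three-file fallback chain over the tokenizer metadata a model directory may provide. Here is a hedged C++ sketch of that chain using nlohmann::json (which llama.cpp vendors); the function name and the exact JSON keys (`added_tokens_decoder`, `added_tokens`, `content`) are assumptions based on the usual Hugging Face file layouts, and the real fix lives in the Python converter.

```cpp
#include <fstream>
#include <map>
#include <string>

#include <nlohmann/json.hpp>
using json = nlohmann::json;

// Hypothetical sketch: collect added-token ids from whichever metadata file
// the model directory provides, in the order the changelog lists them.
static std::map<std::string, int> read_added_tokens(const std::string & dir) {
    std::map<std::string, int> added;

    // 1. added_tokens.json maps token text directly to its id.
    if (std::ifstream f{dir + "/added_tokens.json"}) {
        const json j = json::parse(f);
        for (const auto & [tok, id] : j.items()) {
            added[tok] = id.get<int>();
        }
        return added;
    }

    // 2. tokenizer_config.json keys the same data by id instead.
    if (std::ifstream f{dir + "/tokenizer_config.json"}) {
        const json j = json::parse(f);
        if (j.contains("added_tokens_decoder")) {
            for (const auto & [id, tok] : j["added_tokens_decoder"].items()) {
                added[tok["content"].get<std::string>()] = std::stoi(id);
            }
            return added;
        }
    }

    // 3. tokenizer.json carries an "added_tokens" array as a last resort.
    if (std::ifstream f{dir + "/tokenizer.json"}) {
        const json j = json::parse(f);
        for (const auto & t : j.value("added_tokens", json::array())) {
            added[t["content"].get<std::string>()] = t["id"].get<int>();
        }
    }
    return added;
}
```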
b2950
rpc : track allocated buffers (#7411)

* rpc : track allocated buffers

  ref: #7407

* rpc : pack rpc_tensor tightly
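The second bullet concerns the wire format: `rpc_tensor` is sent between RPC client and server as raw bytes, so any compiler-inserted padding would let the two sides disagree on the layout. A sketch of what tight packing looks like, with an assumed, simplified field list rather than the exact rpc_tensor definition:

```cpp
#include <cstdint>

// Force 1-byte alignment so the struct has no compiler-inserted padding and
// can be serialized byte-for-byte. Field list is illustrative only.
#pragma pack(push, 1)
struct rpc_tensor {
    uint64_t id;       // remote handle of the tensor
    uint32_t type;     // ggml type enum
    uint64_t ne[4];    // number of elements per dimension
    uint64_t nb[4];    // stride in bytes per dimension
    uint64_t data;     // offset into the remote buffer
    char     name[64]; // tensor name, NUL-terminated
};
#pragma pack(pop)

static_assert(sizeof(rpc_tensor) == 8 + 4 + 32 + 32 + 8 + 64,
              "rpc_tensor must stay padding-free for the RPC wire format");
```

The `static_assert` is the cheap guard: if a field change or a different compiler reintroduces padding, the build fails instead of the protocol silently breaking.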
b2941
Add provisions for Windows support for BF16 code including CMake prov…
b2876
llama : disable pipeline parallelism with nkvo (#7265)
b2837
llava : fix moondream support (#7163)

* Revert "Revert "llava : add support for moondream vision language model (#6899)""

  This reverts commit 9da243b36ac0b9d609adfaaa4c8f1cc8c592f737.

* Fix num_positions and embeddings initialization