Releases · JamePeng/llama-cpp-python · GitHub

31 Oct 19:37

v0.3.16-cu126-AVX2-win-20251031

v0.3.16-cu126-AVX2-win-20251031

New Update: Support for Qwen3VL GGUF

feat: Update README.md for Qwen3VL example(Thinking/No Thinking)
feat: feat: Add Qwen3VLChatHandler into llama_chat_format.py
feat: Update llama.cpp api 20251031
update: Update Submodule vendor/llama.cpp 16724b5..8da3c0e

Assets 6

31 Oct 15:47

v0.3.16-cu126-AVX2-linux-20251031

v0.3.16-cu126-AVX2-linux-20251031

New Update: Support for Qwen3VL GGUF

feat: Update README.md for Qwen3VL example(Thinking/No Thinking)
feat: feat: Add Qwen3VLChatHandler into llama_chat_format.py
feat: Update llama.cpp api 20251031
update: Update Submodule vendor/llama.cpp 16724b5..8da3c0e

Assets 6

31 Oct 19:34

v0.3.16-cu124-AVX2-win-20251031

v0.3.16-cu124-AVX2-win-20251031

New Update: Support for Qwen3VL GGUF

feat: Update README.md for Qwen3VL example(Thinking/No Thinking)
feat: feat: Add Qwen3VLChatHandler into llama_chat_format.py
feat: Update llama.cpp api 20251031
update: Update Submodule vendor/llama.cpp 16724b5..8da3c0e

Assets 6

31 Oct 15:47

v0.3.16-cu124-AVX2-linux-20251031

v0.3.16-cu124-AVX2-linux-20251031

New Update: Support for Qwen3VL GGUF

feat: Update README.md for Qwen3VL example(Thinking/No Thinking)
feat: feat: Add Qwen3VLChatHandler into llama_chat_format.py
feat: Update llama.cpp api 20251031
update: Update Submodule vendor/llama.cpp 16724b5..8da3c0e

Assets 6

23 Oct 22:07

v0.3.16-cu128-AVX2-win-20251023

v0.3.16-cu128-AVX2-win-20251024

feat: Supplement sm_87 sm101 compilation
feat: Update Submodule vendor/llama.cpp df1b612..dd62dcf
feat: Update some llama model parameters(check_tensors, use_extra_bufts, no_host)
feat: Sync model : Granite docling + Idefics3 preprocessing (SmolVLM)
feat: Sync server : context checkpointing for hybrid and recurrent models
feat: Sync llama: print memory breakdown on exit
feat: Synchronize some enum variable values
feat: Introducing index numbers to avoid the hallucination problem of multiple images entering the minicpm multimodal model series as much as possible

Assets 6

23 Oct 19:43

v0.3.16-cu128-AVX2-linux-20251023

v0.3.16-cu128-AVX2-linux-20251024

feat: Supplement sm_87 sm101 compilation
feat: Update Submodule vendor/llama.cpp df1b612..dd62dcf
feat: Update some llama model parameters(check_tensors, use_extra_bufts, no_host)
feat: Sync model : Granite docling + Idefics3 preprocessing (SmolVLM)
feat: Sync server : context checkpointing for hybrid and recurrent models
feat: Sync llama: print memory breakdown on exit
feat: Synchronize some enum variable values
feat: Introducing index numbers to avoid the hallucination problem of multiple images entering the minicpm multimodal model series as much as possible

Assets 6

24 Oct 00:01

v0.3.16-cu126-AVX2-win-20251024

v0.3.16-cu126-AVX2-win-20251024

feat: Supplement sm_87 compilation
feat: Update Submodule vendor/llama.cpp df1b612..dd62dcf
feat: Update some llama model parameters(check_tensors, use_extra_bufts, no_host)
feat: Sync model : Granite docling + Idefics3 preprocessing (SmolVLM)
feat: Sync server : context checkpointing for hybrid and recurrent models
feat: Sync llama: print memory breakdown on exit
feat: Synchronize some enum variable values
feat: Introducing index numbers to avoid the hallucination problem of multiple images entering the minicpm multimodal model series as much as possible

Assets 6

23 Oct 19:27

v0.3.16-cu126-AVX2-linux-20251023

v0.3.16-cu126-AVX2-linux-20251024

feat: Supplement sm_87 compilation
feat: Update Submodule vendor/llama.cpp df1b612..dd62dcf
feat: Update some llama model parameters(check_tensors, use_extra_bufts, no_host)
feat: Sync model : Granite docling + Idefics3 preprocessing (SmolVLM)
feat: Sync server : context checkpointing for hybrid and recurrent models
feat: Sync llama: print memory breakdown on exit
feat: Synchronize some enum variable values
feat: Introducing index numbers to avoid the hallucination problem of multiple images entering the minicpm multimodal model series as much as possible

Assets 6

23 Oct 23:10

v0.3.16-cu124-AVX2-win-20251023

v0.3.16-cu124-AVX2-win-20251024

feat: Supplement sm_87 compilation
feat: Update Submodule vendor/llama.cpp df1b612..dd62dcf
feat: Update some llama model parameters(check_tensors, use_extra_bufts, no_host)
feat: Sync model : Granite docling + Idefics3 preprocessing (SmolVLM)
feat: Sync server : context checkpointing for hybrid and recurrent models
feat: Sync llama: print memory breakdown on exit
feat: Synchronize some enum variable values
feat: Introducing index numbers to avoid the hallucination problem of multiple images entering the minicpm multimodal model series as much as possible

Assets 6

23 Oct 19:24

v0.3.16-cu124-AVX2-linux-20251023

v0.3.16-cu124-AVX2-linux-20251024

feat: Supplement sm_87 compilation
feat: Update Submodule vendor/llama.cpp df1b612..dd62dcf
feat: Update some llama model parameters(check_tensors, use_extra_bufts, no_host)
feat: Sync model : Granite docling + Idefics3 preprocessing (SmolVLM)
feat: Sync server : context checkpointing for hybrid and recurrent models
feat: Sync llama: print memory breakdown on exit
feat: Synchronize some enum variable values
feat: Introducing index numbers to avoid the hallucination problem of multiple images entering the minicpm multimodal model series as much as possible

Assets 6