Interact with llama-server models
Install this plugin in the same environment as LLM.
```bash
llm install llm-llama-server
```
You'll need to be running a `llama-server` on port 8080 to use this plugin.
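Optionally, you can confirm LLM picked up the plugin by listing the model IDs it now knows about. This is a quick sketch, assuming a recent LLM release that exposes `llm.get_models()`; the three model IDs it checks for are the ones described in the usage notes below.

```python
# Quick check that the plugin's models are registered with LLM.
# Assumes a recent LLM release that provides llm.get_models().
import llm

registered = {model.model_id for model in llm.get_models()}
for model_id in ("llama-server", "llama-server-vision", "llama-server-tools"):
    status = "ok" if model_id in registered else "missing"
    print(f"{model_id}: {status}")
```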
You can `brew install llama.cpp` to obtain that binary. Then run it like this:
```bash
llama-server -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
```
This loads and serves the `unsloth/gemma-3-4b-it-GGUF` GGUF version of Gemma 3 4B - a 3.2GB download.
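Once the server is up you can sanity-check that it is reachable before pointing LLM at it. A minimal sketch using only the standard library, assuming your llama.cpp build exposes the `/health` endpoint (recent builds do):

```python
# Ping the local llama-server before using it from LLM.
# Assumes llama-server is listening on port 8080 and exposes /health.
from urllib.request import urlopen

with urlopen("http://localhost:8080/health", timeout=5) as response:
    print(response.status, response.read().decode())
```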
To access regular models from LLM, use the `llama-server` model:
```bash
llm -m llama-server "say hi"
```
For vision models, use `llama-server-vision`:
```bash
llm -m llama-server-vision describe -a path/to/image.png
```
For models with tools (which also support vision) use `llama-server-tools`:
```bash
llm -m llama-server-tools -T llm_time 'time?' --td
```
You'll need to run the `llama-server` with the `--jinja` flag in order for this to work:
```bash
llama-server --jinja -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
```
Or for a slightly stronger 7.3GB model:
```bash
llama-server --jinja -hf unsloth/gemma-3-12b-it-qat-GGUF:Q4_K_M
```
To set up this plugin locally, first check out the code. Then create a new virtual environment:
```bash
cd llm-llama-server
python -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
python -m pip install -e '.[test]'
```
To run the tests:
```bash
python -m pytest
```
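The plugin's models can also be driven from LLM's Python API rather than the CLI. The following is a rough sketch, not part of the plugin's documented behaviour: it assumes a `llama-server` started with `--jinja` is listening on port 8080, and a recent LLM release with attachment and tool support (`llm.Attachment`, `model.chain()`). The image path and the `current_time` tool are placeholders.

```python
# Rough sketch of using the plugin's models from LLM's Python API.
# Assumes llama-server --jinja is running on port 8080 and a recent LLM release.
import datetime
import llm

# Plain prompt against the regular model
model = llm.get_model("llama-server")
print(model.prompt("say hi").text())

# Vision model with an image attachment (placeholder path)
vision = llm.get_model("llama-server-vision")
response = vision.prompt(
    "describe",
    attachments=[llm.Attachment(path="path/to/image.png")],
)
print(response.text())

# Tools model: chain() executes any tool calls the model decides to make
def current_time() -> str:
    "Return the current local time as an ISO 8601 string."
    return datetime.datetime.now().isoformat()

tools = llm.get_model("llama-server-tools")
print(tools.chain("What time is it?", tools=[current_time]).text())
```

`chain()` is used for the tools example because it keeps prompting the model until it stops requesting tool calls, whereas a plain `prompt()` returns after a single round.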