
llm-llama-server


Interact with llama-server models

Installation

Install this plugin in the same environment as LLM.

llm install llm-llama-server
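
To confirm the plugin is installed, you can list the plugins LLM has loaded:

llm plugins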

Usage

You'll need to be running a llama-server on port 8080 to use this plugin.

You can run brew install llama.cpp to obtain that binary. Then run it like this:

llama-server -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL

This loads and serves unsloth/gemma-3-4b-it-GGUF, a GGUF version of Gemma 3 4B - a 3.2GB download.

To access regular models from LLM, use the llama-server model:

llm -m llama-server "say hi"
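
If you'd rather hold an ongoing conversation than send single prompts, LLM's chat command works with this model as well:

llm chat -m llama-server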

For vision models, use llama-server-vision:

llm -m llama-server-vision describe -a path/to/image.png
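
The -a option accepts URLs as well as file paths, so you can point it at a remote image (the URL below is just a placeholder):

llm -m llama-server-vision describe -a https://example.com/photo.jpg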

For models with tools (which also support vision), use llama-server-tools:

llm -m llama-server-tools -T llm_time 'time?' --td
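
llm_time is one of LLM's built-in example tools; llm_version is another you can try in the same way:

llm -m llama-server-tools -T llm_version 'What version of LLM is this?' --td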

You'll need to run the llama-server with the --jinja flag in order for this to work:

llama-server --jinja -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL

Or for a slightly stronger 7.3GB model:

llama-server --jinja -hf unsloth/gemma-3-12b-it-qat-GGUF:Q4_K_M
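
If you find yourself mostly using this model, you can make it LLM's default so the -m option can be omitted:

llm models default llama-server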

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd llm-llama-server
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

python -m pip install -e '.[test]'

To run the tests:

python -m pytest
