Llama ParamPal – a community-driven digest that helps you find the recommended sampling parameters for running local LLMs with llama.cpp.
Finding the recommended sampling parameters for a model can be a cumbersome and time-consuming process. This project aims to make it a little easier to:
- Avoid guesswork when running LLMs.
- Collect references and links to the model creators' recommended-parameter documentation.
The project consists of two parts: the `models.json` file, which serves as the source of knowledge, and a frontend available at https://llama-parampal.codecut.de/ that lets you quickly search the models in this JSON.
1. Fork this repo.
2. Open the `models.json` file.
3. Add your model, or a profile under an existing one.
4. Include:
   - A descriptive name for the profile
   - The llama.cpp CLI sampling parameters
   - At least one valid reference to documentation from the model creators where those settings are documented
5. Validate the JSON: `cd validation && npm install && npm run validate`
6. Submit a Pull Request - we'll review and merge!
💡 Make sure your JSON is valid and follows the existing structure. When in doubt, use current entries as examples.
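To give a rough idea of the shape of an entry, a hypothetical one might look like the sketch below. The field names here are illustrative assumptions, not the actual `models.json` schema — always copy the structure of an existing entry instead:

```json
{
  "name": "example-model-7b",
  "profiles": [
    {
      "name": "creator-recommended",
      "parameters": {
        "--temp": 0.7,
        "--top-k": 40,
        "--top-p": 0.9,
        "--min-p": 0.05,
        "--ctx-size": 8192
      },
      "references": [
        "https://example.com/model-card#recommended-settings"
      ]
    }
  ]
}
```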
- Set the maximum `--ctx-size` to the `context_length` defined in the GGUF headers of the model you are referencing. (You can look it up in the Hugging Face GGUF metadata viewer panel.)
- Don't add any hardware-dependent parameters such as `-ngl`, `-sm`, and the like. What this JSON aims to document is the sampling parameters only.
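As a sketch of how an entry's sampling parameters map onto a llama.cpp invocation (the model path and parameter values below are illustrative, not taken from any real entry):

```shell
# Apply a profile's documented sampling parameters with llama-cli;
# hardware settings like -ngl stay out of models.json and are chosen per machine.
llama-cli -m ./example-model-7b.gguf \
  --ctx-size 8192 \
  --temp 0.7 \
  --top-k 40 \
  --top-p 0.9 \
  --min-p 0.05
```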
Open an issue or join the discussion at https://github.com/kseyhan/llama-param-pal.
llama.cpp: an LLM inference engine in pure C/C++
MIT – free to use, improve, and share.