brute-llama (beta, meta)

brute-llama is a handy performance testing and analysis tool, particularly focused on llama.cpp llama-server configuration. It provides a web-based interface to configure, run, and visualize performance tests for various model configurations.

Features

Web-based Interface: Interactive dashboard built with Dash and Dash Bootstrap Components
Configurable Test Parameters: Define server and measurement parameters with flexible templates
Performance Visualization: Real-time plotting of performance metrics using Plotly
Configuration Management: Save and load test configurations as YAML files

Installation

Clone the repository:

git clone https://github.com/crashr/brute-llama.git
cd brute-llama

Install dependencies (venv recommended):
```
pip install -r requirements.txt
```
Create required directories:
```
mkdir -p configs data
```

Configuration

The application uses YAML configuration files stored in the configs/ directory. Each configuration file contains:

server_template: Command template to start the model server
measure_template: Command template to measure performance
server_url: URL for health checks
server_params: Parameters for server configuration
measure_params: Parameters for measurement configuration

Example configuration (configs/example.yaml):

server_template: /usr/local/bin/llama-server --host 0.0.0.0 --port 8048 -fa -sm none --no-mmap -ngl 99 -m /data/disk1/models/gemma-3/gemma-3-270m-it-Q8_0.gguf -mg 0 -ctk q8_0 -ctv q8_0 --jinja -ts {{ts}}
measure_template: "curl -s http://192.168.178.56:8048/v1/chat/completions -H \"Content-Type: application/json\" -H \"Authorization: Bearer none\" -d '{\"model\": \"anymodel\", \"messages\": [{\"role\": \"system\", \"content\": \"give short answers.\"}, {\"role\": \"user\", \"content\": \"Hi.\"}] }' | jq '.timings.predicted_per_second'"
server_url: 192.168.178.56:8048
debug_mode: []
server_params:
  - id: param1
    name: ts
    values: "1,1,1,1,1;1,1,1,1,0"
measure_params:
  - id: param2
    name: run
    values: "1;2;3;4;5"

Usage

Start the application:
```
python brute-llama.py
```
Access the web interface: Open your browser and navigate to http://localhost:9111
Configure your test:
- Load an existing configuration or create a new one
- Define server and measurement parameters
- Set the server URL and debug mode
Run tests:
- Click "START TEST RUN" to begin performance testing
- Monitor real-time logs and performance metrics
- Use "CANCEL TEST RUN" to stop ongoing tests
Save configurations:
- Enter a name and click "Save" to store your configuration
- Load saved configurations from the dropdown menu

Screenshots

Configuration section of the brute-llama dashboard

Example of the brute-llama dashboard showing performance metrics

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
assets		assets
configs		configs
LICENSE		LICENSE
README.md		README.md
brute-llama.py		brute-llama.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

brute-llama (beta, meta)

Features

Installation

Configuration

Usage

Screenshots

License

About

Uh oh!

Releases

Packages

Languages

License

crashr/brute-llama

Folders and files

Latest commit

History

Repository files navigation

brute-llama (beta, meta)

Features

Installation

Configuration

Usage

Screenshots

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages