| 1 | +# Copilot Instructions for Energy Leaderboard Runner |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +This project measures real-world energy consumption of local LLMs. It has two main components: |
| 6 | +1. **Python CLI** (`src/`) - Runs benchmarks and collects energy metrics from hardware sensors |
| 7 | +2. **React Web App** (`energy-leaderboard-web/`) - Displays crowdsourced benchmark results |
| 8 | + |
| 9 | +## Architecture & Key Patterns |
| 10 | + |
| 11 | +### Energy Meter System (Plugin Architecture) |
| 12 | +All energy meters inherit from `EnergyMeter` base class in [src/energy_meter/base.py](src/energy_meter/base.py): |
| 13 | +- `is_available()` → Platform detection |
| 14 | +- `start()` / `stop()` → Measurement lifecycle returning `energy_wh_raw`, `duration_s`, `sampling_ms` |
| 15 | + |
| 16 | +Platform detection priority in [src/energy_meter/integrator.py](src/energy_meter/integrator.py): |
| 17 | +- macOS: `powermetrics` (requires sudo) |
| 18 | +- Linux: NVML (NVIDIA) → ROCm (AMD) → RAPL (CPU) |
| 19 | + |
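A minimal sketch of the meter interface and the priority-ordered selection described above. The method names `is_available()`, `start()`, and `stop()` and the three reported fields come from these instructions; the `Measurement` dataclass, `FakeMeter`, and `pick_meter()` are hypothetical names used only for illustration, and it is an assumption that `stop()` is the call that returns the measurement. The real code in `src/energy_meter/` may be structured differently.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Measurement:
    """Hypothetical container for the values a meter reports."""
    energy_wh_raw: float
    duration_s: float
    sampling_ms: int


class EnergyMeter(ABC):
    """Illustrative base class; the real one lives in src/energy_meter/base.py."""

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether this meter can run on the current platform."""

    @abstractmethod
    def start(self) -> None:
        """Begin sampling."""

    @abstractmethod
    def stop(self) -> Measurement:
        """Stop sampling and return the measurement (assumed return type)."""


class FakeMeter(EnergyMeter):
    """Stand-in meter for the example; reports fixed numbers."""

    def __init__(self, name: str, available: bool) -> None:
        self.name = name
        self._available = available

    def is_available(self) -> bool:
        return self._available

    def start(self) -> None:
        pass

    def stop(self) -> Measurement:
        return Measurement(energy_wh_raw=0.5, duration_s=2.0, sampling_ms=100)


def pick_meter(candidates: Sequence[EnergyMeter]) -> Optional[EnergyMeter]:
    """Return the first available meter, walking the list in priority order."""
    for meter in candidates:
        if meter.is_available():
            return meter
    return None


# Linux priority from the list above: NVML, then ROCm, then RAPL.
linux_meters = [
    FakeMeter("nvml", available=False),
    FakeMeter("rocm", available=False),
    FakeMeter("rapl", available=True),
]
chosen = pick_meter(linux_meters)
print(chosen.name if chosen else "no meter available")
```

Keeping the priority as an ordered list means a new backend only needs its own `is_available()` check and a position in that list.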
| 20 | +### LLM Integrations (Plugin Architecture) |
| 21 | +Implement `LlmRunner` from [src/llm_integrations/base.py](src/llm_integrations/base.py): |
| 22 | +- `check_connection()` → Validate endpoint |
| 23 | +- `generate()` → Returns `(text, tokens_prompt, tokens_completion, response_time_s)` |
| 24 | + |
| 25 | +Supported: Ollama (`ollama_client.py`), OpenAI-compatible (`openai_client.py`) |
| 26 | + |
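As an illustration of the runner contract above, here is a hedged sketch of a custom integration. Only the method names and the four-element return tuple are taken from these instructions; the `prompt` parameter, the `bool` return of `check_connection()`, and the `EchoRunner` class are assumptions made for the example, not the actual signatures in `src/llm_integrations/base.py`.

```python
import time
from abc import ABC, abstractmethod
from typing import Tuple


class LlmRunner(ABC):
    """Illustrative interface; parameter and return types are assumed."""

    @abstractmethod
    def check_connection(self) -> bool:
        """Return True if the endpoint is reachable."""

    @abstractmethod
    def generate(self, prompt: str) -> Tuple[str, int, int, float]:
        """Return (text, tokens_prompt, tokens_completion, response_time_s)."""


class EchoRunner(LlmRunner):
    """Hypothetical offline runner that upper-cases the prompt."""

    def check_connection(self) -> bool:
        return True  # nothing to connect to in this example

    def generate(self, prompt: str) -> Tuple[str, int, int, float]:
        started = time.perf_counter()
        text = prompt.upper()
        elapsed = time.perf_counter() - started
        # Whitespace splitting stands in for a real tokenizer here.
        return text, len(prompt.split()), len(text.split()), elapsed


runner = EchoRunner()
text, tokens_prompt, tokens_completion, response_time_s = runner.generate("hello energy meter")
print(text, tokens_prompt, tokens_completion)
```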
| 27 | +### Configuration |
| 28 | +Environment-based via [src/config.py](src/config.py): |
| 29 | +- `OLLAMA_HOST` (default: `http://localhost:11434`) |
| 30 | +- `CO2_INTENSITY_G_KWH` (default: 350.0 - EU average) |
| 31 | +- `SAMPLING_INTERVAL_MS` (default: 100) |
| 32 | + |
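The environment-based configuration above could be read roughly as follows. The variable names and defaults are the ones listed; the `Settings` dataclass and `load_settings()` helper are hypothetical and are not taken from `src/config.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object holding the three documented values."""
    ollama_host: str
    co2_intensity_g_kwh: float
    sampling_interval_ms: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to the documented defaults."""
    return Settings(
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434"),
        co2_intensity_g_kwh=float(env.get("CO2_INTENSITY_G_KWH", "350.0")),
        sampling_interval_ms=int(env.get("SAMPLING_INTERVAL_MS", "100")),
    )


# An empty mapping yields the defaults; a partial mapping overrides one value.
defaults = load_settings({})
overridden = load_settings({"SAMPLING_INTERVAL_MS": "50"})
print(defaults)
print(overridden.sampling_interval_ms)
```

Passing the mapping in as a parameter keeps the helper testable without touching the real process environment.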
| 33 | +## Commands & Workflows |
| 34 | + |
| 35 | +```bash |
| 36 | +# Install dependencies |
| 37 | +pip install -r requirements.txt |
| 38 | + |
| 39 | +# Run single benchmark (requires Ollama running) |
| 40 | +python src/main.py run-test --model llama3:latest --test-set easy |
| 41 | + |
| 42 | +# Run all test sets (easy, medium, hard, mixed) - preferred for contributions |
| 43 | +python run_all_tests.py --model llama3:latest |
| 44 | + |
| 45 | +# OpenAI-compatible provider |
| 46 | +python run_all_tests.py --model gpt-4 --provider openai --base-url https://api.example.com |
| 47 | + |
| 48 | +# Docker (Linux with GPU) |
| 49 | +docker build -t energy-leaderboard-runner . |
| 50 | +docker run --rm --gpus all -v "$(pwd)/results:/app/results" \ |
| 51 | + -e OLLAMA_HOST=http://172.17.0.1:11434 energy-leaderboard-runner \ |
| 52 | + run_all_tests.py --model llama3:latest |
| 53 | +``` |
| 54 | + |
| 55 | +## Test Sets |
| 56 | + |
| 57 | +Located in [src/data/testsets/](src/data/testsets/). Structure: |
| 58 | +```json |
| 59 | +{ |
| 60 | + "id": "ts1", |
| 61 | + "name": "...", |
| 62 | + "goal": "...", |
| 63 | + "questions": [{ "id": "...", "prompt": "...", "difficulty": "easy" }] |
| 64 | +} |
| 65 | +``` |
| 66 | +Reference a test set by name, without the `testset_` prefix or `.json` extension: `--test-set easy` resolves to `testset_easy.json`. |
| 67 | + |
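The name resolution and file structure described above can be sketched like this. The directory, the `testset_<name>.json` pattern, and the top-level keys come from these instructions; `resolve_testset()` and `missing_keys()` are hypothetical helpers, not functions from the project.

```python
from pathlib import Path
from typing import Any, Dict, List

TESTSET_DIR = Path("src/data/testsets")
REQUIRED_KEYS = ("id", "name", "goal", "questions")


def resolve_testset(name: str, base_dir: Path = TESTSET_DIR) -> Path:
    """Map a short name such as 'easy' to its testset_<name>.json path."""
    return base_dir / f"testset_{name}.json"


def missing_keys(testset: Dict[str, Any]) -> List[str]:
    """List the documented top-level keys that are absent from a parsed test set."""
    return [key for key in REQUIRED_KEYS if key not in testset]


path = resolve_testset("easy")
print(path.as_posix())

# A parsed test set shaped like the JSON example above, minus its "goal" key.
incomplete = {"id": "ts1", "name": "...", "questions": []}
print(missing_keys(incomplete))
```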
| 68 | +## Output Schema |
| 69 | + |
| 70 | +Results are validated against [src/data/metrics_schema.json](src/data/metrics_schema.json). Key metrics: |
| 71 | +- `energy_wh_raw` / `energy_wh_net` - Energy consumption |
| 72 | +- `wh_per_1k_tokens` - Efficiency metric |
| 73 | +- `g_co2` - Calculated from `CO2_INTENSITY_G_KWH` |
| 74 | + |
| 75 | +Output files: `results/output_{model}_{testset}_{date}.json` |
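The two derived metrics follow from unit conversions, sketched below. This is an assumption-laden illustration: the instructions do not say whether the project uses `energy_wh_raw` or `energy_wh_net`, nor which token count goes into the denominator, so both are plain parameters here, and the function names are hypothetical.

```python
def wh_per_1k_tokens(energy_wh: float, tokens: int) -> float:
    """Energy per 1000 tokens; which token count to use is an assumption left to the caller."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return energy_wh * 1000.0 / tokens


def g_co2(energy_wh: float, co2_intensity_g_kwh: float = 350.0) -> float:
    """Convert Wh to kWh, then multiply by grid intensity in g CO2 per kWh."""
    return energy_wh * co2_intensity_g_kwh / 1000.0


# Example: 2 Wh spent producing 500 tokens at the default grid intensity.
efficiency = wh_per_1k_tokens(2.0, 500)
emissions = g_co2(2.0)
print(efficiency, emissions)
```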
| 76 | + |
| 77 | +## Web Frontend (`energy-leaderboard-web/`) |
| 78 | + |
| 79 | +React + Vite + Tailwind. Data lives in `public/data/*.json`. |
| 80 | + |
| 81 | +```bash |
| 82 | +cd energy-leaderboard-web |
| 83 | +npm install && npm run dev # Development |
| 84 | +npm run build # Production build |
| 85 | +``` |
| 86 | + |
| 87 | +## Contributing Benchmarks |
| 88 | + |
| 89 | +1. Run `python run_all_tests.py --model <model>` |
| 90 | +2. Copy `results/output_*.json` → `energy-leaderboard-web/public/data/` |
| 91 | +3. Submit a PR with the new JSON files |
| 92 | + |
| 93 | +## Code Conventions |
| 94 | + |
| 95 | +- CLI uses Typer with Rich for console output |
| 96 | +- Dual import pattern in `main.py` supports both module and direct execution |
| 97 | +- Abstract base classes define interfaces; implementations live in the same directory |
| 98 | +- All file paths use `pathlib.Path` |