
Commit 6a83809

Sela Tachnai authored and committed
Front end leaderboard
1 parent d556d25 commit 6a83809

4,380 files changed, +1237582 -5 lines changed

.github/workflows/deploy.yml

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+name: Deploy to GitHub Pages
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - 'energy-leaderboard-web/**'
+      - '.github/workflows/deploy.yml'
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: 'pages'
+  cancel-in-progress: true
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: energy-leaderboard-web
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+          cache: 'npm'
+          cache-dependency-path: energy-leaderboard-web/package-lock.json
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Build
+        run: npm run build
+
+      - name: Setup Pages
+        uses: actions/configure-pages@v4
+
+      - name: Upload artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: energy-leaderboard-web/dist
+
+  deploy:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    needs: build
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Constitution: The Guiding Principles (Leaderboard)
+
+### 1. Transparency & Honesty
+The leaderboard must clearly distinguish between *measured* data (hardware sensors) and *estimated* data.
+* **Action:** Every row must have a visible "Measured" or "Estimated" badge.
+* **Action:** Rows without a power trace (verification) must be marked.
+
+### 2. User-Centric Design
+The data is complex (Wh/1k tokens, Latency, Joules), but the UI must be simple.
+* **Action:** Default view shows only the most important metrics (Model, Efficiency, CO2).
+* **Action:** Provide "Info tooltips" explaining what "Wh/1k tokens" means.
+
+### 3. Static First
+To ensure the project is free to host and easy to fork, it must not require a dynamic backend server.
+* **Action:** Build a Static Single Page App (SPA).
+* **Action:** Data ingestion happens at build-time or run-time from static JSON files.
+
+### 4. Mobile Responsiveness
+The leaderboard must be readable on mobile devices.
+* **Action:** The main table must support horizontal scrolling or a card-view on small screens.

AI_DOCS/leaderboard_plan.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# Plan: Architecture & File Structure
+
+### File Structure
+energy-leaderboard-web/
+├── public/
+│   └── data/                     # Drop result.json files here!
+│       ├── run_apple_m1.json
+│       └── run_nvidia_t4.json
+├── src/
+│   ├── components/
+│   │   ├── LeaderboardTable.tsx  # The main component
+│   │   ├── FilterBar.tsx         # Search & Dropdowns
+│   │   └── MetricBadge.tsx       # For "Measured" vs "Estimated"
+│   ├── lib/
+│   │   ├── data-loader.ts        # Fetches/Imports the JSONs
+│   │   └── types.ts              # TypeScript interfaces for the JSON schema
+│   ├── App.tsx
+│   └── main.tsx
+├── index.html
+├── package.json
+├── tailwind.config.js
+└── vite.config.ts
+
+
+### Core Logic (Data Loader)
+Since we cannot "list files" in a static browser environment easily without a server index, we will use a clever Vite feature or a simple manifest approach.
+* **Approach:** We will use `import.meta.glob('/public/data/*.json')` provided by Vite to load all JSON files at build time/runtime. This makes adding data as simple as "drag and drop file into folder".
+
+### Deployment Strategy
+We will create a `.github/workflows/deploy.yml` that:
+1. Triggers on push to `main`.
+2. Installs dependencies (`npm ci`).
+3. Builds the site (`npm run build`).
+4. Deploys the `dist/` folder to the `gh-pages` branch.
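
The data-loader approach in the plan above can be sketched roughly as follows. This is a minimal sketch, not the code from this commit: `RunRow`, `isRunRow`, and `collectRuns` are hypothetical names, and the function takes a plain path-to-JSON map so it can run outside Vite. It assumes each file holds either one result object or an array of them, and it only checks the three fields the specification names (`energy_wh_net`, `tokens_total`, `g_co2`).

```typescript
// Hypothetical row shape: only the fields named in the specification are typed.
interface RunRow {
  energy_wh_net: number;
  tokens_total: number;
  g_co2: number;
  [key: string]: unknown;
}

// Narrow an arbitrary parsed JSON value to a RunRow.
function isRunRow(value: unknown): value is RunRow {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.energy_wh_net === "number" &&
    typeof v.tokens_total === "number" &&
    typeof v.g_co2 === "number"
  );
}

// Flatten a map of file path -> parsed JSON into one list of rows.
// Files are visited in sorted path order so the result is deterministic;
// entries that do not look like result rows are skipped.
function collectRuns(modules: Record<string, unknown>): RunRow[] {
  const rows: RunRow[] = [];
  for (const path of Object.keys(modules).sort()) {
    const parsed = modules[path];
    const candidates = Array.isArray(parsed) ? parsed : [parsed];
    for (const candidate of candidates) {
      if (isRunRow(candidate)) rows.push(candidate);
    }
  }
  return rows;
}

// In a Vite app the map might come from an eager glob import, for example:
//   const modules = import.meta.glob("/public/data/*.json", { eager: true, import: "default" });
// The pattern is the one quoted in the plan; whether files under public/ can be
// globbed this way is an assumption to verify against the Vite version in use.
```

Keeping the glob call out of `collectRuns` means the same function would also work with the manifest approach the plan mentions, where the JSON files are fetched at run time instead.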

AI_DOCS/leaderboard_specify.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# Specifications: Leaderboard Requirements
+
+### Functional Requirements
+
+**F1: Data Ingestion**
+* The app MUST automatically load multiple `result.json` files from the `public/data/` directory.
+* It MUST parse the standard JSON schema (defined in the Runner project) including fields like `energy_wh_net`, `tokens_total`, `g_co2`.
+
+**F2: The Leaderboard Table**
+* The app MUST display a sortable table with the following columns (based on Eindproductplan):
+    1. **Rank** (Auto-calculated based on efficiency)
+    2. **Model** (e.g., "llama3:8b")
+    3. **Efficiency** (Wh per 1k tokens) - *Primary Sort Key*
+    4. **CO2** (g)
+    5. **Net Energy** (Wh)
+    6. **Speed** (Tokens/sec or Duration)
+    7. **Device** (OS / GPU / CPU)
+    8. **Method** (Measured/Estimated)
+* Users MUST be able to sort by clicking headers.
+
+**F3: Filtering & Search**
+* Provide a text search bar for "Model Name".
+* Provide dropdown filters for:
+    * **Device Type** (NVIDIA, Apple, AMD, CPU)
+    * **Method** (Show All, Only Measured)
+
+**F4: Detail View**
+* Clicking a row SHOULD open a modal/expanded view showing full details (Start/End time, specific hardware specs, user comments).
+
+### Non-Functional Requirements
+
+**NF1: Tech Stack:** React (TypeScript), Vite, Tailwind CSS, Shadcn/UI (optional, for nice tables).
+**NF2: Hosting:** Must be deployable to GitHub Pages via a provided GitHub Actions workflow.
+**NF3: Performance:** Fast loading score (Lighthouse > 90).
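
Requirement F2 ranks rows by efficiency in Wh per 1k tokens. One way this could be computed is sketched below, under the assumption that efficiency is `energy_wh_net` divided by `tokens_total` and scaled by 1000; the specification names the unit but not the formula. `RunTotals`, `RankedRun`, `whPer1kTokens`, `rankByEfficiency`, and the `model` field are hypothetical names for illustration.

```typescript
// Hypothetical input: the two schema fields from F1 plus a model label.
interface RunTotals {
  model: string;
  energy_wh_net: number;
  tokens_total: number;
}

interface RankedRun extends RunTotals {
  wh_per_1k_tokens: number;
  rank: number;
}

// Assumed formula: net energy scaled to 1000 tokens.
// Rows with no tokens get Infinity so they sort to the bottom.
function whPer1kTokens(energyWh: number, tokens: number): number {
  return tokens > 0 ? (energyWh * 1000) / tokens : Number.POSITIVE_INFINITY;
}

// Lower Wh per 1k tokens is better, so rank 1 is the most efficient run.
function rankByEfficiency(runs: RunTotals[]): RankedRun[] {
  return runs
    .map((run) => ({
      ...run,
      wh_per_1k_tokens: whPer1kTokens(run.energy_wh_net, run.tokens_total),
    }))
    .sort((a, b) =>
      a.wh_per_1k_tokens < b.wh_per_1k_tokens
        ? -1
        : a.wh_per_1k_tokens > b.wh_per_1k_tokens
          ? 1
          : 0,
    )
    .map((run, index) => ({ ...run, rank: index + 1 }));
}
```

Ties keep their input order here because `Array.prototype.sort` is stable; a real leaderboard might prefer shared ranks instead.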

AI_DOCS/leaderboard_tasks.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# Tasks: Implementation Plan
+
+### Sprint 1: Setup & Scaffolding
+**Goal:** A running React app that can read one JSON file.
+1. [ ] **Task 1.1:** Initialize Vite project (React + TS) named `energy-leaderboard-web`.
+2. [ ] **Task 1.2:** Install Tailwind CSS and configure `tailwind.config.js` (use green/eco colors).
+3. [ ] **Task 1.3:** Create `src/lib/types.ts` matching the Runner's JSON output schema.
+4. [ ] **Task 1.4:** Create dummy data in `public/data/example_run.json`.
+
+### Sprint 2: Data Ingestion & Logic
+**Goal:** Load all JSON files and process them.
+5. [ ] **Task 2.1:** Implement `src/lib/data-loader.ts` using `import.meta.glob` to load all JSONs from the data folder.
+6. [ ] **Task 2.2:** Create a helper function to calculate "Rank" and normalize data (e.g. handle missing fields).
+
+### Sprint 3: UI Components (The Leaderboard)
+**Goal:** A visual table with sorting and filtering.
+7. [ ] **Task 3.1:** Build `LeaderboardTable.tsx`. Use standard HTML `<table>` or a library like TanStack Table for sorting.
+8. [ ] **Task 3.2:** Implement columns: Rank, Model, Efficiency (Wh/1k), CO2, Hardware.
+9. [ ] **Task 3.3:** Add `FilterBar.tsx` for searching models and filtering by "Measured/Estimated".
+10. [ ] **Task 3.4:** Style the table. Add "Green Leaf" badges for good scores and "Warning" badges for estimated data.
+
+### Sprint 4: Deployment & Polish
+**Goal:** Live on GitHub Pages.
+11. [ ] **Task 4.1:** Create `.github/workflows/deploy.yml` for GitHub Pages deployment.
+12. [ ] **Task 4.2:** Update `vite.config.ts` with the correct `base` url (usually `/{repo-name}/`).
+13. [ ] **Task 4.3:** Write `README.md` explaining how to add a new test result (just upload the JSON!).
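
The filtering behind Task 3.3 could be kept as a pure function, separate from the `FilterBar.tsx` component. The sketch below is an assumption about how that might look, not code from this commit: `LeaderboardRow`, `Filters`, `applyFilters`, and the `method` field are hypothetical, while `device_type` follows the field name used in the README.

```typescript
type Method = "measured" | "estimated";

// Hypothetical row shape with only the fields the filters need.
interface LeaderboardRow {
  model: string;
  device_type: string;
  method: Method;
}

// Hypothetical filter state: free-text search, one device type or null
// for "all", and a flag for the "Only Measured" option.
interface Filters {
  search: string;
  deviceType: string | null;
  onlyMeasured: boolean;
}

// A row is kept only if it passes every active filter.
function applyFilters(rows: LeaderboardRow[], filters: Filters): LeaderboardRow[] {
  const needle = filters.search.trim().toLowerCase();
  return rows.filter(
    (row) =>
      (needle === "" || row.model.toLowerCase().includes(needle)) &&
      (filters.deviceType === null || row.device_type === filters.deviceType) &&
      (!filters.onlyMeasured || row.method === "measured"),
  );
}
```

A pure function like this can be unit-tested without rendering React, and the component only has to hold the `Filters` state.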

README.md

Lines changed: 24 additions & 0 deletions
@@ -106,6 +106,8 @@ Options:
   -m, --model TEXT        Model identifier (e.g., 'llama3:latest') [required]
   -t, --test-set TEXT     Testset name or stem (e.g., 'easy', 'testset_easy') [default: easy]
   -o, --output TEXT       Output file path for results [default: results/output.json]
+  -d, --device-name TEXT  Override auto-detected device name
+  --device-type TEXT      Override device type (apple, nvidia, amd, intel, unknown)
   --help                  Show this message and exit
 
 The runner looks for `testset_<name>.json` inside `src/data/testsets/`. Passing `easy`, `testset_easy`, or the exact filename stem all point to the same file.
@@ -216,6 +218,13 @@ Results are exported as JSON and validated against a strict schema:
     "region": "unknown",
     "notice": null,
     "sampling_ms": 100,
+    "device_name": "Apple M1 Pro Mac (MacBookPro18,3)",
+    "device_type": "apple",
+    "os_name": "macOS",
+    "os_version": "14.2",
+    "cpu_model": "Apple M1 Pro",
+    "ram_gb": 16.0,
+    "chip_architecture": "arm64",
     "testset_id": "testset_easy",
     "testset_name": "Easy Baseline",
     "question_id": "easy-1",
@@ -225,6 +234,21 @@ Results are exported as JSON and validated against a strict schema:
 ]
 ```
 
+### Required Device Fields
+
+Every result now includes mandatory device information for accurate hardware comparison:
+
+| Field | Description | Example |
+|-------|-------------|----------|
+| `device_name` | Human-readable device name | "Apple M1 Pro Mac" |
+| `device_type` | Category for filtering | `apple`, `nvidia`, `amd`, `intel`, `unknown` |
+| `os_name` | Operating system | "macOS", "Linux", "Windows" |
+| `os_version` | OS version string | "14.2" |
+
+Optional device fields: `cpu_model`, `gpu_model`, `ram_gb`, `chip_architecture`.
+
+Device information is **auto-detected** on startup. Use `--device-name` or `--device-type` to override if needed.
+
 Additional metadata fields (e.g., `testset_goal`, `testset_notes`, `question_task_type`, `expected_answer_description`, `max_output_tokens_hint`, `energy_relevance`) are included when present in the source testset and validated via `src/data/metrics_schema.json`.
 
 ## Test Sets
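
On the front-end side, the device fields added to the README could be mirrored in a TypeScript type with a small runtime check. This is a sketch under assumptions: `DeviceInfo`, `DEVICE_TYPES`, and `missingDeviceFields` are hypothetical names, and the authoritative validation remains the runner's `src/data/metrics_schema.json`, which is not shown in this diff.

```typescript
// Device categories as listed for --device-type in the README.
const DEVICE_TYPES = ["apple", "nvidia", "amd", "intel", "unknown"] as const;
type DeviceType = (typeof DEVICE_TYPES)[number];

// Required and optional device fields, following the README table.
interface DeviceInfo {
  device_name: string;
  device_type: DeviceType;
  os_name: string;
  os_version: string;
  cpu_model?: string;
  gpu_model?: string;
  ram_gb?: number;
  chip_architecture?: string;
}

// Return the names of required device fields that are absent or invalid.
// An empty array means the row carries all four required fields.
function missingDeviceFields(row: Record<string, unknown>): string[] {
  const missing: string[] = [];
  for (const key of ["device_name", "os_name", "os_version"]) {
    const value = row[key];
    if (typeof value !== "string" || value.length === 0) missing.push(key);
  }
  const deviceType = row["device_type"];
  if (
    typeof deviceType !== "string" ||
    !(DEVICE_TYPES as readonly string[]).includes(deviceType)
  ) {
    missing.push("device_type");
  }
  return missing;
}
```

A loader could use the returned list to flag incomplete rows in the UI instead of dropping them silently.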

RUNBOOK.md

Lines changed: 74 additions & 4 deletions
@@ -9,10 +9,11 @@ This document provides detailed, platform-specific installation instructions, tr
    - [Linux + NVIDIA GPU (NVML)](#linux--nvidia-gpu-nvml)
    - [Linux + AMD GPU (ROCm-smi)](#linux--amd-gpu-rocm-smi)
    - [Linux + Intel/AMD CPU (RAPL)](#linux--intelamd-cpu-rapl)
-2. [Ollama Setup](#ollama-setup)
-3. [Docker Troubleshooting](#docker-troubleshooting)
-4. [Common Issues](#common-issues)
-5. [Advanced Configuration](#advanced-configuration)
+2. [Device Detection](#device-detection)
+3. [Ollama Setup](#ollama-setup)
+4. [Docker Troubleshooting](#docker-troubleshooting)
+5. [Common Issues](#common-issues)
+6. [Advanced Configuration](#advanced-configuration)
 
 ---
 
@@ -333,6 +334,75 @@ docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
 
 ---
 
+## Device Detection
+
+The runner automatically detects your hardware and includes device information in all benchmark results. This enables accurate comparisons across different machines on the Energy Leaderboard.
+
+### Auto-Detection
+
+When you run a benchmark, the runner automatically detects:
+
+```
+[yellow]Detecting device information...[/yellow]
+[green]✓[/green] Device: Apple M1 Pro Mac (MacBookPro18,3)
+  Type: apple
+  OS: macOS 14.2
+  CPU: Apple M1 Pro
+  RAM: 16.0 GB
+```
+
+### Detected Information
+
+| Field | Description | Auto-Detected On |
+|-------|-------------|------------------|
+| `device_name` | Human-readable name | All platforms |
+| `device_type` | Category (`apple`, `nvidia`, `amd`, `intel`, `unknown`) | All platforms |
+| `os_name` | Operating system | All platforms |
+| `os_version` | OS version | All platforms |
+| `cpu_model` | CPU model string | macOS, Linux, Windows |
+| `gpu_model` | GPU model (if applicable) | NVIDIA, AMD, Apple Silicon |
+| `ram_gb` | System RAM in GB | All platforms |
+| `chip_architecture` | CPU arch (`arm64`, `x86_64`) | All platforms |
+
+### Manual Override
+
+If auto-detection fails or you want a custom device name:
+
+```bash
+# Override device name
+python src/main.py run-test -m llama3:latest -t easy --device-name "My Gaming PC"
+
+# Override device type
+python src/main.py run-test -m llama3:latest -t easy --device-type nvidia
+
+# Both
+python src/main.py run-test -m llama3:latest -t easy \
+  --device-name "Custom RTX 4090 Build" \
+  --device-type nvidia
+```
+
+### Troubleshooting Detection
+
+**Test device detection:**
+```bash
+python -c "from src.utils.device_info import detect_device_info; info = detect_device_info(); print(info)"
+```
+
+**macOS issues:**
+- `system_profiler` must be available (built-in on all macOS versions)
+- Apple Silicon chips are detected via `sysctl` and `system_profiler`
+
+**Linux issues:**
+- GPU detection requires `nvidia-smi` (NVIDIA) or `rocm-smi` (AMD) in PATH
+- CPU info is read from `/proc/cpuinfo`
+- RAM info is read from `/proc/meminfo`
+
+**Windows issues:**
+- Uses `wmic` commands for CPU and RAM detection
+- NVIDIA GPU detected via `nvidia-smi`
+
+---
+
 ## Ollama Setup
 
 ### Installation
