5 changes: 3 additions & 2 deletions docs/development/evaluations/history/.nav.yml
@@ -1,4 +1,5 @@
+sort:
+  direction: desc
 nav:
   - index.md
-  - Weekly: weekly/
-  - Special: special/
+  - "*"
19 changes: 5 additions & 14 deletions docs/development/evaluations/history/index.md
@@ -1,18 +1,9 @@
 # Historical Evaluation Results

-## Weekly Runs
+Browse through our past benchmark runs to track performance trends over time.

-Weekly benchmark runs with a standard set of models.
+## Weekly Results
+Regular weekly benchmark runs that track model performance over time.

-See the **Weekly** section in the navigation sidebar for all weekly benchmark results.
-
-## Special Benchmark Runs
-
-One-off benchmark runs for specific purposes such as:
-
-- Comparing self-hosted models
-- Testing new model versions
-- Performance analysis for specific scenarios
-- Custom model comparisons
-
-See the **Special** section in the navigation sidebar for all special benchmark runs.
+## Extended Comparisons
+Special benchmark runs comparing multiple models and configurations.
5 changes: 0 additions & 5 deletions docs/development/evaluations/history/special/.nav.yml

This file was deleted.

7 changes: 0 additions & 7 deletions docs/development/evaluations/history/special/index.md

This file was deleted.

5 changes: 0 additions & 5 deletions docs/development/evaluations/history/weekly/.nav.yml

This file was deleted.

7 changes: 0 additions & 7 deletions docs/development/evaluations/history/weekly/index.md

This file was deleted.

6 changes: 3 additions & 3 deletions run_benchmarks_local.sh
@@ -154,10 +154,10 @@ if [ -f "scripts/generate_eval_report.py" ]; then
         --models "$MODELS"
     echo "✅ Report generated: docs/development/evaluations/latest-results.md"

-    # Also generate timestamped version for history (always in weekly/)
-    mkdir -p docs/development/evaluations/history/weekly
+    # Also generate timestamped version for history
+    mkdir -p docs/development/evaluations/history
     TIMESTAMP=$(date +%Y%m%d_%H%M%S)
-    HISTORY_FILE="docs/development/evaluations/history/weekly/results_${TIMESTAMP}.md"
+    HISTORY_FILE="docs/development/evaluations/history/results_${TIMESTAMP}.md"
     poetry run python scripts/generate_eval_report.py \
         --json-file eval_results.json \
         --output-file "$HISTORY_FILE" \
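
The hunk above moves timestamped reports out of history/weekly/ and into history/ directly. A minimal sketch of how the new path is built, and why a descending sort of these filenames lists the newest run first, might look like the following. The helper name history_file_for and the sample filenames are hypothetical and not part of the PR; whether the `sort: direction: desc` setting in .nav.yml orders pages by filename in exactly this way is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the PR): build the flat history path for one run.
history_file_for() {
    local history_dir="$1"
    local timestamp="$2"
    printf '%s/results_%s.md\n' "$history_dir" "$timestamp"
}

# Same directory and timestamp format as in run_benchmarks_local.sh after the change.
HISTORY_DIR="docs/development/evaluations/history"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
HISTORY_FILE=$(history_file_for "$HISTORY_DIR" "$TIMESTAMP")
echo "History report would be written to: $HISTORY_FILE"

# Fixed-width YYYYMMDD_HHMMSS timestamps sort lexicographically in
# chronological order, so a reverse sort puts the newest run first.
# The filenames below are made-up samples.
printf '%s\n' \
    results_20240101_120000.md \
    results_20240315_093000.md \
    results_20240108_120000.md | sort -r
```

The sketch only prints paths and does not create directories or write reports, so it can be run safely outside the repository.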