Commit 468fb15

fix deploy error
Signed-off-by: JaredforReal <[email protected]>
1 parent 658e294 commit 468fb15

2 files changed: +4 -3 lines changed

website/docs/training/model_performance_eval.md

Lines changed: 3 additions & 3 deletions
@@ -84,7 +84,7 @@ python mmlu_pro_vllm_eval.py \
 
 ### What it outputs per model:
 
-- **results/<model_name>_(direct|cot)/**
+- **results/Model_Name_(direct|cot)/**
 - **detailed_results.csv**: one row per question with is_correct and category
 - **analysis.json**: overall_accuracy, category_accuracy map, avg_response_time, counts
 - **summary.json**: condensed metrics
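
For readers of this hunk, the relationship between the per-question rows in detailed_results.csv and the overall_accuracy and category_accuracy fields of analysis.json can be sketched as follows. This is a minimal illustration, not the evaluation script's actual code: the `summarize` helper and the sample rows are hypothetical, and the encoding of is_correct as "True"/"False" or "1"/"0" is an assumption.

```python
import csv
import io
from collections import defaultdict


def summarize(rows):
    """Compute overall and per-category accuracy from per-question rows.

    Hypothetical helper: each row needs 'category' and 'is_correct' keys.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in rows:
        category = row["category"]
        totals[category] += 1
        # Assumption: is_correct is serialized as "True"/"False" or "1"/"0".
        if str(row["is_correct"]).strip().lower() in {"true", "1"}:
            correct[category] += 1
    n = sum(totals.values())
    return {
        "overall_accuracy": sum(correct.values()) / n if n else 0.0,
        "category_accuracy": {c: correct[c] / totals[c] for c in totals},
    }


# Made-up sample standing in for a real detailed_results.csv.
SAMPLE_CSV = """category,is_correct
math,True
math,False
physics,True
"""

summary = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(summary)
```

To try it against a real run, the in-memory sample could be swapped for an opened detailed_results.csv, provided its column names match.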
@@ -113,7 +113,7 @@ python arc_challenge_vllm_eval.py \
 
 ### What it outputs per model:
 
-- **results/<model_name>_(direct|cot)/**
+- **results/Model_Name_(direct|cot)/**
 - **detailed_results.csv**: one row per question with is_correct and category
 - **analysis.json**: overall_accuracy, avg_response_time
 - **summary.json**: condensed metrics
@@ -199,7 +199,7 @@ python src/training/model_eval/result_to_config.py \
 - Constructs a new config:
 - default_model: the best average performer across categories
 - categories: For each category present in results, ranks models by accuracy:
-- category.model_scores = [{model: "<name>", score: <float>}, ...], highest first
+- category.model_scores = `[{ model: "Model_Name", score: 0.87 }, ...]`, highest first
 - category reasoning settings: auto-filled from a built-in mapping (math, physics, chemistry, CS, engineering -> high reasoning; others default to low/medium; you can adjust after generation)
 - Leaves out any special “auto” placeholder models if present
 
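
The config construction described in this hunk (pick the best average performer as default_model, then rank models per category, highest first) can be sketched as below. This is a hedged illustration rather than the code in result_to_config.py: `build_config`, the input shape, the unweighted mean over categories, the literal placeholder name "auto", and the sample scores are all assumptions.

```python
def build_config(results):
    """Build a config dict from {model_name: {category: accuracy}}.

    Hypothetical sketch; the real script may weight or filter differently.
    """
    # Assumption: placeholder models are identified by the literal name "auto".
    models = {m: s for m, s in results.items() if m != "auto" and s}
    if not models:
        raise ValueError("no model results to rank")

    # Assumption: "best average performer" means an unweighted mean over categories.
    averages = {m: sum(s.values()) / len(s) for m, s in models.items()}
    default_model = max(averages, key=averages.get)

    categories = {}
    for model, scores in models.items():
        for category, score in scores.items():
            categories.setdefault(category, []).append(
                {"model": model, "score": score}
            )
    for entries in categories.values():
        # Highest score first, matching the documented ordering.
        entries.sort(key=lambda e: e["score"], reverse=True)

    return {
        "default_model": default_model,
        "categories": {c: {"model_scores": e} for c, e in categories.items()},
    }


# Made-up accuracies for illustration only.
sample = {
    "Model_A": {"math": 0.87, "history": 0.60},
    "Model_B": {"math": 0.70, "history": 0.80},
    "auto": {"math": 0.99},
}
config = build_config(sample)
print(config["default_model"])
print(config["categories"]["math"]["model_scores"])
```

The category reasoning settings mentioned in the docs are omitted here because the built-in mapping's exact values are not shown in this diff.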

website/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ const sidebars = {
 label: 'Model Training',
 items: [
 'training/training-overview',
+'training/model_performance_eval',
 ],
 },
 {
