Skip to content

Commit 3c81a8b

Browse files
committed
x
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
1 parent d64e92c commit 3c81a8b

File tree

1 file changed

+7
-3
lines changed

1 file changed

+7
-3
lines changed

skythought/evals/README.md

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -117,7 +117,7 @@ We've noticed that it can be hard to reproduce results in reasoning benchmarks.
117117
We recommend to run evaluation benchmarks at full precision, i.e float32 to avoid this. In full-precision, evaluation results should be robust to changes in batch size, tensor parallel size, version differences, etc.
118118

119119

120-
## Key Concepts Guide
120+
## Key Concepts
121121

122122
### Tasks
123123

@@ -139,7 +139,7 @@ To add a new task `mytask`:
139139

140140
A Model consists of the model ID and templating configuration. This configuration optionally contains the system prompt and an assistant prefill message. Different reasoning models use their own system prompt, and some perform best when the response is prefilled with special tokens.
141141

142-
We store a list of system prompt templates as well as pre-configured models [here](./models/model_configs.yaml).
142+
We store our pre-configured models as well as a list of system prompt templates [here](./models/model_configs.yaml).
143143

144144
### Backend
145145

@@ -149,4 +149,8 @@ The Backend is concerned with how the LLM instance is created and queried. For f
149149

150150
The Backend also consists of configuration at instantiation (ex; the data type for the model), along with sampling parameters during generation (temperature, max tokens, etc).
151151

152-
152+
During evaluation, the above tie in together and the flow is as follows:
153+
1. Load dataset and create conversations based on the Task and Model specified by the user
154+
2. Generate model responses from the Backend based on the provided sampling parameters
155+
3. Score model responses based on the Task
156+
4. Output final results

0 commit comments

Comments
 (0)