Log model and usage stats in record.sampling (#1449)
It's often useful to know the token expenditure of running an eval, especially as the number of evals in this repo grows. See for example this [feature request](#1350); we also rely on this e.g. [here](https://github.com/openai/evals/tree/main/evals/elsuite/bluff#token-estimates).
Computing this manually is cumbersome, so this PR simply logs the [usage](https://platform.openai.com/docs/api-reference/chat/object#chat/object-usage) stats (token counts) of each API call in `record.sampling`. This makes it easy to sum up the token cost of an eval given a logfile of the run.
Here is an example of a resulting `sampling` log line after this change
(we add the `data.model` and `data.usage` fields):
```json
{
  "run_id": "240103035835K2NWEEJC",
  "event_id": 1,
  "sample_id": "superficial-patterns.dev.8",
  "type": "sampling",
  "data": {
    "prompt": [
      {
        "role": "system",
        "content": "If the red key goes to the pink door, and the blue key goes to the green door, but you paint the green door to be the color pink, and the pink door to be the color red, and the red key yellow, based on the new colors of everything, which keys go to what doors?"
      }
    ],
    "sampled": [
      "Based on the new colors, the yellow key goes to the pink door (previously red), and the blue key goes to the red door (previously pink)."
    ],
    "model": "gpt-3.5-turbo-0613",  // NEW
    "usage": {                       // NEW
      "completion_tokens": 33,
      "prompt_tokens": 70,
      "total_tokens": 103
    }
  },
  "created_by": "",
  "created_at": "2024-01-03 03:58:37.466772+00:00"
}
```