HF Smol models use https://github.com/huggingface/lighteval
Some internal tooling uses https://github.com/UKGovernmentBEIS/inspect_ai
Tracking loss and rewards is not enough. Ideally, we should also run the model on eval datasets and check the results every N steps.
For example, say your training data is Wikipedia but your eval is "Humanity's Last Exam". Every N steps, we want to check the score on "Humanity's Last Exam".
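A rough sketch of what the "every N steps" hook could look like. This is only an illustration, not a proposal for the actual API: `train_with_periodic_eval`, `train_step`, `run_eval`, and `eval_every` are hypothetical names, and `run_eval` stands in for whatever harness ends up being used (lighteval, inspect_ai, or something else).

```python
from typing import Callable, List, Tuple


def train_with_periodic_eval(
    train_step: Callable[[int], float],
    run_eval: Callable[[], float],
    total_steps: int,
    eval_every: int,
) -> List[Tuple[int, float]]:
    """Run `total_steps` training steps and record an eval score every `eval_every` steps.

    `train_step` and `run_eval` are hypothetical callbacks: the first performs one
    optimizer update and returns the loss, the second scores the current model on
    a held-out eval dataset and returns a single number.
    """
    if eval_every <= 0:
        raise ValueError("eval_every must be a positive integer")

    history: List[Tuple[int, float]] = []
    for step in range(1, total_steps + 1):
        train_step(step)  # loss/reward logging stays wherever it already lives
        if step % eval_every == 0:
            # Score on the eval set, which can be unrelated to the training data.
            history.append((step, run_eval()))
    return history


if __name__ == "__main__":
    # Stub callbacks so the sketch runs without a real model or harness.
    scores = train_with_periodic_eval(
        train_step=lambda step: 1.0 / step,
        run_eval=lambda: 0.0,
        total_steps=10,
        eval_every=4,
    )
    print(scores)
```

Keeping the eval behind a plain callable means the training loop does not need to know which harness produces the score.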