[RFC Request] How to do eval?

Prior art:
- Hugging Face's Smol models use lighteval: https://github.com/huggingface/lighteval
- Some internal tooling uses Inspect: https://github.com/UKGovernmentBEIS/inspect_ai

Tracking loss and rewards is not enough. Ideally, we should also run the model on held-out eval datasets and check the results every N steps.

For example, say your training data is Wikipedia but your eval is "Humanity's Last Exam". Every N training steps, we want to check the model's score on "Humanity's Last Exam".
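To make the request concrete, here is a rough sketch of what "eval every N steps" could look like. It is not tied to this repo or to any eval library: `train_step`, `predict`, `exact_match_score`, and `train_with_periodic_eval` are hypothetical names made up for illustration, and exact match is only a stand-in metric. In practice the scoring call could be delegated to something like lighteval or Inspect.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One eval example is a (prompt, expected_answer) pair. Hypothetical format.
EvalSet = Sequence[Tuple[str, str]]


def exact_match_score(predict: Callable[[str], str], eval_set: EvalSet) -> float:
    """Fraction of prompts whose prediction equals the expected answer."""
    if not eval_set:
        return 0.0
    correct = sum(
        1 for prompt, answer in eval_set
        if predict(prompt).strip() == answer.strip()
    )
    return correct / len(eval_set)


def train_with_periodic_eval(
    train_step: Callable[[int], float],   # runs one step, returns the loss
    predict: Callable[[str], str],        # queries the current model
    eval_sets: Dict[str, EvalSet],        # eval name -> examples
    total_steps: int,
    eval_every: int,
) -> List[Tuple[int, float, Dict[str, float]]]:
    """Train for total_steps, scoring every eval set each eval_every steps."""
    history = []
    for step in range(1, total_steps + 1):
        loss = train_step(step)
        if step % eval_every == 0:
            scores = {
                name: exact_match_score(predict, examples)
                for name, examples in eval_sets.items()
            }
            # Keep loss and eval scores side by side so they can be compared.
            history.append((step, loss, scores))
    return history
```

The returned `history` holds one `(step, loss, scores)` entry per eval point, so eval scores can be logged or plotted next to the training loss. The open design question is still where this hook should live and which eval backend it should call.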



[RFC Request] How to do eval? #547
