## What does this PR do?
This PR gives the prompt-building logic in lighteval a much-needed spring cleaning.
The main goal: ditch legacy bloat, make things less painful for users and contributors, and unlock support for more complex benchmarks 🔥
### Highlights
- **Prompt Manager Overhaul:** Each model now owns its own PromptManager instance, with custom params for every flavor of prompt (multimodal, API, multiturn, you name it).
- **system-prompt**: now part of the model config
- **use-chat-template**: now part of the model config
- **Metrics Slimdown:** Metrics now only care about `samplingMethod` (generative or loglikelihood). Say goodbye to `use_case` and all those old request types.
- **Request Layer Gone:** Models get the raw `Doc` directly, with no more unnecessary `request` wrappers bloating the code.
- **Unified ModelResponse:** All models return a single `ModelResponse` type, whether generative or loglikelihood. This means simpler logging and metric computation.
- **Consistent Metric Signatures:** Every metric now uses the same function signature: `compute(doc: Doc, model_response: ModelResponse)`.
- **Standardized Details:** Each sample’s details now always include three fields: `doc`, `metric`, and `model_response`.
- **Generative Metrics Unified:** All generative metrics now work the same way. For greedy generation, set the temperature to 0. **An exception is raised if you try to run a sampling metric with temperature = 0.**
- **Removed Loglikelihood Single Token:** it was bloated and rarely used.
- **Tests:** All tests pass, and no changes were needed to expected values.
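
To make the unified signature concrete, here is a minimal sketch of what metrics written against `compute(doc: Doc, model_response: ModelResponse)` might look like. The `Doc` and `ModelResponse` classes below are simplified stand-ins with assumed field names, not lighteval's real classes, and `ExactMatch`, `LoglikelihoodAccuracy`, and `check_temperature` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins: the real lighteval classes have more fields,
# and these field names are assumptions made for this sketch.
@dataclass
class Doc:
    query: str
    choices: list[str]
    gold_index: int


@dataclass
class ModelResponse:
    text: list[str] = field(default_factory=list)        # generative outputs
    logprobs: list[float] = field(default_factory=list)  # one score per choice


class ExactMatch:
    """Illustrative generative metric using the unified signature."""

    def compute(self, doc: Doc, model_response: ModelResponse) -> float:
        gold = doc.choices[doc.gold_index].strip()
        prediction = model_response.text[0].strip()
        return float(prediction == gold)


class LoglikelihoodAccuracy:
    """Illustrative loglikelihood metric: same signature, same response type."""

    def compute(self, doc: Doc, model_response: ModelResponse) -> float:
        scores = model_response.logprobs
        best = max(range(len(scores)), key=scores.__getitem__)
        return float(best == doc.gold_index)


def check_temperature(is_sampling_metric: bool, temperature: float) -> None:
    """Assumed guard: sampling metrics cannot run with greedy decoding."""
    if is_sampling_metric and temperature == 0:
        raise ValueError("Sampling metrics require temperature > 0.")


doc = Doc(query="2 + 2 =", choices=["3", "4"], gold_index=1)
generative_score = ExactMatch().compute(doc, ModelResponse(text=[" 4 "]))
loglikelihood_score = LoglikelihoodAccuracy().compute(
    doc, ModelResponse(logprobs=[-2.0, -0.1])
)
```

Because both metrics take the same two arguments, the evaluation loop can call `compute` without branching on the request type, which is the simplification the highlights above describe.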
### Why?
- Less code, fewer headaches.
- Easier to add new benchmarks (including weird and wonderful ones).
- More user-friendly inspection tools.
- A single, unified way to handle prompts, responses, and metrics.
---------
Co-authored-by: Clémentine Fourrier <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: [email protected] <[email protected]>