GSoC 2026 – Interest in Behavioral Evaluation Test Framework #22033
HiradFakouri started this conversation in General
Hi everyone,
I'm Hirad, a Computing Science student at the University of Glasgow. I'm interested in contributing to the Behavioural Evaluation Test Framework project for GSoC 2026 and wanted to introduce myself to the community.
A bit about my background: I work on path optimisation and benchmarking systems for my university's Formula Student autonomous racing team, primarily in Python. I also have solid Node.js/TypeScript experience and have built projects with JavaScript frameworks. Additionally, I've completed Stanford's Introduction to Statistics, which I think is directly relevant to the scoring, metric design, and regression detection aspects of this project, particularly for handling the non-deterministic nature of LLM outputs.
I've been exploring the evals directory and the evalTest harness in the codebase, and I find the distinction between ALWAYS_PASSES and USUALLY_PASSES behaviours really interesting from a statistical reliability standpoint. I'm planning to open a PR adding new behavioural eval test cases as my first contribution, and I'd love guidance on what areas of eval coverage are most needed right now, whether that's specific tool behaviours, edge cases, or something else entirely.
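To illustrate the statistical angle I mean: for a USUALLY_PASSES behaviour, a naive "passed k of n runs" check is noisy at small n, so one option is to gate on a confidence bound of the pass rate rather than the raw proportion. Below is a minimal sketch (the function name, threshold, and inputs are hypothetical, not the actual harness API) using the Wilson score lower bound for a binomial proportion:

```python
import math

def usually_passes(results, min_pass_rate=0.8, z=1.96):
    """Decide whether a non-deterministic eval case 'usually passes'.

    `results` is a list of booleans from repeated runs of one eval case.
    Instead of comparing the raw pass rate to `min_pass_rate`, we compare
    the Wilson score lower confidence bound (z=1.96 ~ 95%), so a lucky
    streak over few runs doesn't count as a reliable pass.
    """
    n = len(results)
    passes = sum(results)
    p_hat = passes / n
    # Wilson score lower bound for a binomial proportion
    denom = 1 + z**2 / n
    centre = p_hat + z**2 / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    lower = (centre - margin) / denom
    return lower >= min_pass_rate
```

With this kind of gate, 20/20 passes clears a 0.8 threshold but 10/20 does not, and the required run count falls out of the statistics rather than a magic number. I'd be happy to explore whether something along these lines fits the framework's design.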
Happy to be here and looking forward to contributing to the project!