open model proving ground - aka model "gym" #6937
michaelneale
started this conversation in
Show and tell
As we look to support more open models (local and remote) and adapt to them, we needed a quick way to run permutations of tests across configurations of agents, models, and variants, so we can focus on what works (and potentially even let goose edit itself to improve, ablate things away to work with tiny models, and more).
The result is this repo: https://github.com/michaelneale/open-model-gym
It works by letting you specify versions of goose, extension configurations, and model configurations, and even run alongside opencode (others could be added) to see what works.
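To make the "permutations across configurations" idea concrete, here is a minimal sketch of how such a test matrix could be expanded. This is not the repo's actual code or config format: `RunConfig`, `build_matrix`, and every agent, model, and extension label below are hypothetical placeholders.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RunConfig:
    """One cell of the test matrix (hypothetical shape, not the repo's schema)."""
    agent: str
    model: str
    extensions: tuple[str, ...]


def build_matrix(agents, models, extension_sets):
    """Expand every combination of agent, model, and extension set."""
    return [
        RunConfig(agent, model, tuple(exts))
        for agent, model, exts in product(agents, models, extension_sets)
    ]


# Placeholder labels, purely for illustration.
agents = ["goose@version-a", "goose@version-b", "opencode"]
models = ["model-a", "model-b"]
extension_sets = [[], ["ext-a"]]

matrix = build_matrix(agents, models, extension_sets)
for cfg in matrix:
    print(cfg)
```

With three agents, two models, and two extension sets this yields twelve runs, which is why caching results (so only new combinations are re-run) matters quickly.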
This comes with a few baked in scenarios:
It then takes care of tracking the tool calls, recording whether each run succeeded, and caching results so you can see things side by side. You can try different combinations all at once: new editors, extensions on or off, and so on (even fine-tunes of models). I expect there will be more fleshing out of tasks using tools (the synthetic MCP allows that, but we also want to stress test longer sessions, which are not as easy to do in a reproducible way).
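As a rough illustration of the caching idea, results can be keyed by a hash of the configuration plus the scenario name, so an unchanged combination is never re-run. This is a sketch under my own assumptions, not how the repo implements it: `cache_key`, `ResultCache`, `fake_runner`, and the result fields are all hypothetical.

```python
import hashlib
import json


def cache_key(config: dict, scenario: str) -> str:
    """Stable key: the same config and scenario always hash to the same value."""
    payload = json.dumps({"config": config, "scenario": scenario}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResultCache:
    """In-memory cache; a real harness would likely persist results to disk."""

    def __init__(self):
        self._store = {}
        self.runs = 0  # counts how many times the runner was actually invoked

    def get_or_run(self, config: dict, scenario: str, runner):
        key = cache_key(config, scenario)
        if key not in self._store:
            self.runs += 1
            self._store[key] = runner(config, scenario)
        return self._store[key]


def fake_runner(config: dict, scenario: str) -> dict:
    """Stand-in for a real agent session (hypothetical result shape)."""
    return {"success": True, "tool_calls": ["read_file", "write_file"]}


cache = ResultCache()
cfg = {"agent": "goose", "model": "model-a", "extensions": []}
first = cache.get_or_run(cfg, "scenario-a", fake_runner)
second = cache.get_or_run(cfg, "scenario-a", fake_runner)  # served from cache
```

Sorting keys before hashing means two configs that differ only in key order share a cache entry, while any change to the model, extensions, or scenario produces a new key and a fresh run.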
This will hopefully help goose and others scale down to open models, some of which can (increasingly) run locally, and hopefully help everyone.
What I am curious about is whether there is interest in running these in a way that lets us share and collect results.