24anisha
Contributor

@24anisha 24anisha commented Sep 25, 2025

Currently, the default toolset contains 46 tools. Many of these tools are highly specialized and often unnecessary for the task at hand, and their presence in the toolset leads to worse behavior for many models, such as Sonnet 3.7 and GPT 4.1. As such, we're running an experiment that reuses the existing MCP server toolset logic to group the default tools into built-in toolsets and expose only a handful of tools to the user directly.
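
To make the grouping idea concrete, here is a minimal sketch and not the code in this PR: a few core tools stay directly visible, and the remaining tools are collapsed into named groups that are expanded only on request. Every tool name, group name, and helper below (`ToolGroup`, `build_exposed_tools`, `expand_group`, the `activate_` prefix) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGroup:
    """A named bundle of tools presented as a single entry (hypothetical type)."""
    name: str
    tools: list = field(default_factory=list)


def build_exposed_tools(all_tools, core, groups):
    """Return the visible list: core tools, one entry per group, then ungrouped tools."""
    grouped = {t for g in groups for t in g.tools}
    exposed = [t for t in all_tools if t in core]
    # Each group is represented by a single placeholder entry.
    exposed += [f"activate_{g.name}" for g in groups
                if any(t in all_tools for t in g.tools)]
    # Tools that are neither core nor grouped stay individually visible.
    exposed += [t for t in all_tools if t not in core and t not in grouped]
    return exposed


def expand_group(name, groups):
    """Return the member tools of a group once it is requested."""
    for g in groups:
        if g.name == name:
            return list(g.tools)
    raise KeyError(name)


# Hypothetical tool names, not the real default toolset.
ALL_TOOLS = ["read_file", "edit_file", "run_in_terminal",
             "notebook_run_cell", "notebook_edit_cell",
             "get_test_failures", "run_tests"]
CORE = {"read_file", "edit_file", "run_in_terminal"}
GROUPS = [
    ToolGroup("notebook_tools", ["notebook_run_cell", "notebook_edit_cell"]),
    ToolGroup("testing_tools", ["get_test_failures", "run_tests"]),
]

exposed = build_exposed_tools(ALL_TOOLS, CORE, GROUPS)
print(exposed)
```

In this toy setup seven tools shrink to five visible entries; the same shape would apply to the 46 default tools, with the group boundaries being a design decision the sketch does not try to settle.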

GPT 4.1 Pass Rate

| Toolset | Benchmark | Pass Rate (%) |
| --- | --- | --- |
| small | MFA | 19.4 |
| full | MFA | 13.4 |
| small | SWEBench C# | 8.1 |
| full | SWEBench C# | 7.9 |

GPT 5 Pass Rate

| model | metric | toolset | run id | % |
| --- | --- | --- | --- | --- |
| gpt5 | swec# | full | 18232124031 | 38.6 |
| gpt5 | swec# | small | 18232409805 | 42.6 |
| gpt5 | swelancer | full | 18232152700 | 41.8 |
| gpt5 | swelancer | small | 18232428869 | 43.9 |
| gpt5 | mfa | full | 18232085552 | 32.8 |
| gpt5 | mfa | small | 18232361377 | 28.3 |
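
The GPT 5 numbers are easier to compare as small-minus-full differences. The snippet below is only a reading aid: the pass rates are copied from the table above, and the helper name `small_minus_full` is made up for this example.

```python
# Pass rates (%) copied from the GPT 5 table: (full, small) per benchmark.
GPT5_PASS_RATES = {
    "swec#": (38.6, 42.6),
    "swelancer": (41.8, 43.9),
    "mfa": (32.8, 28.3),
}


def small_minus_full(rates):
    """Return the percentage-point change when moving from the full to the small toolset."""
    return {bench: round(small - full, 1) for bench, (full, small) in rates.items()}


deltas = small_minus_full(GPT5_PASS_RATES)
for bench, delta in deltas.items():
    print(f"{bench}: {delta:+.1f} pp")
```

On these runs the small toolset is ahead on swec# and swelancer and behind on mfa, so the effect for GPT 5 is mixed rather than uniformly positive.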

Sonnet 3.7 Step Counts (Success = True for Both Toolsets)

| Case Name | Small Toolset Steps | Full Toolset Steps |
| --- | --- | --- |
| gitmoji-cli-1248 | 7 | 18 |
| isomorphic-git-1493 | 18 | 22 |
| Luxon-1173 | 13 | 28 |
| Prom-client-146 | 7 | 19 |
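
For a rough sense of scale, the step counts can be turned into relative reductions. This is simple arithmetic on the four cases listed above, not a new measurement; `step_reduction_pct` is a hypothetical helper name.

```python
# Step counts copied from the Sonnet 3.7 table: (small toolset, full toolset).
STEPS = {
    "gitmoji-cli-1248": (7, 18),
    "isomorphic-git-1493": (18, 22),
    "Luxon-1173": (13, 28),
    "Prom-client-146": (7, 19),
}


def step_reduction_pct(small, full):
    """Percentage of steps saved by the small toolset relative to the full one."""
    return round(100 * (full - small) / full, 1)


reductions = {case: step_reduction_pct(s, f) for case, (s, f) in STEPS.items()}
total_small = sum(s for s, _ in STEPS.values())
total_full = sum(f for _, f in STEPS.values())

for case, pct in reductions.items():
    print(f"{case}: {pct}% fewer steps")
print(f"total: {total_small} vs {total_full} steps")
```

Across these four cases the small toolset needed 45 steps in total against 87 for the full toolset, with every individual case taking fewer steps.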

@24anisha
Contributor Author

24anisha commented Oct 3, 2025

Questions/Action items remaining:

  • Automatically add new tools to the built-in toolset groups (so that new tools aren't automatically considered part of the default toolset)
  • Group the default tools without needing to alter the max toolset
    • BUG: this only works for gpt-4.1 and gpt-5 because the maximum number of tools is hardcoded to 28; it needs to be made dynamic
  • Only apply this alteration to GPT models. How can the model family be passed in without changing the whole process?
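
One possible direction for the last two items, sketched under assumptions: look the limit up per model family instead of hardcoding it, and treat a missing entry as "do not group". The value 28 comes from the list above; the lookup table, the function name `should_group_tools`, and the non-GPT family string are hypothetical, and the real code may need to thread the family through differently.

```python
# 28 is the currently hardcoded limit mentioned above. Keying it by model
# family is an assumption about how it could be made dynamic.
MAX_TOOLS_BY_FAMILY = {"gpt-4.1": 28, "gpt-5": 28}


def should_group_tools(model_family, tool_count, limits=MAX_TOOLS_BY_FAMILY):
    """Group tools only for families with a configured limit that is exceeded."""
    limit = limits.get(model_family)
    return limit is not None and tool_count > limit


# 46 is the size of the default toolset from the PR description.
print(should_group_tools("gpt-5", 46))
print(should_group_tools("some-other-family", 46))  # hypothetical family name
```

Passing the family as a plain argument keeps the decision in one place, so callers that do not know the family could fall back to the ungrouped behavior.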
