@24anisha 24anisha commented Sep 25, 2025

Currently, we have a default toolset of 46 tools. Many of these tools are hyper-specific and often unnecessary for the task at hand. Their presence in the toolset leads to worse behavior for many models, such as Sonnet 3.7 and GPT 4.1. As such, we're running an experiment that leverages the existing logic for MCP server toolsets to create built-in toolsets from the default tools and expose only a handful of tools to the user directly.
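
As a rough illustration of the idea above (not the code in this PR), grouping could look like the sketch below. The `ToolInfo` shape, the `category` field, the tool names, and `groupDefaultTools` are all hypothetical placeholders; the real grouping reuses the MCP server toolset logic, which is not shown here.

```typescript
// Hypothetical tool shape; the real tool type in the codebase may differ.
interface ToolInfo {
  name: string;
  category: string; // assumed label used to decide which built-in group a tool joins
}

interface ToolGroup {
  name: string;
  tools: ToolInfo[];
}

interface ExposedToolset {
  direct: ToolInfo[]; // tools shown to the model directly
  groups: ToolGroup[]; // remaining tools, collapsed into built-in groups
}

// Keep a small allow-list of tools exposed directly and fold the rest into groups.
function groupDefaultTools(
  tools: readonly ToolInfo[],
  alwaysExposed: ReadonlySet<string>
): ExposedToolset {
  const direct: ToolInfo[] = [];
  const byCategory = new Map<string, ToolInfo[]>();
  for (const tool of tools) {
    if (alwaysExposed.has(tool.name)) {
      direct.push(tool);
      continue;
    }
    const bucket = byCategory.get(tool.category) ?? [];
    bucket.push(tool);
    byCategory.set(tool.category, bucket);
  }
  const groups: ToolGroup[] = [];
  byCategory.forEach((members, name) => groups.push({ name, tools: members }));
  return { direct, groups };
}

// Placeholder tool names, for illustration only.
const sampleTools: ToolInfo[] = [
  { name: "readFile", category: "files" },
  { name: "editFile", category: "files" },
  { name: "notebookRun", category: "notebooks" },
  { name: "notebookEdit", category: "notebooks" },
  { name: "testRun", category: "testing" },
];
const exposed = groupDefaultTools(sampleTools, new Set(["readFile", "editFile"]));
```

With this sample input, the two file tools stay directly visible and the other three collapse into two groups, so the model initially sees four entries instead of five.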

GPT 4.1 Pass Rate

| Toolset | Benchmark | Pass Rate (%) |
| --- | --- | --- |
| small | MFA | 19.4 |
| full | MFA | 13.4 |
| small | SWEBench C# | 8.1 |
| full | SWEBench C# | 7.9 |

GPT 5 Pass Rate

| model | metric | toolset | run id | % |
| --- | --- | --- | --- | --- |
| gpt5 | swec# | full | 18232124031 | 38.6 |
| gpt5 | swec# | small | 18232409805 | 42.6 |
| gpt5 | swelancer | full | 18232152700 | 41.8 |
| gpt5 | swelancer | small | 18232428869 | 43.9 |
| gpt5 | mfa | full | 18232085552 | 32.8 |
| gpt5 | mfa | small | 18232361377 | 28.3 |

Sonnet 3.7 Step Counts (Success = True for Both Toolsets)

| Case Name | Small Toolset Steps | Full Toolset Steps |
| --- | --- | --- |
| gitmoji-cli-1248 | 7 | 18 |
| isomorphic-git-1493 | 18 | 22 |
| Luxon-1173 | 13 | 28 |
| Prom-client-146 | 7 | 19 |
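
To put the Sonnet 3.7 step counts in one number, the totals can be worked out with a few lines. This is only arithmetic on the four cases listed above; the `StepResult` type and variable names are made up for the illustration.

```typescript
interface StepResult {
  caseName: string;
  smallSteps: number;
  fullSteps: number;
}

// Values copied from the Sonnet 3.7 cases above.
const sonnetResults: StepResult[] = [
  { caseName: "gitmoji-cli-1248", smallSteps: 7, fullSteps: 18 },
  { caseName: "isomorphic-git-1493", smallSteps: 18, fullSteps: 22 },
  { caseName: "Luxon-1173", smallSteps: 13, fullSteps: 28 },
  { caseName: "Prom-client-146", smallSteps: 7, fullSteps: 19 },
];

const smallTotal = sonnetResults.reduce((sum, r) => sum + r.smallSteps, 0); // 45
const fullTotal = sonnetResults.reduce((sum, r) => sum + r.fullSteps, 0); // 87

// Relative reduction in total steps, rounded to one decimal place: 48.3
const reductionPct = Math.round(((fullTotal - smallTotal) / fullTotal) * 1000) / 10;
```

Across these four cases the small toolset needs 45 steps in total versus 87 for the full toolset, roughly 48% fewer. Four cases is a small sample, so this is indicative rather than conclusive.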


24anisha commented Oct 3, 2025

Questions/Action items remaining:

  • Automatically add new tools to the built-in toolset groups (so that new tools aren't automatically considered part of the default toolset)
  • Group the default tools without needing to alter the max toolset
    • BUG: this only works for gpt-4.1 and gpt-5 because the maximum number of tools is hardcoded to 28. It needs to be dynamic.
    • ISSUE: in toolGrouping.ts, isEnabled() is meant to be a fast check, so we don't pass the query in, which means we don't know the model family. If we do that, do non-GPT models get sent to the more costly compute step on every query? How should we manage this?
  • Only apply this alteration for GPT models. How do we pass in the model family without changing the whole process?
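
One possible direction for the hardcoded-limit bug, sketched under assumptions: look the limit up per model family instead of using a single constant. Only the value 28 for gpt-4.1 and gpt-5 comes from this thread; `resolveMaxTools`, the family strings as map keys, and the fallback of 46 (the full default toolset, meaning no trimming) are hypothetical.

```typescript
// 28 is the limit currently hardcoded for gpt-4.1 and gpt-5 (from this thread).
// Keying the map by these family strings is an assumption for the sketch.
const maxToolsByFamily: ReadonlyMap<string, number> = new Map([
  ["gpt-4.1", 28],
  ["gpt-5", 28],
]);

// Hypothetical fallback: the size of the full default toolset, i.e. leave it untouched.
const FALLBACK_MAX_TOOLS = 46;

// Resolve the limit for a family, falling back when the family has no entry.
function resolveMaxTools(
  modelFamily: string,
  limits: ReadonlyMap<string, number> = maxToolsByFamily,
  fallback: number = FALLBACK_MAX_TOOLS
): number {
  return limits.get(modelFamily) ?? fallback;
}

// True when the tool list is larger than the family's limit.
function exceedsLimit(toolCount: number, modelFamily: string): boolean {
  return toolCount > resolveMaxTools(modelFamily);
}
```

The lookup itself is cheap, but it still needs the model family at the call site, which is the open question in the ISSUE item above.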

// Enable if we could potentially trigger built-in grouping (when GPT model is used)
const defaultToolGroupingEnabled = this._configurationService.getExperimentBasedConfig(ConfigKey.Internal.DefaultToolsGrouped, this._experimentationService);
const couldTriggerBuiltInGrouping = this._tools.length > Constant.START_BUILTIN_GROUPING_AFTER_TOOL_COUNT && defaultToolGroupingEnabled;

This doesn't check everything we need. We also need to confirm that the model is gpt-4.1 or gpt-5, but this function is supposed to be lightweight and quick to call.
Options:

  • pass the endpoint into this function so we can check to make sure it's the correct model before going to virtual tool grouping
  • let it go to virtual tool grouping every time (since START_BUILTIN_GROUPING_AFTER_TOOL_COUNT is 20) and get stopped from grouping there by the endpoint check
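
A minimal sketch of how the two options could differ, assuming a simplified endpoint shape. `EndpointLike`, `isGroupingModel`, `couldTriggerBuiltInGrouping` (as a free function), and `groupIfSupported` are hypothetical names for illustration, not the actual signatures in toolGrouping.ts; only the threshold of 20 and the two model names come from this thread.

```typescript
// Value stated in the comment above.
const START_BUILTIN_GROUPING_AFTER_TOOL_COUNT = 20;

// Hypothetical minimal endpoint shape; the real endpoint type carries more fields.
interface EndpointLike {
  family: string;
}

// Assumes the family is reported as exactly "gpt-4.1" or "gpt-5".
function isGroupingModel(endpoint: EndpointLike): boolean {
  return endpoint.family === "gpt-4.1" || endpoint.family === "gpt-5";
}

// Option 1: pass the endpoint into the quick check so other models bail out early.
// When no endpoint is supplied, fall back to the existing count + flag check.
function couldTriggerBuiltInGrouping(
  toolCount: number,
  flagEnabled: boolean,
  endpoint?: EndpointLike
): boolean {
  if (!flagEnabled || toolCount <= START_BUILTIN_GROUPING_AFTER_TOOL_COUNT) {
    return false;
  }
  return endpoint === undefined ? true : isGroupingModel(endpoint);
}

// Option 2: leave the quick check alone and gate inside the grouping step itself.
function groupIfSupported<T>(
  tools: readonly T[],
  endpoint: EndpointLike,
  group: (tools: readonly T[]) => T[]
): T[] {
  return isGroupingModel(endpoint) ? group(tools) : tools.slice();
}
```

Option 1 keeps unsupported models out of virtual tool grouping entirely, at the cost of widening the quick check's signature. Option 2 leaves the signature alone, but every query with more than 20 tools enters the grouping step before the endpoint check stops it.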
