Open
Labels
help wanted (extra attention is needed)
Description
Automated tests for the quality of tool metadata (tool names and parameter names/descriptions), while not foolproof, would make it much easier to change existing servers confidently without worrying about regressing existing use cases. Such tests could catch problems like a new tool whose description collides with another tool's and confuses LLMs. We should be able to write some example tests using, e.g., Mosaic AI Agent Evaluation or open source eval frameworks.
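As a starting point, here is a rough sketch of what a cheap, static version of such a test might look like. It is not an LLM-based eval and does not use Mosaic AI Agent Evaluation; it only flags tool descriptions that are near-duplicates by token overlap, plus parameters with empty descriptions. The tool list, the helper names (`find_collisions`, `find_missing_param_descriptions`), and the 0.8 threshold are all made up for illustration; a real test would load metadata from the server's actual tool listing.

```python
import re
from itertools import combinations

# Hypothetical tool metadata for illustration only; a real test would
# load this from the server under test.
TOOLS = [
    {
        "name": "execute_sql",
        "description": "Run a SQL query against a warehouse and return the rows.",
        "parameters": {"query": "SQL statement to execute."},
    },
    {
        "name": "run_query",
        "description": "Run a SQL query against a warehouse and return rows.",
        "parameters": {"query": "SQL statement to execute."},
    },
    {
        "name": "search_docs",
        "description": "Search indexed documents by keyword and return matching passages.",
        "parameters": {"keywords": "Terms to search for.", "limit": ""},
    },
]


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the token sets of two strings (0.0 if either is empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_collisions(tools, threshold=0.8):
    """Return name pairs whose descriptions overlap at or above the (assumed) threshold."""
    return [
        (a["name"], b["name"])
        for a, b in combinations(tools, 2)
        if jaccard(a["description"], b["description"]) >= threshold
    ]


def find_missing_param_descriptions(tools):
    """Return (tool name, parameter name) pairs whose description is blank."""
    return [
        (tool["name"], param)
        for tool in tools
        for param, desc in tool["parameters"].items()
        if not desc.strip()
    ]


if __name__ == "__main__":
    print("colliding descriptions:", find_collisions(TOOLS))
    print("undocumented parameters:", find_missing_param_descriptions(TOOLS))
```

In a test suite, both helpers would be asserted to return empty lists for the real tool set. Token overlap only catches near-verbatim collisions, so descriptions that are worded differently but mean the same thing would still need an LLM-based eval to detect.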