agents_mcp_usage/multi_mcp/eval_multi_mcp (1 file changed: +2, -1)
"\n \n Merbench tests this ability by providing an LLM Agent access to an MCP server that both validates "
"and provides error messages to guide correction of syntax. There are three different difficulty levels (test cases), "
"and the LLM is given a fixed number of attempts to fix the diagram, if this is exceeded, the test case is considered failed. "
- "\n \n This leaderboard shows the average success rate across all selected models and difficulty levels."
+ "\n \n **Performance is a measure of both tool usage, and Mermaid syntax understanding.**"
+ "\n \n The leaderboard shows the average success rate across all selected models and difficulty levels over *n runs*."
),
"icon": "🧜‍♀️",  # Emoji for the browser tab
# --- Primary Metric Configuration ---
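
The new description says the leaderboard reports an average success rate over *n runs*. A minimal sketch of one way such a figure could be computed is shown here. The record layout, the model names, and the `success_rates` helper are all hypothetical and do not come from this change; the sketch also assumes every run is weighted equally, regardless of difficulty level.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records: (model, difficulty, succeeded).
# A run counts as failed when the attempt limit is exceeded.
runs = [
    ("model-a", "easy", True),
    ("model-a", "easy", True),
    ("model-a", "hard", False),
    ("model-a", "hard", True),
    ("model-b", "easy", True),
    ("model-b", "easy", False),
    ("model-b", "hard", False),
    ("model-b", "hard", False),
]


def success_rates(records):
    """Return {model: fraction of successful runs}, pooled over difficulty levels."""
    outcomes = defaultdict(list)
    for model, _difficulty, succeeded in records:
        outcomes[model].append(1.0 if succeeded else 0.0)
    return {model: mean(values) for model, values in outcomes.items()}


rates = success_rates(runs)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.0%}")
```

Pooling all runs keeps the arithmetic simple; if the difficulty levels have different numbers of runs, averaging per level first and then across levels would give a different figure.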