@@ -13,16 +13,16 @@ MCP-TE Benchmark (where "TE" stands for "Task Efficiency") is designed to measur
 
 *Environment: Twilio (MCP Server), Cline (MCP Client), Model: claude-3.7-sonnet*
 
-| Metric | Control | MCP | Change |
-| :--------------------- | :--------- | :--------- | :----- |
-| Average Duration (s) | 62.5 | 49.7 | -20.5 % |
-| Average API Calls | 10.3 | 8.3 | -19.3 % |
-| Average Interactions | 1.1 | 1.0 | -3.3 % |
-| Average Tokens | 2286.1 | 2141.4 | -6.3 % |
-| Average Cache Reads | 191539.5 | 246152.5 | +28.5 % |
-| Average Cache Writes | 11043.5 | 16973.9 | +53.7 % |
-| Average Cost ($) | 0.1 | 0.2 | +27.5 % |
-| Success Rate | 92.3 % | 100.0% | +8.3 % |
+| Metric | Control | MCP | Change |
+| :--------------------- | :--------- | :--------- | :----- |
+| Average Duration (s) | 62.54 | 49.68 | -20.56% |
+| Average API Calls | 10.27 | 8.29 | -19.26% |
+| Average Interactions | 1.08 | 1.04 | -3.27% |
+| Average Tokens | 2286.12 | 2141.38 | -6.33% |
+| Average Cache Reads | 191539.50 | 246152.46 | +28.51% |
+| Average Cache Writes | 11043.46 | 16973.88 | +53.70% |
+| Average Cost ($) | 0.13 | 0.17 | +27.55% |
+| Success Rate | 92.31% | 100.00% | +8.33% |
 
 *Note: Calculations based on data in `metrics/summary.json`.*
 
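The Change column appears to be a relative difference, (MCP - Control) / Control × 100. The Python sketch below is not part of the benchmark's code, and `percent_change` and `point_change` are hypothetical helpers. It recomputes a few entries from the rounded table values. Because the inputs are already rounded to two decimals, rows with small values can drift from the published figures. Those figures were presumably computed from the unrounded data in `metrics/summary.json`: cost comes out at +30.77% here versus the listed +27.55%. The sketch also shows that the success-rate change is relative (+8.33%) rather than a difference in percentage points (+7.69).

```python
def percent_change(control: float, mcp: float) -> float:
    """Relative change of the MCP value versus the control value, in percent."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (mcp - control) / control * 100.0


def point_change(control_pct: float, mcp_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return mcp_pct - control_pct


# (control, mcp) averages copied from the summary table, already rounded to 2 decimals.
summary = {
    "Average Duration (s)": (62.54, 49.68),
    "Average Tokens": (2286.12, 2141.38),
    "Average Cost ($)": (0.13, 0.17),
    "Success Rate (%)": (92.31, 100.00),
}

for metric, (control, mcp) in summary.items():
    print(f"{metric:<22} {percent_change(control, mcp):+.2f}%")
print(f"Success rate, percentage points: {point_change(92.31, 100.00):+.2f}")
```

Duration and tokens reproduce the table (-20.56% and -6.33%). Cost does not, because 0.13 and 0.17 are too coarse to recover the underlying ratio.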
@@ -40,24 +40,42 @@ MCP-TE Benchmark (where "TE" stands for "Task Efficiency") is designed to measur
 
 #### Task 1: Purchase a Canadian Phone Number
 
-| Mode | Duration (s) | API Calls | Interactions | Success Rate |
-| :------ | :----------- | :-------- | :----------- | :----------- |
-| Control | 79.4 | 12.8 | 1.2 | 100.0% |
-| MCP | 62.3 | 9.6 | 1.1 | 100.0% |
+| Metric | Control | MCP | Change |
+| :--------------------- | :--------- | :--------- | :------- |
+| Duration (s) | 79.41 | 62.27 | -21.57% |
+| API Calls | 12.78 | 9.63 | -24.67% |
+| Interactions | 1.22 | 1.13 | -7.95% |
+| Tokens | 2359.33 | 2659.88 | +12.74% |
+| Cache Reads | 262556.11 | 281086.13 | +7.06% |
+| Cache Writes | 17196.33 | 25627.63 | +49.03% |
+| Cost ($) | 0.18 | 0.22 | +23.50% |
+| Success Rate | 100.00% | 100.00% | 0.00% |
 
 #### Task 2: Create a Task Router Activity
 
-| Mode | Duration (s) | API Calls | Interactions | Success Rate |
-| :------ | :----------- | :-------- | :----------- | :----------- |
-| Control | 46.4 | 8.4 | 1.0 | 77.8% |
-| MCP | 30.7 | 5.9 | 1.0 | 100.0% |
+| Metric | Control | MCP | Change |
+| :--------------------- | :--------- | :--------- | :------- |
+| Duration (s) | 46.37 | 30.71 | -33.77% |
+| API Calls | 8.44 | 5.88 | -30.43% |
+| Interactions | 1.00 | 1.00 | 0.00% |
+| Tokens | 2058.89 | 1306.63 | -36.54% |
+| Cache Reads | 144718.44 | 164311.50 | +13.54% |
+| Cache Writes | 6864.44 | 11219.13 | +63.44% |
+| Cost ($) | 0.10 | 0.11 | +11.09% |
+| Success Rate | 77.78% | 100.00% | +28.57% |
 
 #### Task 3: Create a Queue with Task Filter
 
-| Mode | Duration (s) | API Calls | Interactions | Success Rate |
-| :------ | :----------- | :-------- | :----------- | :----------- |
-| Control | 61.8 | 9.5 | 1.0 | 100.0% |
-| MCP | 56.1 | 9.4 | 1.0 | 100.0% |
+| Metric | Control | MCP | Change |
+| :--------------------- | :--------- | :--------- | :------- |
+| Duration (s) | 61.77 | 56.07 | -9.23% |
+| API Calls | 9.50 | 9.38 | -1.32% |
+| Interactions | 1.00 | 1.00 | 0.00% |
+| Tokens | 2459.38 | 2457.63 | -0.07% |
+| Cache Reads | 164319.50 | 293059.75 | +78.35% |
+| Cache Writes | 8822.88 | 14074.88 | +59.53% |
+| Cost ($) | 0.12 | 0.18 | +49.06% |
+| Success Rate | 100.00% | 100.00% | 0.00% |
 
 ## Benchmark Design & Metrics
 
@@ -191,7 +209,8 @@ The benchmark focuses on these key insights:
 1. **Time Efficiency:** Comparing the time it takes to complete tasks using MCP vs. traditional methods
 2. **API Efficiency:** Measuring the reduction in API calls when using MCP
 3. **Interaction Efficiency:** Evaluating if MCP reduces the number of interactions needed to complete tasks
-4. **Success Rate:** Determining if MCP improves the reliability of task completion
+4. **Cost Efficiency:** Evaluating whether the added MCP context affects token costs
+5. **Success Rate:** Determining if MCP improves the reliability of task completion
 
 Negative percentage changes in duration, API calls, and interactions indicate improvements, while positive changes in success rate indicate improvements.
 
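The sign convention for reading the Change columns can be written as a small lookup. This is a sketch in which `verdict`, `LOWER_IS_BETTER` and `HIGHER_IS_BETTER` are hypothetical names. Duration, API calls, interactions and success rate follow the rule stated in the README. Treating tokens and cost as lower-is-better is an assumption, consistent with the Cost Efficiency goal. Cache reads and writes have no stated direction, so they are left unclassified.

```python
# Hypothetical helper: classify a percentage change as an improvement or a regression.
# Duration, API calls, interactions and success rate follow the README's stated rule;
# treating tokens and cost as lower-is-better is an assumption; cache metrics are left unclassified.
LOWER_IS_BETTER = {"Duration (s)", "API Calls", "Interactions", "Tokens", "Cost ($)"}
HIGHER_IS_BETTER = {"Success Rate"}


def verdict(metric: str, change_pct: float) -> str:
    if metric not in LOWER_IS_BETTER | HIGHER_IS_BETTER:
        return "unclassified"  # e.g. Cache Reads / Cache Writes
    if change_pct == 0:
        return "no change"
    improved = change_pct < 0 if metric in LOWER_IS_BETTER else change_pct > 0
    return "improvement" if improved else "regression"


# Change column for Task 2 (Create a Task Router Activity).
task2_changes = {
    "Duration (s)": -33.77,
    "API Calls": -30.43,
    "Interactions": 0.00,
    "Tokens": -36.54,
    "Cache Writes": 63.44,
    "Cost ($)": 11.09,
    "Success Rate": 28.57,
}

for metric, change in task2_changes.items():
    print(f"{metric:<14} {change:+7.2f}%  {verdict(metric, change)}")
```

For Task 2, this flags Cost ($) as the only regression among the classified metrics. That mirrors the trade-off visible in the tables: faster runs with fewer API calls, at a somewhat higher token cost.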