Commit 6e52251

updated metrics in readme
1 parent beb9b05 commit 6e52251

1 file changed: +42 -23 lines changed
README.md

Lines changed: 42 additions & 23 deletions
@@ -13,16 +13,16 @@ MCP-TE Benchmark (where "TE" stands for "Task Efficiency") is designed to measur

 *Environment: Twilio (MCP Server), Cline (MCP Client), Model: claude-3.7-sonnet*

-| Metric | Control | MCP | Change |
-| :--------------------- | :--------- | :--------- | :----- |
-| Average Duration (s) | 62.5 | 49.7 | -20.5% |
-| Average API Calls | 10.3 | 8.3 | -19.3% |
-| Average Interactions | 1.1 | 1.0 | -3.3% |
-| Average Tokens | 2286.1 | 2141.4 | -6.3% |
-| Average Cache Reads | 191539.5 | 246152.5 | +28.5% |
-| Average Cache Writes | 11043.5 | 16973.9 | +53.7% |
-| Average Cost ($) | 0.1 | 0.2 | +27.5% |
-| Success Rate | 92.3% | 100.0% | +8.3% |
+| Metric | Control | MCP | Change |
+| :--------------------- | :--------- | :--------- | :----- |
+| Average Duration (s) | 62.54 | 49.68 | -20.56% |
+| Average API Calls | 10.27 | 8.29 | -19.26% |
+| Average Interactions | 1.08 | 1.04 | -3.27% |
+| Average Tokens | 2286.12 | 2141.38 | -6.33% |
+| Average Cache Reads | 191539.50 | 246152.46 | +28.51% |
+| Average Cache Writes | 11043.46 | 16973.88 | +53.70% |
+| Average Cost ($) | 0.13 | 0.17 | +27.55% |
+| Success Rate | 92.31% | 100.0% | +8.33% |

 *Note: Calculations based on data in `metrics/summary.json`.*

@@ -40,24 +40,42 @@ MCP-TE Benchmark (where "TE" stands for "Task Efficiency") is designed to measur

 #### Task 1: Purchase a Canadian Phone Number

-| Mode | Duration (s) | API Calls | Interactions | Success Rate |
-| :------ | :----------- | :-------- | :----------- | :----------- |
-| Control | 79.4 | 12.8 | 1.2 | 100.0% |
-| MCP | 62.3 | 9.6 | 1.1 | 100.0% |
+| Metric | Control | MCP | Change |
+| :--------------------- | :--------- | :--------- | :------- |
+| Duration (s) | 79.41 | 62.27 | -21.57% |
+| API Calls | 12.78 | 9.63 | -24.67% |
+| Interactions | 1.22 | 1.13 | -7.95% |
+| Tokens | 2359.33 | 2659.88 | +12.74% |
+| Cache Reads | 262556.11 | 281086.13 | +7.06% |
+| Cache Writes | 17196.33 | 25627.63 | +49.03% |
+| Cost ($) | 0.18 | 0.22 | +23.50% |
+| Success Rate | 100.00% | 100.00% | 0.00% |

 #### Task 2: Create a Task Router Activity

-| Mode | Duration (s) | API Calls | Interactions | Success Rate |
-| :------ | :----------- | :-------- | :----------- | :----------- |
-| Control | 46.4 | 8.4 | 1.0 | 77.8% |
-| MCP | 30.7 | 5.9 | 1.0 | 100.0% |
+| Metric | Control | MCP | Change |
+| :--------------------- | :--------- | :--------- | :------- |
+| Duration (s) | 46.37 | 30.71 | -33.77% |
+| API Calls | 8.44 | 5.88 | -30.43% |
+| Interactions | 1.00 | 1.00 | 0.00% |
+| Tokens | 2058.89 | 1306.63 | -36.54% |
+| Cache Reads | 144718.44 | 164311.50 | +13.54% |
+| Cache Writes | 6864.44 | 11219.13 | +63.44% |
+| Cost ($) | 0.10 | 0.11 | +11.09% |
+| Success Rate | 77.78% | 100.00% | +28.57% |

 #### Task 3: Create a Queue with Task Filter

-| Mode | Duration (s) | API Calls | Interactions | Success Rate |
-| :------ | :----------- | :-------- | :----------- | :----------- |
-| Control | 61.8 | 9.5 | 1.0 | 100.0% |
-| MCP | 56.1 | 9.4 | 1.0 | 100.0% |
+| Metric | Control | MCP | Change |
+| :--------------------- | :--------- | :--------- | :------- |
+| Duration (s) | 61.77 | 56.07 | -9.23% |
+| API Calls | 9.50 | 9.38 | -1.32% |
+| Interactions | 1.00 | 1.00 | 0.00% |
+| Tokens | 2459.38 | 2457.63 | -0.07% |
+| Cache Reads | 164319.50 | 293059.75 | +78.35% |
+| Cache Writes | 8822.88 | 14074.88 | +59.53% |
+| Cost ($) | 0.12 | 0.18 | +49.06% |
+| Success Rate | 100.00% | 100.00% | 0.00% |

 ## Benchmark Design & Metrics

@@ -191,7 +209,8 @@ The benchmark focuses on these key insights:
 1. **Time Efficiency:** Comparing the time it takes to complete tasks using MCP vs. traditional methods
 2. **API Efficiency:** Measuring the reduction in API calls when using MCP
 3. **Interaction Efficiency:** Evaluating if MCP reduces the number of interactions needed to complete tasks
-4. **Success Rate:** Determining if MCP improves the reliability of task completion
+4. **Cost Efficiency:** Evaluating whether the added MCP context affects token costs
+5. **Success Rate:** Determining if MCP improves the reliability of task completion

 Negative percentage changes in duration, API calls, and interactions indicate improvements, while positive changes in success rate indicate improvements.
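The "Change" column in the tables above appears to be a relative change against the control value, not a difference in percentage points: a success rate going from 92.31% to 100% is reported as +8.33%, which matches (100 - 92.31) / 92.31. A minimal sketch of that calculation follows, assuming this formula; the function names and metric keys are hypothetical and are not taken from the repository or from `metrics/summary.json`.

```python
# Hypothetical helper: relative change of the MCP value against the control value.
def percent_change(control: float, mcp: float) -> float:
    """Return (mcp - control) / control as a percentage."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (mcp - control) / control * 100.0


# Assumed metric keys; the README only states the direction for these four.
LOWER_IS_BETTER = {"duration", "api_calls", "interactions"}
HIGHER_IS_BETTER = {"success_rate"}


def is_improvement(metric: str, change: float) -> bool:
    """Interpret the sign of a change following the README's note."""
    if metric in LOWER_IS_BETTER:
        return change < 0
    if metric in HIGHER_IS_BETTER:
        return change > 0
    raise ValueError(f"no direction defined for metric: {metric}")


if __name__ == "__main__":
    duration = percent_change(62.54, 49.68)
    success = percent_change(92.31, 100.0)
    print(f"duration: {duration:+.2f}%  improved={is_improvement('duration', duration)}")
    print(f"success:  {success:+.2f}%  improved={is_improvement('success_rate', success)}")
```

Recomputing a change from the rounded table values can differ from the published figure in the last digit (for example, average API calls), presumably because the published percentages were derived from unrounded data.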
