feat(test): Evaluate MCP tools usage #156

MichalKalita · 2025-07-02T08:57:21Z

Closes https://github.com/apify/ai-team/issues/30

This is a simple test that runs an agent with a prompt and checks how the tools were used.
The goal is to confirm that the tool descriptions are good and that the LLM doesn't make mistakes in tool selection or arguments.
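
A minimal sketch of what such a check could look like, assuming the agent run yields an ordered list of tool calls. The `ToolCall`, `Expectation`, and `evaluateToolUsage` names are hypothetical and are not taken from this PR or from any MCP client library:

```typescript
// Hypothetical shapes for illustration only; not the PR's actual code.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface Expectation {
  tool: string;
  // Arguments that must be present with exactly these values.
  // Extra arguments on the actual call are allowed.
  args?: Record<string, unknown>;
}

interface EvalResult {
  passed: boolean;
  failures: string[];
}

// Compare the recorded tool calls against the expected ones, in order.
function evaluateToolUsage(calls: ToolCall[], expected: Expectation[]): EvalResult {
  const failures: string[] = [];

  expected.forEach((exp, i) => {
    const call = calls[i];
    if (!call) {
      failures.push(`step ${i}: expected a call to ${exp.tool}, got none`);
      return;
    }
    if (call.name !== exp.tool) {
      failures.push(`step ${i}: expected ${exp.tool}, got ${call.name}`);
      return;
    }
    for (const [key, value] of Object.entries(exp.args ?? {})) {
      // Structural comparison via JSON is enough for plain argument values.
      if (JSON.stringify(call.args[key]) !== JSON.stringify(value)) {
        failures.push(`step ${i}: argument "${key}" does not match`);
      }
    }
  });

  if (calls.length > expected.length) {
    failures.push(`${calls.length - expected.length} unexpected extra call(s)`);
  }

  return { passed: failures.length === 0, failures };
}
```

A wrong tool choice and a wrong argument are reported as separate failures, which makes it easier to tell whether the tool description or the argument schema needs work.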

MichalKalita · 2025-07-03T13:08:32Z

This MCP server is not compatible with the Mastra MCP client. We allow adding tools (Actors) at runtime within a single LLM request, while Mastra only allows adding tools between requests. This may be a problem for other clients as well.

We have two options:

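1. Implement [Add generic call-actor tool #155](https://github.com/apify/actors-mcp-server/issues/155)
2. Use an MCP client other than Mastra. That is a problem in itself, though: if popular MCP clients don't work with the server, that already counts as an evaluation failure.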

@jirispilka @MQ37
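
The mismatch can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ToyServer`, `runRequest`, and the `example/actor` tool name are hypothetical, and none of this is Mastra's, the MCP SDK's, or this server's actual code. It assumes a client that either snapshots the tool list once per request or re-lists tools before every step:

```typescript
// Toy model only; the names below do not come from Mastra or the MCP SDK.
class ToyServer {
  private tools = new Set<string>(["add-actor"]);

  listTools(): string[] {
    return [...this.tools];
  }

  call(name: string, arg: string): string {
    if (!this.tools.has(name)) {
      throw new Error(`unknown tool: ${name}`);
    }
    if (name === "add-actor") {
      // Adding an Actor registers a new tool at runtime.
      this.tools.add(arg);
      return `added ${arg}`;
    }
    return `ran ${name}`;
  }
}

type Step = { tool: string; arg: string };

// One LLM request made of several tool-call steps.
function runRequest(server: ToyServer, steps: Step[], refreshEachStep: boolean): string[] {
  // Snapshot of the tool list taken when the request starts.
  let visible = new Set(server.listTools());
  const log: string[] = [];

  for (const step of steps) {
    if (refreshEachStep) {
      visible = new Set(server.listTools());
    }
    if (!visible.has(step.tool)) {
      // The client never learned about the newly added tool.
      log.push(`client rejected ${step.tool}`);
      continue;
    }
    log.push(server.call(step.tool, step.arg));
  }
  return log;
}
```

With `refreshEachStep` set to `false`, a request that first calls `add-actor` and then the newly added tool has its second step rejected by the client; with `true`, both steps succeed. A generic call-actor tool sidesteps the issue because the set of tools visible to the client never has to change.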

MQ37 · 2025-07-03T13:23:30Z

> This MCP server is not compatible with the Mastra MCP client. We allow adding tools (Actors) at runtime inside LLM requests, while Mastra only allows adding tools between requests. This may be a problem for more clients.
>
> We have two options:
> 1. Implement [Add generic call-actor tool #155](https://github.com/apify/actors-mcp-server/issues/155)
> 2. Use an MCP client other than Mastra (but that is a problem in itself: when popular MCP clients don't work, that is already an evaluation failure)
>
> @jirispilka @MQ37

This conclusion makes sense. I think we should implement the generic call-actor tool 👍

jirispilka · 2025-07-15T13:25:08Z

@MichalKalita is this PR still relevant now that we have the generic call-actor tool?

MichalKalita · 2025-07-16T08:34:39Z

@jirispilka I'm closing this PR. We want a tool that allows us to conduct A/B tests in complex scenarios, collect all metrics, and decide which way is better.

feat(test): mcp evaluation

ce5259e

github-actions bot assigned MichalKalita Jul 2, 2025

github-actions bot added the t-ai (Issues owned by the AI team) and tested (temporary label used only programmatically for some analytics) labels Jul 2, 2025

add anthropic models

1f5d151

MichalKalita closed this Jul 16, 2025

MichalKalita deleted the feature/evaluation branch July 16, 2025 08:36


3 participants
