-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Description
File Name
What happened?
Summary
The client.evals.run_inference() method in the Gen AI Evaluation SDK fails to evaluate
agents that use Multi-Agent architecture.
Environment
- SDK Version: google-cloud-aiplatform[evaluation] (latest)
- Agent Framework: Google ADK (Agent Development Kit)
- Agent Structure: Multi-Agent with sub_agents (root_agent → search_agent, reservation_agent)
- Tools: MCP Toolbox via
toolbox_corelibrary - Deployment: Vertex AI Agent Engine (Reasoning Engine)
- Region: us-central1
Steps to Reproduce
-
Deploy a Multi-Agent using ADK with the following structure:
mcp_toolset = McpToolset(toolbox_server_url=TOOLBOX_SERVER_URL)search_agent = Agent(
name="search_agent",
tools=[analyze_text, get_current_datetime, mcp_toolset],
...
)root_agent = Agent(
name="test",
sub_agents=[reservation_agent, search_agent],
...
)
2. Run evaluation using Gen AI Eval SDK:
agent_dataset_with_inference = client.evals.run_inference(
agent=AGENT_RESOURCE,
src=agent_dataset,
)
3. The evaluation fails with a parsing error.
Expected Behavior
The SDK should successfully run inference on the deployed Multi-Agent and collect
intermediate_events and response for evaluation.
Actual Behavior
The SDK returns an error:
{
"error": "Failed to parse agent run response [...] to intermediate events and final response: 'text'"
}## Root Cause (from Agent Engine logs)
The actual error in Agent Engine logs shows an asyncio context issue:
Relevant log output
{"error": "Failed to parse agent run response [{'model_version': 'gemini-2.5-flash', 'content': {'parts': [{'function_call': {'id': 'adk-946f20ca-e878-4870-8b11-216eeadb4fec', 'args': {'agent_name': 'search_agent'}, 'name': 'transfer_to_agent'}, 'thought_signature': 'Cp4HAY89a1_kpY2YcICNaaeVDSipHZ5pI2OCglzkjVDUcO-pvOVTjIAHXZooc99UnWWf8DQG_qRgh5BOePI8u08PT8y54U5uPLUhJUbrWt4zVxsOJCUW9_XgIJCONckbqj09olhLwQzX_lGATHRjmFqY9r2IAdI6HgzuUvNGR_Ol7mVC3rgHO83WGkDYhhF85_fl_zjH1wt9ju37r2OMEr-wJOOOWs0TZk8ELgxLqtdZi4tubuCOw2gbBKq3x-Pxp5jONSiSKKuvDR-SR1Dsaogvt-yr2Lqu2X3a1YV-4CnsCbfDVh40OTzCx-mrSURYvuQNdpNO08iIiUXoBxqc5NG5QBjzvXhYJrixVSly4aeI1HnYWnXXGr0MNx9fSL7h2uakRwsd1mCffYWAt3zSWP3bw1W44c7zb4AlwnBb27s5yeEiHYFgyjG0PT9S4JRBgseaQbewd9wktHEtGY461kYuhgNUEYF9-MvU0c2EsJIX7rd4cxT1KnOsRk-coBAEbwX450CWkKzyfE3t0_zoArG3uJm4WU4nL7Xfj2d_tmXvX-18xS4O6P5FtDmhpwxwPB1AW1vbi6xHx6XSi9hKcTHFki_R78ABIvBfXjox4rkGn12BI6sbPRMMtlUQoNNo82UinlLlCKxhwI46sQhCNb3T_i4BeXb43rdxp7HTg6iDY7GfwTcqXSrx6gXTvbl6jR9qAFlHvvmdgU9g26GSMltA_PfouuAMh0ku7jVdmxyE9e_1ToTp3ByGVhvJQhOJtuupDkmtBzbxlE5cffdh_TOwOOItxIMAtN3h2gKdvUlUT33VXiZnxDO89ezOGqMD4ILljDtUg6ikdMb4AOSDVyl_iAFG3Z98yO7A3dq9G5_GjyBFFRh9xmSjlNGGt1oBCYbuKYDdYC7gE9KEOA5f3rJ6hs0V1k8aqo9FPtYresNVZX5jxLa_K_u6TLgsjBptVw7BC72DcFR7f0J78YwuwjKGn2ZynfwnNYq099pzWSyq9GurMmA7gxl2MunxlBnRO4nODjFAi30Mp3IwKBdgeeqm0vmmduky5qEw8iBgFPIWM4hBqc_k380UOcUXoRSEK4Oi5Aqlcxfkyaz0xD8KxS9iPaiu0Dom0pkIH7Jh4vywIuA7KWU9_hNP94RvaM19CzcwefXscKFi17O9nkCHi5_QCeWnqXWdW5YPhvbynUx3KCGDq8KSgSLWg5E5xoH62aiTIshKALoNx_QHHcAgWZI='}], 'role': 'model'}, 'finish_reason': 'STOP', 'usage_metadata': {'candidates_token_count': 11, 'candidates_tokens_details': [{'modality': 'TEXT', 'token_count': 11}], 'prompt_token_count': 1215, 'prompt_tokens_details': [{'modality': 'TEXT', 'token_count': 1215}], 'thoughts_token_count': 215, 'total_token_count': 1441, 'traffic_type': 'ON_DEMAND'}, 'avg_logprobs': -4.73370985551314, 'invocation_id': 'e-5814a0db-4e23-4ab3-85db-74fa2f0224f5', 'author': 'catchtable', 'actions': {'state_delta': {}, 'artifact_delta': {}, 'requested_auth_configs': {}, 'requested_tool_confirmations': {}}, 'long_running_tool_ids': [], 'id': 'ea958c9f-339d-47e0-a5a3-0ca83f666a3f', 'timestamp': 1767945596.923423}, {'content': {'parts': [{'function_response': {'id': 'adk-946f20ca-e878-4870-8b11-216eeadb4fec', 'name': 'transfer_to_agent', 'response': {'result': None}}}], 'role': 'user'}, 'invocation_id': 'e-5814a0db-4e23-4ab3-85db-74fa2f0224f5', 'author': 'catchtable', 'actions': {'state_delta': {}, 'artifact_delta': {}, 'transfer_to_agent': 'search_agent', 'requested_auth_configs': {}, 'requested_tool_confirmations': {}}, 'id': '806afc44-3b0b-431f-be3d-4a1bfc0ac8e0', 'timestamp': 1767945598.723608}] to intermediate events and final response: 'text'"}Code of Conduct
- I agree to follow this project's Code of Conduct