[FEATURE] Multi-agent pattern: Arena #1306

@stefanoamorelli

Description

Problem Statement

AFAIK we don't have a pattern for: "I don't know beforehand which approach will perform a task best, so try several at the same time and let a judge decide."

The Agent-as-a-Judge paradigm (that builds on top of the more common LLM-as-a-Judge) would fit naturally here.

Proposed Solution

from strands import Agent
from strands.multiagent import Arena

judge_agent = Agent(
    system_prompt="""You are a judge. Evaluate the solutions provided and pick the best one.
    Use your tools to verify claims, run code, check facts.
    Return your verdict with reasoning.""",
    tools=[run_code, verify_facts],  # illustrative tools the judge could use
)

result = Arena(
    agents=[agent_a, agent_b, agent_c],
    judge=judge_agent,
).run("Design an API for user authentication")

The agents run in parallel, the judge evaluates their outputs, and the winner "emerges".
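
In the meantime, the pattern can be approximated with the existing primitives. Below is a minimal sketch, assuming candidate agents are callable with the task string and that str() on the result yields the response text; the arena helper and its names are hypothetical, not an existing API:

from concurrent.futures import ThreadPoolExecutor

from strands import Agent


def arena(candidates: list[Agent], judge: Agent, task: str) -> str:
    # Run every candidate agent on the same task in parallel.
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        results = list(pool.map(lambda agent: str(agent(task)), candidates))

    # Hand all candidate solutions to the judge and ask for a verdict.
    solutions = "\n\n".join(
        f"Solution {i + 1}:\n{text}" for i, text in enumerate(results)
    )
    verdict = judge(
        f"Task: {task}\n\n{solutions}\n\nPick the best solution and explain why."
    )
    return str(verdict)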

Use Case

I have a few different agent/multi-agent configurations and I want to know which one works best: I'm trying different prompts, comparing models, testing whether adding a tool actually helps, etc. I don't know beforehand which one will perform better on this task, so I want to run them all and choose the one that performs best.

The "Judge" agent can verify the outputs and pick the one it considers best. Because it's an agent, it can use tools (and all the agent functionality) to validate results rather than just comparing text.

Alternative Solutions

No response

Additional Context

Agent-as-a-Judge: Evaluate Agents with Agents
When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs
