Description
Please read this first
- Have you read the docs? Agents SDK docs
- Have you searched for related issues? Others may have had similar requests
Question: Evaluation Metrics for Multi-Agent Systems in OpenAI Agents SDK
Hi everyone,
I've been exploring the OpenAI Agents SDK (Python) and I see it includes useful features like tracing, handoffs, guardrails, and orchestration primitives for building multi-agent workflows.
However, I haven't found any documented framework or built-in methods specifically focused on evaluating multi-agent system performance—e.g., coordination efficiency, collaboration quality, agent-to-agent communication robustness, or other multi-agent metrics.
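For context, the closest thing I've found so far is the tracing support: I can group a whole multi-agent run under a single trace and inspect the spans (handoffs, tool calls) in the dashboard afterwards, but that's manual inspection rather than metrics. A minimal sketch of what I mean, using the documented `Agent`, `Runner`, and `trace` imports (the agent itself is just a placeholder):

```python
import asyncio

from agents import Agent, Runner, trace

triage = Agent(
    name="Triage",
    instructions="Route the request to the right specialist.",
)

async def main() -> None:
    # Group the whole multi-agent run under one trace so its spans
    # (handoffs, tool calls) appear together in the traces dashboard.
    with trace("support-workflow"):
        result = await Runner.run(triage, "My order arrived damaged.")
    print(result.final_output)

asyncio.run(main())
```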
I have two related questions:
- Are there any existing frameworks or evaluation approaches (maybe external or community-driven) that people commonly use to assess multi-agent systems built with the SDK?
- Does the OpenAI team plan to include formalized evaluation metrics or frameworks for multi-agent systems within the OpenAI Agents SDK in future versions?
It would be great to hear from others: what metrics (e.g., latency, tool invocation counts, success rate, coordination delays, consistency in handoffs, etc.) do you use to assess multi-agent system performance in practice?
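For what it's worth, here's the kind of lightweight harness I've been sketching myself: wall-clock latency around `Runner.run`, plus tool-call and handoff counts scraped from `result.new_items`. To be clear about assumptions: the `item.type` string values (`"tool_call_item"`, `"handoff_call_item"`) are what I believe recent SDK versions emit, and `succeeded` is a hypothetical task-specific grader I made up; neither is an official evaluation API.

```python
import asyncio
import time

from agents import Agent, Runner

billing = Agent(name="Billing", instructions="Handle billing questions.")
triage = Agent(
    name="Triage",
    instructions="Hand off billing questions to the Billing agent.",
    handoffs=[billing],
)

def succeeded(output: str) -> bool:
    # Hypothetical task-specific check; swap in your own grader.
    return "refund" in output.lower()

async def evaluate(prompts: list[str]) -> None:
    latencies: list[float] = []
    tool_calls = handoffs = successes = 0
    for prompt in prompts:
        start = time.perf_counter()
        result = await Runner.run(triage, prompt)
        latencies.append(time.perf_counter() - start)
        # new_items holds the items generated during the run; I'm
        # assuming these item.type literals match current SDK versions.
        tool_calls += sum(1 for i in result.new_items if i.type == "tool_call_item")
        handoffs += sum(1 for i in result.new_items if i.type == "handoff_call_item")
        successes += succeeded(str(result.final_output))
    n = len(prompts)
    print(f"avg latency: {sum(latencies) / n:.2f}s")
    print(f"tool calls:  {tool_calls}, handoffs: {handoffs}")
    print(f"success:     {successes}/{n}")

asyncio.run(evaluate(["I was double-charged last month."]))
```

This covers per-run numbers, but it says nothing about coordination quality across agents, which is exactly the gap I'm asking about.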
Thanks for building such a powerful SDK—and thanks in advance for any guidance or thoughts!