
Add AgentBench-Live skill for benchmarking agents #1

@jackjin1997

Description

It would be great to have a ClawForge skill that runs the AgentBench-Live benchmarks.

AgentBench-Live tests AI agents on 10 real-world tasks across 5 domains (code, data, research, tool use, and multi-step). Current results show Claude Code with an average score of 0.74, versus 0.52 for Gemini CLI.

A skill could let users benchmark their own OpenClaw instance directly from the CLI.

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests