Add AgentBench-Live skill for benchmarking agents

It would be great to have a ClawForge skill that runs [AgentBench-Live](https://github.com/jackjin1997/AgentBench-Live) benchmarks.

AgentBench-Live tests AI agents on 10 real-world tasks across 5 domains (code, data, research, tool-use, multi-step). Current results show Claude Code with an average score of 0.74 versus 0.52 for Gemini CLI.

A skill could let users benchmark their own OpenClaw instance directly from the CLI.
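
As a rough illustration of the kind of summary such a skill might print, here is a minimal sketch that averages per-task scores by domain and overall. The `summarize` function, the `(domain, score)` input shape, and the assumption that scores fall in [0, 1] are hypothetical and are not taken from AgentBench-Live itself; only the five domain names come from the description above.

```python
from collections import defaultdict
from statistics import mean

# Domain names as listed in this issue; everything else here is hypothetical.
DOMAINS = ("code", "data", "research", "tool-use", "multi-step")


def summarize(results):
    """Average scores per domain and overall.

    `results` is a list of (domain, score) pairs, one per task.
    Scores are assumed (not confirmed) to lie in [0, 1].
    """
    by_domain = defaultdict(list)
    for domain, score in results:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain!r}")
        by_domain[domain].append(score)

    per_domain = {domain: mean(scores) for domain, scores in by_domain.items()}
    overall = mean(score for _, score in results)
    return per_domain, overall


if __name__ == "__main__":
    # Placeholder scores for illustration only, not real benchmark results.
    sample = [("code", 1.0), ("code", 0.5), ("data", 0.0), ("data", 0.5)]
    per_domain, overall = summarize(sample)
    for domain, avg in per_domain.items():
        print(f"{domain}: {avg:.2f}")
    print(f"overall: {overall:.2f}")
```

The actual skill would presumably call the AgentBench-Live runner and read its real result format; this only shows the aggregation step.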
