Description
Inspect AI recently added a concurrency parameter that controls whether a call to exec counts towards the concurrency limit (commit 239386ba). This addresses an issue with the Claude Code agent in inspect_swe, where running multiple evals with the agent in parallel would cause the sandbox to get stuck after hitting the limit.
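To illustrate the kind of stall I mean, here is a minimal asyncio sketch. It is not Inspect's implementation: `exec_op`, `counts_toward_limit`, and the limit of 2 are hypothetical stand-ins, and the assumption (unconfirmed) is that long-running agent processes hold every slot of a shared limit, so the short calls they depend on can never start.

```python
import asyncio


async def exec_op(limit: asyncio.Semaphore, duration: float,
                  counts_toward_limit: bool = True) -> None:
    """Hypothetical stand-in for a sandbox exec call."""
    if counts_toward_limit:
        # The call holds one slot of the shared limit for its whole duration.
        async with limit:
            await asyncio.sleep(duration)
    else:
        # The call bypasses the limit entirely.
        await asyncio.sleep(duration)


async def run(agent_counts_toward_limit: bool) -> bool:
    """Return True if a short call finishes while two agents are running."""
    limit = asyncio.Semaphore(2)  # hypothetical cap on concurrent operations
    # Two long-running "agent" processes, one per sample.
    agents = [
        asyncio.create_task(exec_op(limit, 0.5, agent_counts_toward_limit))
        for _ in range(2)
    ]
    await asyncio.sleep(0.05)  # give the agents time to start
    try:
        # A short call that always counts towards the limit.
        await asyncio.wait_for(exec_op(limit, 0.01), timeout=0.2)
        completed = True
    except asyncio.TimeoutError:
        completed = False
    # Clean up the agent tasks so the event loop can exit promptly.
    for agent in agents:
        agent.cancel()
    await asyncio.gather(*agents, return_exceptions=True)
    return completed


# Agents count towards the limit: the short call cannot get a slot in time.
starved = not asyncio.run(run(agent_counts_toward_limit=True))
# Agents exempt from the limit: the short call goes through.
unblocked = asyncio.run(run(agent_counts_toward_limit=False))
```

In the sketch, exempting the long-running calls from the limit is what lets the short call proceed, which is my understanding of what the new concurrency parameter is for.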
In the K8s case, I believe the limit being hit is pod-op (src). My current workaround is to set INSPECT_MAX_POD_OPS=10000.
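For anyone who wants to apply the same workaround from a script rather than the shell, a minimal sketch follows. It assumes the K8s sandbox reads INSPECT_MAX_POD_OPS from the process environment when it is initialised, so the variable is set before any eval is started; I have not verified exactly when the value is read.

```python
import os

# Workaround: raise the pod-op limit far above anything the run will reach.
# Assumption: the value is read from the environment when the sandbox is
# initialised, so set it before calling eval_set() or eval().
os.environ["INSPECT_MAX_POD_OPS"] = "10000"
```

Exporting the variable in the shell before launching the run should be equivalent.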
Expected behaviour
Running multiple Claude Code agents in parallel shouldn't cause everything to pause indefinitely (i.e., until the task timeout).
Reproduction
I suspect this minimal reproduction would work, though I haven't tested it:
    from inspect_ai import Task, eval_set, task
    from inspect_ai.dataset import Sample
    from inspect_evals.cybench import cybench
    from inspect_swe import claude_code

    eval_set(
        cybench(agent=claude_code()),
        sandbox="k8s",
        log_dir="/tmp/",
        model=["anthropic/claude-sonnet-4-20250514"],
        max_tasks=25,
        max_sandboxes=50,
        max_connections=50,
        time_limit=3600,
        token_limit=50000,
    )