

tkattkat
Collaborator

why

Currently, the agent eval runs without an execution model by default.

what changed

A default execution model is now set; it can be overridden with the AGENT_EVAL_EXECUTION_MODEL environment variable.
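A minimal sketch of the override-then-default resolution described above. The helper name `resolveExecutionModel` is hypothetical, and the default value is taken from the review summary below ("google/gemini-2.5-flash"), not from the diff itself. The environment is passed in explicitly so the function is easy to test; in the eval runner you would pass `process.env`.

```typescript
// Hypothetical helper, not the actual code in evals/initStagehand.ts.
// Assumption: the default matches the value reported in the review summary.
const DEFAULT_EXECUTION_MODEL = "google/gemini-2.5-flash";

function resolveExecutionModel(
  env: Record<string, string | undefined>,
): string {
  // `??` falls back only when the variable is unset (undefined);
  // an empty string is passed through unchanged.
  return env.AGENT_EVAL_EXECUTION_MODEL ?? DEFAULT_EXECUTION_MODEL;
}

// Usage (the override value is a placeholder, not a real model name):
console.log(resolveExecutionModel({}));
console.log(resolveExecutionModel({ AGENT_EVAL_EXECUTION_MODEL: "some/other-model" }));
```

Using `??` mirrors the `env ?? "google/gemini-2.5-flash"` step in the sequence diagram; if an empty AGENT_EVAL_EXECUTION_MODEL should also fall back to the default, `||` would be the operator to use instead.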

test plan

tested locally


changeset-bot bot commented Sep 23, 2025

🦋 Changeset detected

Latest commit: f340c37

The changes in this PR will be included in the next version bump.


Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Summary

This PR adds parameterization for the agent execution model in eval runners. Previously, agent evaluations ran without an execution model set. Now, non-CUA models get a default execution model (google/gemini-2.5-flash) that can be overridden via the AGENT_EVAL_EXECUTION_MODEL environment variable.

  • Added environment variable AGENT_EVAL_EXECUTION_MODEL for execution model configuration
  • Set default execution model to google/gemini-2.5-flash for non-CUA models
  • CUA models (computer-use-preview and claude) continue to work without an execution model

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk
  • The change is straightforward and well-contained, adding proper fallback behavior for execution models. The logic correctly differentiates between CUA and non-CUA models, and the default value choice is reasonable.
  • No files require special attention

Important Files Changed

File Analysis

Filename | Score | Overview
evals/initStagehand.ts | 4/5 | Added parameterized execution model with environment variable fallback and default value for non-CUA models

Sequence Diagram

sequenceDiagram
    participant Eval as Eval Runner
    participant Init as initStagehand
    participant Env as Environment
    participant Agent as Stagehand Agent

    Eval->>Init: Call initStagehand with modelName
    Init->>Init: Check if isCUAModel(modelName)
    
    alt CUA Model (computer-use-preview or claude)
        Init->>Agent: Create agentConfig with model & provider
        Note right of Init: executionModel not set for CUA models
    else Non-CUA Model
        Init->>Env: Check AGENT_EVAL_EXECUTION_MODEL
        Env-->>Init: Return env var or undefined
        Init->>Init: Set executionModel = env ?? "google/gemini-2.5-flash"
        Init->>Agent: Create agentConfig with model & executionModel
    end
    
    Init->>Agent: stagehand.agent(agentConfig)
    Agent-->>Eval: Return configured agent
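The branching in the diagram can be sketched as a pure function that builds the agent config. This is an illustration under stated assumptions, not the PR's code: `AgentConfig`, `buildAgentConfig`, and this `isCUAModel` are hypothetical stand-ins, the CUA check simply looks for the "computer-use-preview" and "claude" substrings named in the review summary, and the provider is passed in rather than derived.

```typescript
// Hypothetical config shape; the real Stagehand agent options may differ.
interface AgentConfig {
  model: string;
  provider?: string;
  executionModel?: string;
}

const DEFAULT_EXECUTION_MODEL = "google/gemini-2.5-flash";

// Assumption: mirrors the review summary's description of CUA models.
function isCUAModel(modelName: string): boolean {
  return (
    modelName.includes("computer-use-preview") || modelName.includes("claude")
  );
}

function buildAgentConfig(
  modelName: string,
  provider: string,
  env: Record<string, string | undefined>,
): AgentConfig {
  if (isCUAModel(modelName)) {
    // CUA branch: model and provider only; executionModel stays unset.
    return { model: modelName, provider };
  }
  // Non-CUA branch: env override first, then the default execution model.
  return {
    model: modelName,
    executionModel: env.AGENT_EVAL_EXECUTION_MODEL ?? DEFAULT_EXECUTION_MODEL,
  };
}

// Usage: the resulting object would then be handed to stagehand.agent(...).
console.log(buildAgentConfig("computer-use-preview", "openai", {}));
console.log(buildAgentConfig("some-non-cua-model", "some-provider", {}));
```

Keeping the decision in a side-effect-free function makes both branches of the diagram checkable without starting a browser session.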

1 file reviewed, no comments

