parameterize agent execution model in eval runner #1093

tkattkat · 2025-09-23T19:02:48Z

why

Currently, the agent eval run does not use an execution model by default.

what changed

A default execution model is now set. It can be overridden with the AGENT_EVAL_EXECUTION_MODEL environment variable.

test plan

tested locally

changeset-bot · 2025-09-23T19:02:51Z

🦋 Changeset detected

Latest commit: f340c37

The changes in this PR will be included in the next version bump.

greptile-apps

Greptile Overview

Summary

This PR adds parameterization for the agent execution model in eval runners. Previously, agent evaluations ran without an execution model set. Now, non-CUA models get a default execution model (google/gemini-2.5-flash) that can be overridden via the AGENT_EVAL_EXECUTION_MODEL environment variable.

Added environment variable AGENT_EVAL_EXECUTION_MODEL for execution model configuration
Set default execution model to google/gemini-2.5-flash for non-CUA models
CUA models (computer-use-preview and claude) continue to work without execution model

Confidence Score: 4/5

This PR is safe to merge with minimal risk
The change is straightforward and well-contained, adding proper fallback behavior for execution models. The logic correctly differentiates between CUA and non-CUA models, and the default value choice is reasonable
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
evals/initStagehand.ts	4/5	Added parameterized execution model with environment variable fallback and default value for non-CUA models

Sequence Diagram

sequenceDiagram
    participant Eval as Eval Runner
    participant Init as initStagehand
    participant Env as Environment
    participant Agent as Stagehand Agent

    Eval->>Init: Call initStagehand with modelName
    Init->>Init: Check if isCUAModel(modelName)
    
    alt CUA Model (computer-use-preview or claude)
        Init->>Agent: Create agentConfig with model & provider
        Note right of Init: executionModel not set for CUA models
    else Non-CUA Model
        Init->>Env: Check AGENT_EVAL_EXECUTION_MODEL
        Env-->>Init: Return env var or undefined
        Init->>Init: Set executionModel = env ?? "google/gemini-2.5-flash"
        Init->>Agent: Create agentConfig with model & executionModel
    end
    
    Init->>Agent: stagehand.agent(agentConfig)
    Agent-->>Eval: Return configured agent
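
The branching in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the code from evals/initStagehand.ts: the `AgentConfig` type, the `buildAgentConfig` helper, the substring-based `isCUAModel` check, and the explicit `env` and `provider` parameters are all assumptions made for the example. Only the AGENT_EVAL_EXECUTION_MODEL variable name and the google/gemini-2.5-flash default come from this PR.

```typescript
// Default taken from the PR summary; every other name here is hypothetical.
const DEFAULT_EXECUTION_MODEL = "google/gemini-2.5-flash";

// Assumed shape of the agent config, for illustration only.
type AgentConfig = {
  model: string;
  provider?: string;
  executionModel?: string;
};

// Assumption: CUA models are detected by name. The real isCUAModel may differ.
function isCUAModel(modelName: string): boolean {
  return (
    modelName.includes("computer-use-preview") || modelName.includes("claude")
  );
}

// The environment is passed in explicitly so the sketch stays self-contained.
function buildAgentConfig(
  modelName: string,
  env: Record<string, string | undefined>,
  provider?: string,
): AgentConfig {
  if (isCUAModel(modelName)) {
    // CUA models: model and provider only, no executionModel.
    return { model: modelName, provider };
  }
  // Non-CUA models: use the env var when defined, otherwise the default.
  return {
    model: modelName,
    executionModel: env.AGENT_EVAL_EXECUTION_MODEL ?? DEFAULT_EXECUTION_MODEL,
  };
}
```

Because `??` only falls back on `null` or `undefined`, an AGENT_EVAL_EXECUTION_MODEL that is set to an empty string would be passed through unchanged in this sketch.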

1 file reviewed, no comments

tkattkat added 2 commits September 23, 2025 12:01

paramaterize agent execution model in eval runner

61d1472

update example env

f340c37

greptile-apps bot reviewed Sep 23, 2025
