
Evaluator Contribution

guanxinyi edited this page Jun 5, 2025 · 5 revisions


1. Prepare

To set up the development environment, see Installation.

2. Run Evaluator

Complete the Configuration, then run:

rush dev:eval

3. Evaluator Guide

For the overall structure of web-bench, see Evaluator. The evaluator source is organized as follows:

├─ tools
│ ├─ evaluator
│ │ ├─ src
│ │ │ ├─ ignore
│ │ │ ├─ log
│ │ │ ├─ parser
│ │ │ ├─ plugins
│ │ │ ├─ runner
│ │ │ │ ├─ evaluator-runner.ts
│ │ │ │ ├─ project-runner.ts
│ │ │ │ ├─ task-runner
│ │ │ ├─ settings
│ │ │ ├─ utils
  • runner:

    • evaluator-runner: The evaluation entry point. It runs m*n project-runners (m: number of projects, n: number of models), one per project-model pair.
    • project-runner: Runs a project's tasks in sequence and terminates once a task reaches the retry limit (2 attempts). Covers Evaluator-Workflow Steps 2 and 9.
    • task-runner: Calls the agent, rewrites files, initializes the environment, builds the files, runs the tests, and retries on failure. Covers Evaluator-Workflow Step 1 and Steps 3-8.
  • plugins:
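The runner hierarchy described above (evaluator-runner, project-runner, task-runner) can be sketched in TypeScript as follows. This is a hypothetical illustration, not the evaluator's actual code: the names Task, TaskResult, AttemptFn, runTask, runProject and runEvaluation are invented here, a single attemptOnce callback stands in for the agent call, file rewrite, environment init, build and test steps, and "retry limit (2 attempts)" is read as at most two attempts per task.

```typescript
// Hypothetical sketch of the runner hierarchy; the real runners in
// tools/evaluator/src/runner may be structured differently.

interface Task {
  id: string;
}

interface TaskResult {
  taskId: string;
  passed: boolean;
  attempts: number;
}

// One task-runner attempt: call the agent, rewrite files, init the env,
// build, and run the tests. Resolves to true if the tests pass.
type AttemptFn = (task: Task, model: string, attempt: number) => Promise<boolean>;

// Assumption: "retry limit (2 attempts)" means at most 2 attempts per task.
const MAX_ATTEMPTS = 2;

// task-runner: retry a single task until it passes or attempts run out.
async function runTask(task: Task, model: string, attemptOnce: AttemptFn): Promise<TaskResult> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    if (await attemptOnce(task, model, attempt)) {
      return { taskId: task.id, passed: true, attempts: attempt };
    }
  }
  return { taskId: task.id, passed: false, attempts: MAX_ATTEMPTS };
}

// project-runner: run tasks in order; stop at the first task that exhausts its attempts.
async function runProject(tasks: Task[], model: string, attemptOnce: AttemptFn): Promise<TaskResult[]> {
  const results: TaskResult[] = [];
  for (const task of tasks) {
    const result = await runTask(task, model, attemptOnce);
    results.push(result);
    if (!result.passed) {
      break; // retry limit reached: terminate this project run
    }
  }
  return results;
}

// evaluator-runner: one project-runner per (project, model) pair, m * n in total.
// Pairs run sequentially here for simplicity; the real runner may parallelize.
async function runEvaluation(
  projects: Record<string, Task[]>,
  models: string[],
  attemptOnce: AttemptFn,
): Promise<Map<string, TaskResult[]>> {
  const report = new Map<string, TaskResult[]>();
  for (const [project, tasks] of Object.entries(projects)) {
    for (const model of models) {
      report.set(`${project} / ${model}`, await runProject(tasks, model, attemptOnce));
    }
  }
  return report;
}
```

With a stub attemptOnce that fails "task-2" on every attempt, runProject returns results for task-1 and task-2 only, mirroring the terminate-on-retry-limit behavior of project-runner.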

4. Test

Run the following command to execute the evaluations, then view the results in apps/eval/report:

rush eval

5. Tips

In the development environment, configure parameters in apps/eval/src/config.json5:

  • logLevel: 'debug' to get more detailed log output.
  • projects: ['@web-bench/xxxx'] to evaluate only the listed projects instead of all of them.
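The effect of these two settings can be sketched as follows. The EvalConfigSubset interface, the selectProjects and debugLog helpers, and the project names @web-bench/demo-a and @web-bench/demo-b are all hypothetical; in particular, treating an absent or empty projects list as "evaluate everything" is an assumption, not confirmed behavior of the evaluator.

```typescript
// Hypothetical subset of apps/eval/src/config.json5; see Config Parameters for the real fields.
interface EvalConfigSubset {
  logLevel?: string; // e.g. 'debug' for more detailed output
  projects?: string[]; // e.g. ['@web-bench/xxxx'] to limit the run
}

// Assumption: an absent or empty `projects` list means every project is evaluated.
function selectProjects(allProjects: string[], config: EvalConfigSubset): string[] {
  if (!config.projects || config.projects.length === 0) {
    return allProjects;
  }
  const wanted = new Set(config.projects);
  return allProjects.filter((name) => wanted.has(name));
}

// Debug messages are printed only when logLevel is 'debug'.
function debugLog(config: EvalConfigSubset, message: string): void {
  if (config.logLevel === "debug") {
    console.log(`[debug] ${message}`);
  }
}

const config: EvalConfigSubset = { logLevel: "debug", projects: ["@web-bench/demo-a"] };
const selected = selectProjects(["@web-bench/demo-a", "@web-bench/demo-b"], config);
debugLog(config, `evaluating ${selected.join(", ")}`); // prints "[debug] evaluating @web-bench/demo-a"
```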

More details in Config Parameters.
