Evaluator Contribution
guanxinyi edited this page Jun 5, 2025 · 5 revisions
To set up the development environment, see Installation.
Complete the Configuration, then run:
rush dev:eval
For the main structure of web-bench, see Evaluator. The evaluator source is organized as follows:
├─ tools
│ ├─ evaluator
│ │ ├─ src
│ │ │ ├─ ignore
│ │ │ ├─ log
│ │ │ ├─ parser
│ │ │ ├─ plugins
│ │ │ ├─ runner
│ │ │ │ ├─ evaluator-runner.ts
│ │ │ │ ├─ project-runner.ts
│ │ │ │ ├─ task-runner
│ │ │ ├─ settings
│ │ │ ├─ utils
- runner:
  - evaluator-runner: The evaluation entry point. It processes m × n project-runners, where m is the number of projects and n is the number of models.
  - project-runner: Processes the tasks of one project in sequence and terminates once the retry limit (2 attempts) is reached. Corresponds to Evaluator-Workflow Step 2 and Step 9.
  - task-runner: Calls the agent, rewrites files, initializes the environment, builds files, runs tests, and retries. Corresponds to Evaluator-Workflow Step 1 and Steps 3-8.
- plugins:
  - In the Evaluation Workflow, each step is injected as a plugin. This directory contains both the plugin schedule and the implementation of each step's plugin.
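The relationship between the three runners and the injected step plugins can be sketched roughly as below. This is a minimal illustration, not the evaluator's actual code: `TaskContext`, `StepPlugin`, `runTask`, `runProject`, and `runEvaluation` are hypothetical names, the steps are synchronous for brevity (a real agent call would be asynchronous), and it assumes that a project run stops at the first task that fails both attempts.

```typescript
// Hypothetical types for illustration only; not the evaluator's real API.
interface TaskContext {
  project: string
  model: string
  task: string
  attempt: number
}

// A workflow step injected as a plugin; returns false when the step fails.
type StepPlugin = (ctx: TaskContext) => boolean

// Retry limit per task (2 attempts), as stated in the text above.
const RETRY_LIMIT = 2

// task-runner: run all step plugins for one task, retrying up to RETRY_LIMIT attempts.
function runTask(project: string, model: string, task: string, steps: StepPlugin[]): boolean {
  for (let attempt = 1; attempt <= RETRY_LIMIT; attempt++) {
    const ctx: TaskContext = { project, model, task, attempt }
    // `every` stops at the first failing step, which triggers the next attempt.
    if (steps.every((step) => step(ctx))) return true
  }
  return false
}

// project-runner: process tasks in sequence; stop once a task exhausts its attempts
// (an assumption about what "terminates" means here).
function runProject(project: string, model: string, tasks: string[], steps: StepPlugin[]): string[] {
  const passed: string[] = []
  for (const task of tasks) {
    if (!runTask(project, model, task, steps)) break
    passed.push(task)
  }
  return passed
}

// evaluator-runner: one project run per (project, model) pair, m × n runs in total.
function runEvaluation(
  projects: string[],
  models: string[],
  tasksOf: (project: string) => string[],
  steps: StepPlugin[],
): Map<string, string[]> {
  const results = new Map<string, string[]>()
  for (const project of projects) {
    for (const model of models) {
      results.set(`${project} x ${model}`, runProject(project, model, tasksOf(project), steps))
    }
  }
  return results
}

// Usage: a fake step plugin that fails 'task-2' on every attempt.
const failTaskTwo: StepPlugin = (ctx) => ctx.task !== 'task-2'
const results = runEvaluation(
  ['project-a', 'project-b'],
  ['model-1', 'model-2', 'model-3'],
  () => ['task-1', 'task-2', 'task-3'],
  [failTaskTwo],
)
// 2 projects × 3 models = 6 entries; each run passes only 'task-1'.
```

Passing the steps in as an array mirrors the plugin idea: the runners only decide order and retries, while what each step does is supplied from outside.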
Execute the following command to run evaluations, then view the results in apps/eval/report:
rush eval
In the development environment, configure parameters in apps/eval/src/config.json5:
- logLevel: 'debug' to get more detailed log output.
- projects: ['@web-bench/xxxx'] to evaluate only the listed projects instead of all projects.
See Config Parameters for more details.
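As a rough sketch, a development configuration in apps/eval/src/config.json5 that uses only the two parameters mentioned above might look like this. The project name is the placeholder from the text, and any other keys the evaluator may require are not shown here.

```json5
{
  // Print more detailed logs while developing
  logLevel: 'debug',
  // Evaluate only the listed project(s) instead of all projects
  projects: ['@web-bench/xxxx'],
}
```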