Conversation

@mikasenghaas
Member

@mikasenghaas mikasenghaas commented Jan 15, 2026

Description

This PR implements evaluating multiple environments in parallel via vf-eval. For more details check the updated docs.
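To illustrate what "in parallel" means here, the following is a minimal sketch of running several evaluations concurrently with `asyncio.gather`. It is not the PR's implementation: `evaluate_env` and `evaluate_all` are hypothetical stand-ins, and the sleep replaces the real rollout and scoring work.

```python
import asyncio


async def evaluate_env(env_id: str, delay: float = 0.01) -> dict:
    # Hypothetical stand-in for a real evaluation; sleeps instead of running rollouts.
    await asyncio.sleep(delay)
    return {"env_id": env_id, "status": "done"}


async def evaluate_all(env_ids: list[str]) -> list[dict]:
    # Schedule one coroutine per environment and wait for all of them.
    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(evaluate_env(env_id) for env_id in env_ids))


results = asyncio.run(evaluate_all(["gsm8k", "alphabet-sort"]))
```

Because the evaluations overlap while waiting on I/O, the total wall-clock time in this sketch is close to that of the slowest environment rather than the sum of all of them.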

This PR is mainly concerned with the config system. Cosmetic updates will be shipped separately; see, e.g., #735.

Examples

By default, we still evaluate a single env, with no changes to the interface:

uv run vf-eval gsm8k -n5 -r3

To configure multi-environment evaluation, specify a comma-separated list of env IDs:

uv run vf-eval gsm8k,alphabet-sort -n5 -r3

Note that all environments use their default configuration. Because CLI arguments apply to every environment, values can only be changed for all environments at once. For more fine-grained configuration, see below.
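As a rough sketch of the behavior described above (not the code in this PR), a comma-separated argument can be split into one raw config per environment, with the same CLI overrides copied into each. The helper names `parse_env_ids` and `build_raw_configs` are hypothetical; the `env_id` key follows the snippet quoted in the review further down.

```python
def parse_env_ids(arg: str) -> list[str]:
    """Split a comma-separated positional argument into env IDs."""
    env_ids = [part.strip() for part in arg.split(",") if part.strip()]
    if not env_ids:
        raise ValueError("expected at least one environment ID")
    return env_ids


def build_raw_configs(arg: str, cli_overrides: dict) -> list[dict]:
    # Every environment receives the same CLI overrides, which is why
    # CLI flags cannot differ between environments.
    return [{"env_id": env_id, **cli_overrides} for env_id in parse_env_ids(arg)]


configs = build_raw_configs(
    "gsm8k,alphabet-sort",
    {"num_examples": 5, "rollouts_per_example": 3},
)
```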

To configure multi-environment evaluation with (potentially) different arguments for each environment, specify a path to a TOML config file:

uv run vf-eval configs/evals/debug.toml -n5 -r3

# configs/evals/debug.toml
[[env]]
id = "gsm8k"
num_examples = 1
rollouts_per_example = 1

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Introduces parallel multi-environment evaluation and a more flexible CLI.

  • CLI positional env_id_or_path now accepts a single env ID, a comma-separated list, or a TOML file; per-env settings resolve with precedence: TOML > CLI > env defaults > global
  • New MultiEvalConfig and run_multi_evaluation() execute all envs concurrently; refactors single-run flow and centralizes result printing/performance reporting
  • Adds TOML helpers is_toml_config() and load_toml_config() with validation; simplifies print_results and moves event loop lag monitoring to multi-run; reduces lag monitor log level to debug
  • Removes print_results from EvalConfig; retains existing flags/behavior for single-env runs
  • Expands docs with multi-env usage and precedence; adds example config configs/evals/debug.toml
  • Adds comprehensive tests covering CLI parsing, TOML loading/validation, multi-env config merging, and precedence
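
The precedence rule stated in the first bullet (TOML > CLI > env defaults > global) can be sketched as a lookup through the sources in priority order. This is an assumption-laden illustration: `resolve_setting` is a hypothetical helper, the dictionaries stand in for whatever structures the PR actually uses, the global default values are invented, and `None` is assumed to mean "not set".

```python
from typing import Any

# Hypothetical global defaults, used only for this illustration.
GLOBAL_DEFAULTS: dict[str, Any] = {"num_examples": 5, "rollouts_per_example": 3}


def resolve_setting(
    key: str,
    toml_entry: dict[str, Any],
    cli_args: dict[str, Any],
    env_defaults: dict[str, Any],
    global_defaults: dict[str, Any] = GLOBAL_DEFAULTS,
) -> Any:
    # Walk the sources from highest to lowest precedence and return the
    # first value that is actually set (None is treated as "not provided").
    for source in (toml_entry, cli_args, env_defaults, global_defaults):
        value = source.get(key)
        if value is not None:
            return value
    raise KeyError(key)


# The TOML entry wins over the CLI flag for num_examples ...
n = resolve_setting("num_examples", {"num_examples": 1}, {"num_examples": 5}, {})
# ... while rollouts_per_example falls back to the CLI value.
r = resolve_setting("rollouts_per_example", {}, {"rollouts_per_example": 2}, {})
```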

Written by Cursor Bugbot for commit c4d690d. This will update automatically on new commits. Configure here.

@mikasenghaas mikasenghaas mentioned this pull request Jan 15, 2026
13 tasks
@mikasenghaas mikasenghaas requested a review from willccbb January 15, 2026 17:22
@mikasenghaas mikasenghaas marked this pull request as ready for review January 15, 2026 17:23
@mikasenghaas mikasenghaas changed the title multi-env evals multi-env evals config Jan 15, 2026
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

    raw_multi_env_config = [{"env_id": env_id} for env_id in env_ids]
else:
    # single-eval env
    raw_multi_env_config = [{"env_id": args.env_id_or_path}]
Missing TOML path gives confusing module-not-found error

Low Severity

When a user provides a path ending in .toml but the file doesn't exist (e.g., typo in path like config/debug.toml instead of configs/debug.toml), is_toml_config returns False because Path.is_file() fails. The code then falls through to treating the path as an environment ID, causing a confusing "module not found" error instead of "TOML config file not found". Since no valid environment ID would end in .toml, paths with this extension should check for file existence and give a clear error message when missing.
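
One possible shape for the suggested fix, sketched under assumptions: check the `.toml` suffix first and fail with a clear message if the file is missing, instead of falling through to env-ID handling. `classify_target` is a hypothetical helper and does not reflect how `is_toml_config()` is actually written.

```python
from pathlib import Path


def classify_target(arg: str) -> str:
    """Decide whether the positional argument is a TOML path or env ID(s)."""
    if arg.endswith(".toml"):
        # No valid env ID ends in .toml, so a missing file is a user error.
        if not Path(arg).is_file():
            raise FileNotFoundError(f"TOML config file not found: {arg}")
        return "toml"
    return "env_ids"
```

With this ordering, a typo such as `config/debug.toml` surfaces as a file-not-found error rather than a module-not-found error.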

Additional Locations (1)
