Skip to content

Minor eval fixes#471

Merged
husseinmozannar merged 3 commits intomainfrom
minor_eval_fixes
Dec 10, 2025
Merged

Minor eval fixes#471
husseinmozannar merged 3 commits intomainfrom
minor_eval_fixes

Conversation

@husseinmozannar
Copy link
Copy Markdown
Contributor

@husseinmozannar husseinmozannar commented Dec 5, 2025

Fixes:

  • add flush for messages
  • use asyncio for all file writing
  • add dummy user option for running evals:
    python experiments/eval/run.py --current-dir . --dataset Gaia --split validation-1 --run-id 1123 --simulated-user-type dummy --parallel 1 --config experiments/endpoint_configs/config.yaml --mode run
  • do eval directly after running task

@afourney
Copy link
Copy Markdown
Member

This appears to be working for me, including sentinel steps.

@afourney afourney self-requested a review December 10, 2025 21:06
@husseinmozannar husseinmozannar merged commit 564e9b0 into main Dec 10, 2025
9 checks passed
@husseinmozannar husseinmozannar deleted the minor_eval_fixes branch December 10, 2025 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants