
@cheng-tan (Collaborator) commented on Dec 17, 2025

Fixes https://github.com/microsoft/magentic-ui2.0/issues/24

This PR includes multiple fixes:

Fix --parallel:

Currently the WebVoyager benchmark crashes when --parallel is greater than 1, with the following error:

  File "experiments/eval/run.py", line 187, in run_system_sim_user
    run_system_evaluation(args, system, system_name, config)
  File "experiments/eval/run.py", line 141, in run_system_evaluation
    run_evaluate_benchmark_func(
 ....
  File ".local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't get local object 'get_bearer_token_provider.<locals>.wrapper'

Root cause of the bug:
Python's multiprocessing must pickle objects to send them to worker processes. The code was passing system and benchmark instances that held unpicklable attributes, such as the Azure token provider and browser instances.
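A minimal reproduction of this failure mode, as a sketch: `make_token_provider` and `SystemWithProvider` are hypothetical stand-ins for a factory like `get_bearer_token_provider` and for a system instance that stores its result. A closure defined inside a function body is a "local object" that pickle cannot look up by name. The exact exception type varies by Python version (`AttributeError` on 3.12, `PicklingError` on newer releases), so the check catches both.

```python
import pickle
from typing import Any, Callable


def make_token_provider(scope: str) -> Callable[[], str]:
    # Hypothetical stand-in for a factory such as get_bearer_token_provider:
    # it returns a closure defined inside the function body ("<locals>").
    def wrapper() -> str:
        return f"token-for-{scope}"

    return wrapper


class SystemWithProvider:
    # Hypothetical system that keeps the closure as an attribute, the way
    # the benchmark's system instances held the Azure token provider.
    def __init__(self, scope: str) -> None:
        self.token_provider = make_token_provider(scope)


def is_picklable(obj: Any) -> bool:
    # Pool and ProcessPoolExecutor pickle the arguments they send to worker
    # processes, so this checks up front whether an object can cross over.
    try:
        pickle.dumps(obj)
    except (AttributeError, TypeError, pickle.PicklingError):
        return False
    return True
```

Plain data such as a dict of task parameters passes this check, while the closure, and any object holding it, does not.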

This PR passes constructors rather than instances in parallel mode, and refreshes the system for each task.
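The general pattern can be sketched as follows, under the assumption that each worker receives a picklable factory plus plain task data and builds the live system itself. `DemoSystem`, `run_one_task`, and `run_parallel` are hypothetical names for illustration; the PR's actual code in `experiments/eval/run.py` may be structured differently.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from typing import Callable, List


class DemoSystem:
    """Hypothetical system. In the real benchmark, this is where the token
    provider, browser, and other unpicklable resources would be created."""

    # In-process counter, used only to show a new instance per task.
    instances_created = 0

    def __init__(self, name: str) -> None:
        DemoSystem.instances_created += 1
        self.name = name
        self.closed = False

    def run(self, task: str) -> str:
        return f"{self.name}:{task}"

    def close(self) -> None:
        self.closed = True


def run_one_task(system_factory: Callable[[], DemoSystem], task: str) -> str:
    # The factory is pickled by reference; the live system is built inside
    # the worker, fresh for each task, and closed even if the task fails.
    system = system_factory()
    try:
        return system.run(task)
    finally:
        system.close()


def run_parallel(
    system_factory: Callable[[], DemoSystem], tasks: List[str], workers: int
) -> List[str]:
    # Only the factory and the task strings are pickled and sent to workers.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one_task, [system_factory] * len(tasks), tasks))
```

For example, `run_parallel(partial(DemoSystem, "demo"), ["t1", "t2"], workers=2)` should work when these definitions live at module level in an importable module, because pickle serializes classes, functions, and `functools.partial` objects wrapping them by reference rather than by value.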

Other fixes:

  • Update the WebVoyager task list to match the one in the Fara repo.
  • Support a custom eval client, falling back to OpenAI when none is configured.
  • Clean up clients properly on exceptions.
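The last two items can be illustrated together with a small sketch: choose a custom eval client if one is configured, otherwise fall back to a default, and guarantee cleanup with `try`/`finally` inside a context manager. `EvalClient`, `make_eval_client`, and `eval_client` are hypothetical placeholders, not the OpenAI SDK or the PR's actual classes; the "openai-default" client here stands in for a real OpenAI client.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class EvalClient:
    """Hypothetical placeholder for a model client; not a real SDK class."""

    name: str
    closed: bool = False

    def complete(self, prompt: str) -> str:
        if self.closed:
            raise RuntimeError("client is closed")
        return f"[{self.name}] {prompt}"

    def close(self) -> None:
        self.closed = True


def make_eval_client(custom_config: Optional[dict] = None) -> EvalClient:
    # Prefer a custom eval client when one is configured; otherwise fall back
    # to a default that stands in for an OpenAI client.
    if custom_config:
        return EvalClient(name=custom_config["name"])
    return EvalClient(name="openai-default")


@contextmanager
def eval_client(custom_config: Optional[dict] = None) -> Iterator[EvalClient]:
    # try/finally closes the client on normal exit and when the body raises.
    client = make_eval_client(custom_config)
    try:
        yield client
    finally:
        client.close()
```

Wrapping client creation in a context manager keeps the cleanup in one place, so an exception raised mid-evaluation cannot leak an open client.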

@cheng-tan changed the title from "Small updates for webvoyager benchmark" to "Fix parallelization and a couple small updates for webvoyager benchmark" on Dec 17, 2025
@husseinmozannar (Contributor)

This looks good overall to me. Does everything still work for GAIA with this fix?

