Jobs can belong to evalsets but don't have to #274

shfunc · 2026-01-13T10:25:48Z

Note

feat: Job–Taskset linking and build secrets support; bump to 0.5.13.

Adds --taskset to hud eval and plumbs it through run_dataset and hud.eval(); the async job-enter call now optionally creates tasks and returns task_version_ids that link runs to a taskset
Makes _send_job_enter async and extends JobEnterPayload with taskset and optional tasks
env.scenario() accepts required_env_vars and includes them in prompt/resource meta
hud build gains a repeatable --secret flag that is forwarded to Docker's --secret; BuildKit is enabled when the flag is used, and it applies to both the initial and final builds
Version bumped to 0.5.13; docs and tests updated accordingly
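A minimal sketch of how repeatable --secret values might be turned into a Docker invocation. The helper name build_docker_command and its signature are hypothetical, not hud's actual code; the only parts taken from Docker itself are the `docker build --secret` flag and the `DOCKER_BUILDKIT=1` environment variable, since build secrets are a BuildKit feature.

```python
import os


def build_docker_command(
    tag: str, context: str, secrets: list[str]
) -> tuple[list[str], dict[str, str]]:
    """Return argv and environment for `docker build` (hypothetical helper).

    Each entry in `secrets` is passed through verbatim as a Docker secret
    spec, for example "id=npm_token,src=/run/secrets/npm_token".
    """
    cmd = ["docker", "build", "-t", tag]
    for secret in secrets:
        # Repeatable flag: one --secret per value, in the order given.
        cmd += ["--secret", secret]
    cmd.append(context)

    env = dict(os.environ)
    if secrets:
        # --secret is a BuildKit feature, so turn BuildKit on only when needed.
        env["DOCKER_BUILDKIT"] = "1"
    return cmd, env
```

Calling the same helper for both the initial and the final build is one way to keep the two phases consistent, which is the behavior the summary describes.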

Written by Cursor Bugbot for commit 19a0e55. This will update automatically on new commits.

hud/eval/manager.py

hud/cli/eval.py

ryantzr1 · 2026-01-17T16:09:19Z

HUD Bugbot review

Summary: Optional taskset association is threaded through eval config/CLI, with async job-enter calls that can pre-create tasks and attach returned task_version_ids to in-memory Task objects for both local and remote eval flows.

Findings:

Title: Async _send_job_enter now requires awaiting in all callers
Severity: medium
File: hud/eval/manager.py (around lines 56-88, 273-312)
Description: _send_job_enter was converted from sync to async and now returns task_version_ids, but its name and prior usage suggest a fire-and-forget telemetry helper. Any remaining synchronous callers (inside or outside this diff) that still use it as a regular function will silently stop sending job-enter events.
Suggestion: Ensure all existing callers are updated to await this coroutine, and add a small sync wrapper (e.g., _send_job_enter_sync) if you still need call sites from sync contexts. Consider renaming to _async_send_job_enter to make the contract explicit and avoid incorrect usage.

Title: Potential mismatch between tasks_to_create and returned task_version_ids
Severity: low
File: hud/eval/manager.py (around lines 273-312)
Description: When taskset is provided, the client filters tasks without an id, sends them to the backend, and then zips tasks_to_create with the returned task_version_ids. This assumes the backend preserves ordering and cardinality. If the backend reorders, filters, or partially rejects tasks, IDs could be misassigned or silently dropped due to zip(strict=False).
Suggestion: Confirm the backend contract guarantees a one-to-one, ordered mapping. If not, include a stable client-generated identifier per task in the payload and require the backend to echo it back so you can align deterministically. Optionally, assert or log when the lengths differ.

Title: Single-task eval may proceed with job_id still None
Severity: low
File: hud/eval/manager.py (around lines 235-272)
Description: In the single-task branch, job_id_for_run is only set when taskset is provided and job_id_for_run is None. For total_evals == 1 with taskset unset and job_id still None, job_id_for_run stays None and is passed to EvalContext.from_task. If downstream logic expects a job id for URLs or telemetry, this diverges from the multi-task branch, which always creates an implicit_job_id.
Suggestion: Decide whether a single-task eval should always have a job id. If yes, always generate one when job_id is None. If no, document this behavior and ensure all consumers tolerate None.

Title: New async HTTP call on eval start can affect latency and user perception
Severity: low
File: hud/eval/manager.py (around lines 56-88, 250-312)
Description: _send_job_enter now issues an async HTTP request with a 10s timeout in the hot path for --taskset in both single- and multi-task flows. Slow or failing HUD API responses can delay eval startup and make it appear stuck, even though failures are only logged.
Suggestion: Consider a shorter timeout or making this more fire-and-forget (e.g., a background task that doesn't block eval startup). At minimum, log a clear message or surface a small user-facing note when job-enter is slow or fails, so users understand that taskset linking may be missing while the eval continues.

Title: Large task payloads when pre-creating many taskset tasks
Severity: low
File: hud/cli/eval.py (around lines 655-689)
Description: For remote execution with --taskset, all tasks lacking an id are serialized and sent in one job_enter request. Large datasets could create oversized payloads and hit server limits or timeouts.
Suggestion: Verify expected dataset sizes and backend limits. If jobs can contain many tasks, consider batching task creation, sending only minimal metadata needed to pre-create tasks, or using a more streaming-friendly path for full task definitions.
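A sketch of size-bounded batching for the pre-create request. It assumes tasks are JSON-serializable dicts and that each batch would be sent as a separate request; `batch_by_size` and the byte limit are hypothetical, and a real limit would come from the backend.

```python
import json
from collections.abc import Iterator
from typing import Any


def batch_by_size(tasks: list[dict[str, Any]], max_bytes: int) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive batches whose compact JSON array encoding fits in max_bytes.

    A single task that exceeds max_bytes on its own is yielded alone.
    """
    batch: list[dict[str, Any]] = []
    size = 2  # the enclosing "[" and "]"
    for task in tasks:
        item = len(json.dumps(task, separators=(",", ":")).encode("utf-8"))
        extra = item + (1 if batch else 0)  # a comma precedes every item but the first
        if batch and size + extra > max_bytes:
            yield batch
            batch, size = [], 2
            extra = item
        batch.append(task)
        size += extra
    if batch:
        yield batch
```

Because an oversized task cannot be split, the backend would still have to reject it or accept it through a separate path.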

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

hud/eval/manager.py

shfunc added 2 commits January 12, 2026 15:11

add --taskset flag to associate jobs with tasksets

9a76b81

send tasks to platform for taskset auto-add

5130078

cursor bot reviewed Jan 13, 2026

hud/eval/manager.py (resolved)

single task fix

a8f5780

cursor bot reviewed Jan 13, 2026

hud/cli/eval.py (resolved)

shfunc added 2 commits January 13, 2026 13:10

taskset remote fix

741c57e

async job_enter + cli cleanup

7a283f1

shfunc assigned shfunc and lorenss-m and unassigned shfunc Jan 15, 2026

dylanbowman314 and others added 2 commits January 17, 2026 17:04

apply nancyjlau's patch

0423952

update tests and edge cases, add scenario addition

d9fd5ab

cursor bot reviewed Jan 18, 2026

hud/eval/manager.py (outdated, resolved)

lorenss-m and others added 6 commits January 17, 2026 18:34

small fix

9c19a7f

add rest of patch

c049fcf

patch cursor bug with build phases

5d55942

Merge branch 'main' of https://github.com/hud-evals/hud-python into i…

c8229c3

…lya/hud-553-jobs-can-belong-to-evalsets-but-dont-have-to

version bump

7ab0b4a

Merge branch 'dylan/hud-build-secrets' of https://github.com/hud-eval…

19a0e55

…s/hud-python into ilya/hud-553-jobs-can-belong-to-evalsets-but-dont-have-to

lorenss-m merged commit e8e4b22 into main Jan 18, 2026
10 checks passed

5 participants
