@shfunc
Contributor

@shfunc shfunc commented Jan 13, 2026

Note

feat: Job–Taskset linking and build secrets support; bump to 0.5.13.

  • Adds --taskset to hud eval and plumbs through run_dataset and hud.eval(); async job enter now optionally creates tasks and returns task_version_ids to link runs to a taskset
  • Makes _send_job_enter async and extends JobEnterPayload with taskset and optional tasks
  • env.scenario() accepts required_env_vars and includes them in prompt/resource meta
  • hud build gains repeatable --secret flags; forwards to Docker --secret and enables BuildKit when used; applies to both initial and final builds
  • Version bumped to 0.5.13; docs and tests updated accordingly

Written by Cursor Bugbot for commit 19a0e55. This will update automatically on new commits.

@shfunc shfunc assigned shfunc and lorenss-m and unassigned shfunc Jan 15, 2026
@ryantzr1
Contributor

HUD Bugbot review

Summary: Optional taskset association is threaded through eval config/CLI, with async job-enter calls that can pre-create tasks and attach returned task_version_ids to in-memory Task objects for both local and remote eval flows.

Findings:

  • Title: Async _send_job_enter now requires awaiting in all callers
    Severity: medium
    File: hud/eval/manager.py (around lines 56-88, 273-312)
    Description: _send_job_enter was converted from sync to async and now returns task_version_ids, but its name and prior usage suggest a fire-and-forget telemetry helper. Any remaining synchronous callers (inside or outside this diff) that still use it as a regular function will silently stop sending job-enter events.
    Suggestion: Ensure all existing callers are updated to await this coroutine, and add a small sync wrapper (e.g., _send_job_enter_sync) if you still need call sites from sync contexts. Consider renaming to _async_send_job_enter to make the contract explicit and avoid incorrect usage.

  • Title: Potential mismatch between tasks_to_create and returned task_version_ids
    Severity: low
    File: hud/eval/manager.py (around lines 273-312)
    Description: When taskset is provided, the client filters tasks without an id, sends them to the backend, and then zips tasks_to_create with the returned task_version_ids. This assumes the backend preserves ordering and cardinality. If the backend reorders, filters, or partially rejects tasks, IDs could be misassigned or silently dropped due to zip(strict=False).
    Suggestion: Confirm the backend contract guarantees a one-to-one, ordered mapping. If not, include a stable client-generated identifier per task in the payload and require the backend to echo it back so you can align deterministically. Optionally, assert or log when the lengths differ.

  • Title: Single-task eval may proceed with job_id still None
    Severity: low
    File: hud/eval/manager.py (around lines 235-272)
    Description: In the single-task branch, job_id_for_run is only set when taskset is provided and job_id_for_run is None. For total_evals == 1 with taskset unset and job_id still None, job_id_for_run stays None and is passed to EvalContext.from_task. If downstream logic expects a job id for URLs or telemetry, this diverges from the multi-task branch, which always creates an implicit_job_id.
    Suggestion: Decide whether a single-task eval should always have a job id. If yes, always generate one when job_id is None. If no, document this behavior and ensure all consumers tolerate None.

  • Title: New async HTTP call on eval start can affect latency and user perception
    Severity: low
    File: hud/eval/manager.py (around lines 56-88, 250-312)
    Description: _send_job_enter now issues an async HTTP request with a 10s timeout in the hot path for --taskset in both single- and multi-task flows. Slow or failing HUD API responses can delay eval startup and make it appear stuck, even though failures are only logged.
    Suggestion: Consider a shorter timeout or making this more fire-and-forget (e.g., background task that doesn't block eval startup). At minimum, log a clear message or surface a small user-facing note when job-enter is slow or fails so they understand taskset linking may be missing while eval continues.

  • Title: Large task payloads when pre-creating many taskset tasks
    Severity: low
    File: hud/cli/eval.py (around lines 655-689)
    Description: For remote execution with --taskset, all tasks lacking an id are serialized and sent in one job_enter request. Large datasets could create oversized payloads and hit server limits or timeouts.
    Suggestion: Verify expected dataset sizes and backend limits. If jobs can contain many tasks, consider batching task creation, sending only minimal metadata needed to pre-create tasks, or using a more streaming-friendly path for full task definitions.

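The first finding (a sync caller that never awaits the now-async helper) can be reproduced in a few lines. This is a sketch under stated assumptions: `send_job_enter`, `send_job_enter_sync`, and `stale_sync_caller` are hypothetical names, and the `asyncio.sleep(0)` stands in for the real HTTP request.

```python
import asyncio

sent = []  # job ids for which the "event" was actually sent


async def send_job_enter(job_id):
    """Hypothetical async helper; the sleep stands in for an HTTP call."""
    await asyncio.sleep(0)
    sent.append(job_id)
    return ["tv-1"]  # placeholder for returned task_version_ids


def send_job_enter_sync(job_id):
    """Sync wrapper for call sites that have no running event loop."""
    return asyncio.run(send_job_enter(job_id))


def stale_sync_caller(job_id):
    """A caller left over from when the helper was synchronous."""
    # Bug: calling an async function only creates a coroutine object;
    # the body never runs, so nothing is sent.
    coro = send_job_enter(job_id)
    # Closed explicitly here only to suppress the "never awaited" RuntimeWarning.
    coro.close()
```

Calling `stale_sync_caller("job-a")` leaves `sent` empty, while `send_job_enter_sync("job-b")` runs the coroutine to completion. Note that `asyncio.run` raises `RuntimeError` when called from inside a running event loop, so a wrapper like this only suits purely synchronous call sites.
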
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@lorenss-m lorenss-m merged commit e8e4b22 into main Jan 18, 2026
10 checks passed

5 participants