
fix: propagate MLflow run id to Eval Hub and update eval-hub-sdk version to 0.1.3 (#133)

Merged
tarilabs merged 2 commits into opendatahub-io:incubation from scheruku-rh:fixes_incubation
Mar 25, 2026

Conversation

@scheruku-rh

@scheruku-rh scheruku-rh commented Mar 24, 2026

closes: https://redhat.atlassian.net/browse/RHOAIENG-54869

Ensures the MLflow run id from callbacks.mlflow.save() is written to JobResults.mlflow_run_id before callbacks.report_results(), so Eval Hub receives mlflow_run_id on the final /events payload (requires an eval-hub-sdk version where save() returns the run id).
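A minimal, self-contained sketch of that ordering, using stand-in classes (the real eval-hub-sdk callback and JobResults objects are assumed from the PR diff, not imported here):

```python
class _Results:
    """Stand-in for JobResults; carries the run id sent to Eval Hub."""
    def __init__(self):
        self.mlflow_run_id = None

class _MlflowCallback:
    """Stand-in for callbacks.mlflow; in eval-hub-sdk >= 0.1.3,
    save() returns the MLflow run id (None in older versions)."""
    def save(self, results, job_spec):
        return "d328c5835cee40978825eb722c0fa278"

reported = {}

def report_results(results):
    # Stand-in for callbacks.report_results(); the final /events payload
    # serializes results, including mlflow_run_id if it was set.
    reported["mlflow_run_id"] = results.mlflow_run_id

results = _Results()
# Save to MLflow FIRST, then store the returned run id on results...
run_id = _MlflowCallback().save(results, job_spec=None)
if run_id:
    results.mlflow_run_id = run_id
# ...so the subsequent report carries it to Eval Hub.
report_results(results)
```

With the previous ordering (report before save), `reported["mlflow_run_id"]` would have been `None`.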

Description

How Has This Been Tested?

logs:

2026-03-24 15:56:54,678 - httpx - INFO - HTTP Request: POST http://localhost:8080/api/v1/evaluations/jobs/63520e7b-3568-49fa-a8b7-4a1f66687e7e/events "HTTP/1.1 204 No Content"
2026-03-24 15:56:54,687 - httpx - INFO - HTTP Request: POST http://localhost:8080/api/v1/evaluations/jobs/63520e7b-3568-49fa-a8b7-4a1f66687e7e/events "HTTP/1.1 204 No Content"
2026-03-24 15:56:54,688 - __main__ - INFO - No OCI exports configured; skipping artifact persistence
2026-03-24 15:56:54,689 - __main__ - INFO - ================================================================================
2026-03-24 15:56:54,689 - __main__ - INFO - Evaluation completed successfully
2026-03-24 15:56:54,689 - __main__ - INFO - Overall score: 0.3
2026-03-24 15:56:54,689 - __main__ - INFO - Examples evaluated: 10
2026-03-24 15:56:54,689 - __main__ - INFO - Duration: 7.55s
2026-03-24 15:56:54,689 - __main__ - INFO - ================================================================================
2026-03-24 15:56:54,704 - httpx - INFO - HTTP Request: GET http://localhost:5000/api/2.0/mlflow/experiments/get-by-name?experiment_name=test_exp_evalhub "HTTP/1.1 200 OK"
2026-03-24 15:56:54,709 - httpx - INFO - HTTP Request: POST http://localhost:5000/api/2.0/mlflow/runs/create "HTTP/1.1 200 OK"
2026-03-24 15:56:54,715 - httpx - INFO - HTTP Request: POST http://localhost:5000/api/2.0/mlflow/runs/log-batch "HTTP/1.1 200 OK"
2026-03-24 15:56:54,717 - httpx - INFO - HTTP Request: POST http://localhost:5000/api/2.0/mlflow/runs/update "HTTP/1.1 200 OK"
2026-03-24 15:56:54,717 - evalhub.adapter.callbacks - INFO - Saved to MLflow (odh) experiment 'test_exp_evalhub' (run_id: d328c5835cee40978825eb722c0fa278) — 5 metric(s), 0 artifact(s)
2026-03-24 15:56:54,717 - __main__ - INFO - save() returned: 'd328c5835cee40978825eb722c0fa278'
2026-03-24 15:56:54,717 - evalhub.adapter.callbacks - INFO - Events report_results body: {'benchmark_status_event': {'id': 'arc_easy', 'benchmark_index': 0, 'state': 'completed', 'status': 'completed', 'message': {'message': 'Evaluation completed successfully', 'message_code': 'evaluation_completed'}, 'metrics': {'acc': 0.3, 'acc_stderr': 0.15275252316519464, 'acc_norm': 0.3, 'acc_norm_stderr': 0.15275252316519464}, 'completed_at': '2026-03-24T10:26:54.678452+00:00', 'duration_seconds': 7, 'provider_id': 'lm_evaluation_harness', 'mlflow_run_id': 'd328c5835cee40978825eb722c0fa278'}}
2026-03-24 15:56:54,726 - httpx - INFO - HTTP Request: POST http://localhost:8080/api/v1/evaluations/jobs/63520e7b-3568-49fa-a8b7-4a1f66687e7e/events "HTTP/1.1 204 No Content"
2026-03-24 15:56:54,726 - evalhub.adapter.callbacks - INFO - Results reported to evalhub | Metrics: 4 | Score: 0.3
2026-03-24 15:56:54,726 - evalhub.adapter.callbacks - INFO - Job 63520e7b-3568-49fa-a8b7-4a1f66687e7e completed | Benchmark: arc_easy | Model: qwen2.5-1.5b-instruct.ggu | Score: 0.3 | Examples: 10 | Duration: 7.55s

A GET on the job id now shows mlflow_run_id:

{
  "resource": {
    "id": "63520e7b-3568-49fa-a8b7-4a1f66687e7e",
    "tenant": "team-a",
    "created_at": "2026-03-24T10:26:32Z",
    "updated_at": "2026-03-24T10:26:54Z"
  },
  "status": {
    "state": "completed",
    "message": {
      "message": "Evaluation job is completed",
      "message_code": "evaluation_job_updated"
    },
    "benchmarks": [
      {
        "provider_id": "lm_evaluation_harness",
        "id": "arc_easy",
        "benchmark_index": 0,
        "status": "completed",
        "completed_at": "2026-03-24T10:26:54.678452+00:00"
      }
    ]
  },
  "results": {
    "benchmarks": [
      {
        "id": "arc_easy",
        "provider_id": "lm_evaluation_harness",
        "benchmark_index": 0,
        "metrics": {
          "acc": 0.3,
          "acc_norm": 0.3,
          "acc_norm_stderr": 0.15275252316519464,
          "acc_stderr": 0.15275252316519464
        },
        "mlflow_run_id": "d328c5835cee40978825eb722c0fa278"
      }
    ]
  },
  "name": "test",
  "model": {
    "url": "http://localhost:8001/v1",
    "name": "qwen2.5-1.5b-instruct.ggu"
  },
  "benchmarks": [
    {
      "id": "arc_easy",
      "provider_id": "lm_evaluation_harness",
      "parameters": {
        "limit": 5,
        "num_examples": 10,
        "tokenizer": "google/flan-t5-small"
      }
    }
  ],
  "experiment": {
    "name": "evalhub"
  }
}

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

Summary by CodeRabbit

  • Bug Fixes

    • MLflow integration updated so experiment run IDs are captured and stored with results earlier in the reporting flow, ensuring results include associated run identifiers.
  • Chores

    • Bumped internal SDK dependency to the next patch version to align runtime components.

@coderabbitai

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

The entrypoint in main.py now calls callbacks.mlflow.save(results, adapter.job_spec) before callbacks.report_results(results). It captures the return value as mlflow_run_id and, when present, assigns it to results.mlflow_run_id. Previously MLflow was saved after reporting and the run ID was not stored on results. Dependency pins for eval-hub-sdk[adapter] were bumped from 0.1.2 to 0.1.3 in pyproject.toml and requirements.txt.
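In dependency terms, the bump amounts to something like the following (a sketch; the exact pin syntax and surrounding entries in the repo's pyproject.toml may differ):

```toml
# pyproject.toml (requirements.txt carries the matching pin)
[project]
dependencies = [
    "eval-hub-sdk[adapter]==0.1.3",  # was 0.1.2; save() now returns the run id
]
```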

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Actionable Issues

  • Unvalidated assignment to results object (CWE-20): Validate mlflow_run_id before assigning to results.mlflow_run_id. Enforce type/format checks (non-null, non-empty, expected pattern), log and skip assignment on invalid input, or raise a clear exception.
  • Order-dependency risk (CWE-665): Confirm callbacks.report_results() does not rely on results.mlflow_run_id. If it does, either restore previous ordering or update report_results to tolerate absence/presence and document expected behavior.
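For the first point, a hypothetical guard is easy to sketch. MLflow run ids are 32-character lowercase hex strings (as seen in the logs above); the helper name and the strictness of the check are assumptions, not part of the PR:

```python
import re

# Expected shape of an MLflow run id: 32 lowercase hex characters.
_RUN_ID_RE = re.compile(r"^[0-9a-f]{32}$")

def validated_run_id(value):
    """Return value if it looks like an MLflow run id, else None.

    Callers can then skip the assignment (and log a warning) when
    save() returns something unexpected, instead of storing junk
    on results.mlflow_run_id.
    """
    if isinstance(value, str) and _RUN_ID_RE.fullmatch(value):
        return value
    return None
```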
🚥 Pre-merge checks | ✅ 2 passed

  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed. The title accurately summarizes the main changes: propagating the MLflow run id to Eval Hub and updating the eval-hub-sdk version to 0.1.3, both directly reflected in the changeset.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
main.py (1)

488-494: Unhandled exception from mlflow.save() will skip report_results().

If callbacks.mlflow.save() raises, the evaluation results are never reported to EvalHub. Consider wrapping the MLflow call so a transient MLflow failure doesn't discard a successful evaluation.

♻️ Suggested defensive handling
         # MLflow first; run id from save() is sent on report_results when SDK returns it.
-        mlflow_run_id = callbacks.mlflow.save(results, adapter.job_spec)
-        if mlflow_run_id:
-            results.mlflow_run_id = mlflow_run_id
+        try:
+            mlflow_run_id = callbacks.mlflow.save(results, adapter.job_spec)
+            if mlflow_run_id:
+                results.mlflow_run_id = mlflow_run_id
+        except Exception as mlflow_err:
+            logger.warning("MLflow save failed; proceeding without run id: %s", mlflow_err)

         # Report final results to EvalHub (status/results API)
         callbacks.report_results(results)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.py` around lines 488 - 494, Wrap the MLflow save call in a try/except so
that exceptions from callbacks.mlflow.save(results, adapter.job_spec) do not
prevent callbacks.report_results(results) from running; specifically, call
callbacks.mlflow.save inside a try, on success set results.mlflow_run_id =
mlflow_run_id, and on exception log or record the MLflow error (including
exception details) but continue to call callbacks.report_results(results) in the
normal control flow so EvalHub always receives the evaluation results.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: d88e8645-4158-4c7f-a54b-b73e9f9d8898

📥 Commits

Reviewing files that changed from the base of the PR and between 6b7f865 and 86fd75e.

📒 Files selected for processing (1)
  • main.py

Member

@tarilabs tarilabs left a comment


LGTM. This makes the mlflow_run_id available before the SDK's report_results callback.


@mariusdanciu mariusdanciu left a comment


lgtm

@scheruku-rh scheruku-rh changed the title from "fix: propagate MLflow run id to Eval Hub" to "fix: propagate MLflow run id to Eval Hub and update eval-hub-sdk version to 0.1.3" on Mar 25, 2026
Member

@tarilabs tarilabs left a comment


LGTM
thanks!

@tarilabs tarilabs merged commit c8d8109 into opendatahub-io:incubation Mar 25, 2026
4 of 5 checks passed
scheruku-rh added a commit to scheruku-rh/lm-evaluation-harness that referenced this pull request Mar 25, 2026
…ion to 0.1.3 (opendatahub-io#133)

* fix: propagate MLflow run id to Eval Hub

* chore: update sdk version
tarilabs pushed a commit that referenced this pull request Mar 25, 2026
…ion to 0.1.3 (#133) (#135)

* fix: propagate MLflow run id to Eval Hub

* chore: update sdk version
ruivieira pushed a commit to ruivieira/lm-evaluation-harness that referenced this pull request Mar 26, 2026
…ion to 0.1.3 (opendatahub-io#133)

* fix: propagate MLflow run id to Eval Hub

* chore: update sdk version

3 participants