
fix: propagate MLflow run id to Eval Hub and update eval-hub-sdk version to 0.1.3 (#133)

Merged
tarilabs merged 2 commits into opendatahub-io:incubation from scheruku-rh:fixes_incubation
Mar 25, 2026

Conversation

@scheruku-rh

@scheruku-rh scheruku-rh commented Mar 24, 2026

closes: https://redhat.atlassian.net/browse/RHOAIENG-54869

Ensures the MLflow run id from callbacks.mlflow.save() is written to JobResults.mlflow_run_id before callbacks.report_results(), so Eval Hub receives mlflow_run_id on the final /events payload (requires an eval-hub-sdk version where save() returns the run id).
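A minimal, self-contained sketch of that ordering, using stand-in classes (the real eval-hub-sdk callback and JobResults objects are assumed from the PR diff, not imported here):

```python
class _Results:
    """Stand-in for JobResults; carries the run id sent to Eval Hub."""
    def __init__(self):
        self.mlflow_run_id = None

class _MlflowCallback:
    """Stand-in for callbacks.mlflow; in eval-hub-sdk >= 0.1.3,
    save() returns the MLflow run id (None in older versions)."""
    def save(self, results, job_spec):
        return "d328c5835cee40978825eb722c0fa278"

reported = {}

def report_results(results):
    # Stand-in for callbacks.report_results(); the final /events payload
    # serializes results, including mlflow_run_id if it was set.
    reported["mlflow_run_id"] = results.mlflow_run_id

results = _Results()
# Save to MLflow FIRST, then store the returned run id on results...
run_id = _MlflowCallback().save(results, job_spec=None)
if run_id:
    results.mlflow_run_id = run_id
# ...so the subsequent report carries it to Eval Hub.
report_results(results)
```

With the previous ordering (report before save), `reported["mlflow_run_id"]` would have been `None`.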

Description

How Has This Been Tested?

logs:

2026-03-24 15:56:54,678 - httpx - INFO - HTTP Request: POST http://localhost:8080/api/v1/evaluations/jobs/63520e7b-3568-49fa-a8b7-4a1f66687e7e/events "HTTP/1.1 204 No Content"
2026-03-24 15:56:54,687 - httpx - INFO - HTTP Request: POST http://localhost:8080/api/v1/evaluations/jobs/63520e7b-3568-49fa-a8b7-4a1f66687e7e/events "HTTP/1.1 204 No Content"
2026-03-24 15:56:54,688 - __main__ - INFO - No OCI exports configured; skipping artifact persistence
2026-03-24 15:56:54,689 - __main__ - INFO - ================================================================================
2026-03-24 15:56:54,689 - __main__ - INFO - Evaluation completed successfully
2026-03-24 15:56:54,689 - __main__ - INFO - Overall score: 0.3
2026-03-24 15:56:54,689 - __main__ - INFO - Examples evaluated: 10
2026-03-24 15:56:54,689 - __main__ - INFO - Duration: 7.55s
2026-03-24 15:56:54,689 - __main__ - INFO - ================================================================================
2026-03-24 15:56:54,704 - httpx - INFO - HTTP Request: GET http://localhost:5000/api/2.0/mlflow/experiments/get-by-name?experiment_name=test_exp_evalhub "HTTP/1.1 200 OK"
2026-03-24 15:56:54,709 - httpx - INFO - HTTP Request: POST http://localhost:5000/api/2.0/mlflow/runs/create "HTTP/1.1 200 OK"
2026-03-24 15:56:54,715 - httpx - INFO - HTTP Request: POST http://localhost:5000/api/2.0/mlflow/runs/log-batch "HTTP/1.1 200 OK"
2026-03-24 15:56:54,717 - httpx - INFO - HTTP Request: POST http://localhost:5000/api/2.0/mlflow/runs/update "HTTP/1.1 200 OK"
2026-03-24 15:56:54,717 - evalhub.adapter.callbacks - INFO - Saved to MLflow (odh) experiment 'test_exp_evalhub' (run_id: d328c5835cee40978825eb722c0fa278) — 5 metric(s), 0 artifact(s)
2026-03-24 15:56:54,717 - __main__ - INFO - save() returned: 'd328c5835cee40978825eb722c0fa278'
2026-03-24 15:56:54,717 - evalhub.adapter.callbacks - INFO - Events report_results body: {'benchmark_status_event': {'id': 'arc_easy', 'benchmark_index': 0, 'state': 'completed', 'status': 'completed', 'message': {'message': 'Evaluation completed successfully', 'message_code': 'evaluation_completed'}, 'metrics': {'acc': 0.3, 'acc_stderr': 0.15275252316519464, 'acc_norm': 0.3, 'acc_norm_stderr': 0.15275252316519464}, 'completed_at': '2026-03-24T10:26:54.678452+00:00', 'duration_seconds': 7, 'provider_id': 'lm_evaluation_harness', 'mlflow_run_id': 'd328c5835cee40978825eb722c0fa278'}}
2026-03-24 15:56:54,726 - httpx - INFO - HTTP Request: POST http://localhost:8080/api/v1/evaluations/jobs/63520e7b-3568-49fa-a8b7-4a1f66687e7e/events "HTTP/1.1 204 No Content"
2026-03-24 15:56:54,726 - evalhub.adapter.callbacks - INFO - Results reported to evalhub | Metrics: 4 | Score: 0.3
2026-03-24 15:56:54,726 - evalhub.adapter.callbacks - INFO - Job 63520e7b-3568-49fa-a8b7-4a1f66687e7e completed | Benchmark: arc_easy | Model: qwen2.5-1.5b-instruct.ggu | Score: 0.3 | Examples: 10 | Duration: 7.55s

A GET on the job id now shows mlflow_run_id:

{
  "resource": {
    "id": "63520e7b-3568-49fa-a8b7-4a1f66687e7e",
    "tenant": "team-a",
    "created_at": "2026-03-24T10:26:32Z",
    "updated_at": "2026-03-24T10:26:54Z"
  },
  "status": {
    "state": "completed",
    "message": {
      "message": "Evaluation job is completed",
      "message_code": "evaluation_job_updated"
    },
    "benchmarks": [
      {
        "provider_id": "lm_evaluation_harness",
        "id": "arc_easy",
        "benchmark_index": 0,
        "status": "completed",
        "completed_at": "2026-03-24T10:26:54.678452+00:00"
      }
    ]
  },
  "results": {
    "benchmarks": [
      {
        "id": "arc_easy",
        "provider_id": "lm_evaluation_harness",
        "benchmark_index": 0,
        "metrics": {
          "acc": 0.3,
          "acc_norm": 0.3,
          "acc_norm_stderr": 0.15275252316519464,
          "acc_stderr": 0.15275252316519464
        },
        "mlflow_run_id": "d328c5835cee40978825eb722c0fa278"
      }
    ]
  },
  "name": "test",
  "model": {
    "url": "http://localhost:8001/v1",
    "name": "qwen2.5-1.5b-instruct.ggu"
  },
  "benchmarks": [
    {
      "id": "arc_easy",
      "provider_id": "lm_evaluation_harness",
      "parameters": {
        "limit": 5,
        "num_examples": 10,
        "tokenizer": "google/flan-t5-small"
      }
    }
  ],
  "experiment": {
    "name": "evalhub"
  }
}

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

Summary by CodeRabbit

  • Bug Fixes

    • MLflow integration updated so experiment run IDs are captured and stored with results earlier in the reporting flow, ensuring results include associated run identifiers.
  • Chores

    • Bumped internal SDK dependency to the next patch version to align runtime components.

@coderabbitai

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

The entrypoint in main.py now calls callbacks.mlflow.save(results, adapter.job_spec) before callbacks.report_results(results). It captures the return value as mlflow_run_id and, when present, assigns it to results.mlflow_run_id. Previously MLflow was saved after reporting and the run ID was not stored on results. Dependency pins for eval-hub-sdk[adapter] were bumped from 0.1.2 to 0.1.3 in pyproject.toml and requirements.txt.
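In dependency terms, the bump amounts to something like the following (a sketch; the exact pin syntax and surrounding entries in the repo's pyproject.toml may differ):

```toml
# pyproject.toml (requirements.txt carries the matching pin)
[project]
dependencies = [
    "eval-hub-sdk[adapter]==0.1.3",  # was 0.1.2; save() now returns the run id
]
```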

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Actionable Issues

  • Unvalidated assignment to results object (CWE-20): Validate mlflow_run_id before assigning to results.mlflow_run_id. Enforce type/format checks (non-null, non-empty, expected pattern), log and skip assignment on invalid input, or raise a clear exception.
  • Order-dependency risk (CWE-665): Confirm callbacks.report_results() does not rely on results.mlflow_run_id. If it does, either restore previous ordering or update report_results to tolerate absence/presence and document expected behavior.
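For the first point, a hypothetical guard is easy to sketch. MLflow run ids are 32-character lowercase hex strings (as seen in the logs above); the helper name and the strictness of the check are assumptions, not part of the PR:

```python
import re

# Expected shape of an MLflow run id: 32 lowercase hex characters.
_RUN_ID_RE = re.compile(r"^[0-9a-f]{32}$")

def validated_run_id(value):
    """Return value if it looks like an MLflow run id, else None.

    Callers can then skip the assignment (and log a warning) when
    save() returns something unexpected, instead of storing junk
    on results.mlflow_run_id.
    """
    if isinstance(value, str) and _RUN_ID_RE.fullmatch(value):
        return value
    return None
```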
🚥 Pre-merge checks | ✅ 2 passed

  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed. The title accurately summarizes the main changes: propagating the MLflow run id to Eval Hub and updating the eval-hub-sdk version to 0.1.3, both directly reflected in the changeset.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
main.py (1)

488-494: Unhandled exception from mlflow.save() will skip report_results().

If callbacks.mlflow.save() raises, the evaluation results are never reported to EvalHub. Consider wrapping the MLflow call so a transient MLflow failure doesn't discard a successful evaluation.

♻️ Suggested defensive handling
         # MLflow first; run id from save() is sent on report_results when SDK returns it.
-        mlflow_run_id = callbacks.mlflow.save(results, adapter.job_spec)
-        if mlflow_run_id:
-            results.mlflow_run_id = mlflow_run_id
+        try:
+            mlflow_run_id = callbacks.mlflow.save(results, adapter.job_spec)
+            if mlflow_run_id:
+                results.mlflow_run_id = mlflow_run_id
+        except Exception as mlflow_err:
+            logger.warning("MLflow save failed; proceeding without run id: %s", mlflow_err)

         # Report final results to EvalHub (status/results API)
         callbacks.report_results(results)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.py` around lines 488 - 494, Wrap the MLflow save call in a try/except so
that exceptions from callbacks.mlflow.save(results, adapter.job_spec) do not
prevent callbacks.report_results(results) from running; specifically, call
callbacks.mlflow.save inside a try, on success set results.mlflow_run_id =
mlflow_run_id, and on exception log or record the MLflow error (including
exception details) but continue to call callbacks.report_results(results) in the
normal control flow so EvalHub always receives the evaluation results.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: d88e8645-4158-4c7f-a54b-b73e9f9d8898

📥 Commits

Reviewing files that changed from the base of the PR and between 6b7f865 and 86fd75e.

📒 Files selected for processing (1)
  • main.py

Member

@tarilabs tarilabs left a comment


LGTM. This makes the mlflow_run_id available before the SDK's report_results callback.


@mariusdanciu mariusdanciu left a comment


lgtm

@scheruku-rh scheruku-rh changed the title from "fix: propagate MLflow run id to Eval Hub" to "fix: propagate MLflow run id to Eval Hub and update eval-hub-sdk version to 0.1.3" on Mar 25, 2026
Member

@tarilabs tarilabs left a comment


LGTM
thanks!

@tarilabs tarilabs merged commit c8d8109 into opendatahub-io:incubation Mar 25, 2026
4 of 5 checks passed
scheruku-rh added a commit to scheruku-rh/lm-evaluation-harness that referenced this pull request Mar 25, 2026
…ion to 0.1.3 (opendatahub-io#133)

* fix: propagate MLflow run id to Eval Hub

* chore: update sdk version
tarilabs pushed a commit that referenced this pull request Mar 25, 2026
…ion to 0.1.3 (#133) (#135)

* fix: propagate MLflow run id to Eval Hub

* chore: update sdk version
ruivieira pushed a commit to ruivieira/lm-evaluation-harness that referenced this pull request Mar 26, 2026
…ion to 0.1.3 (opendatahub-io#133)

* fix: propagate MLflow run id to Eval Hub

* chore: update sdk version

3 participants