fix: propagate MLflow run id to Eval Hub and update eval-hub-sdk version to 0.1.3 (#133)
Conversation
📝 Walkthrough
The entrypoint in main.py now calls callbacks.mlflow.save() and, when a run id is returned, sets results.mlflow_run_id before callbacks.report_results().
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable Issues
🚥 Pre-merge checks: ✅ 2 passed (2 checks)
🧹 Nitpick comments (1)
main.py (1)
488-494: Unhandled exception from `mlflow.save()` will skip `report_results()`.

If `callbacks.mlflow.save()` raises, the evaluation results are never reported to EvalHub. Consider wrapping the MLflow call so a transient MLflow failure doesn't discard a successful evaluation.

♻️ Suggested defensive handling

```diff
 # MLflow first; run id from save() is sent on report_results when SDK returns it.
-mlflow_run_id = callbacks.mlflow.save(results, adapter.job_spec)
-if mlflow_run_id:
-    results.mlflow_run_id = mlflow_run_id
+try:
+    mlflow_run_id = callbacks.mlflow.save(results, adapter.job_spec)
+    if mlflow_run_id:
+        results.mlflow_run_id = mlflow_run_id
+except Exception as mlflow_err:
+    logger.warning("MLflow save failed; proceeding without run id: %s", mlflow_err)

 # Report final results to EvalHub (status/results API)
 callbacks.report_results(results)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main.py` around lines 488 - 494, Wrap the MLflow save call in a try/except so that exceptions from callbacks.mlflow.save(results, adapter.job_spec) do not prevent callbacks.report_results(results) from running; specifically, call callbacks.mlflow.save inside a try, on success set results.mlflow_run_id = mlflow_run_id, and on exception log or record the MLflow error (including exception details) but continue to call callbacks.report_results(results) in the normal control flow so EvalHub always receives the evaluation results.
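The defensive pattern above can be sketched as a self-contained example. The `JobResults`, `FlakyMlflow`, and `report_results` stand-ins are hypothetical stubs for the real SDK objects; the point is only that a failing `save()` must not prevent the final report.

```python
import logging

logger = logging.getLogger(__name__)

class JobResults:
    """Hypothetical stand-in for the SDK's results object."""
    def __init__(self):
        self.mlflow_run_id = None

class FlakyMlflow:
    """Stand-in MLflow callback whose save() raises, simulating a transient outage."""
    def save(self, results, job_spec):
        raise RuntimeError("transient MLflow outage")

reported = []

def report_results(results):
    # Stand-in for callbacks.report_results(); records what EvalHub would receive.
    reported.append(results.mlflow_run_id)

results = JobResults()

# MLflow save is best-effort: a failure must not discard the evaluation results.
try:
    mlflow_run_id = FlakyMlflow().save(results, job_spec=None)
    if mlflow_run_id:
        results.mlflow_run_id = mlflow_run_id
except Exception as mlflow_err:
    logger.warning("MLflow save failed; proceeding without run id: %s", mlflow_err)

report_results(results)  # still runs; EvalHub gets results without a run id
```

Running this, `reported` holds a single entry with no run id, showing the evaluation is still delivered even though the MLflow call failed.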
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: d88e8645-4158-4c7f-a54b-b73e9f9d8898
📒 Files selected for processing (1)
main.py
tarilabs left a comment
LGTM. This makes the mlflow_run_id available first, before the SDK's report_results callback.
Force-pushed from ebe9519 to 0deafe1
fix: propagate MLflow run id to Eval Hub and update eval-hub-sdk version to 0.1.3 (opendatahub-io#133)
* fix: propagate MLflow run id to Eval Hub
* chore: update sdk version
closes: https://redhat.atlassian.net/browse/RHOAIENG-54869
Ensures the MLflow run id from callbacks.mlflow.save() is written to JobResults.mlflow_run_id before callbacks.report_results(), so Eval Hub receives mlflow_run_id on the final /events payload (requires an eval-hub-sdk version where save() returns the run id).
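The happy-path ordering described above can be sketched as follows. The stub classes are hypothetical; the `save()` return value mirrors the behavior the PR relies on (an eval-hub-sdk where `save()` returns the run id).

```python
class JobResults:
    """Minimal stand-in for the SDK's JobResults object."""
    def __init__(self):
        self.mlflow_run_id = None

class MlflowCallback:
    """Stand-in whose save() returns the new MLflow run id."""
    def save(self, results, job_spec):
        return "run-abc123"  # hypothetical run id

def report_results(results):
    # Stand-in for the final /events payload sent to Eval Hub.
    return {"mlflow_run_id": results.mlflow_run_id}

results = JobResults()

# Order matters: write the run id onto results BEFORE reporting them.
run_id = MlflowCallback().save(results, job_spec=None)
if run_id:
    results.mlflow_run_id = run_id

payload = report_results(results)  # payload now carries the run id
```

Because the assignment happens before `report_results()`, the reported payload includes the run id rather than a null value.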
Description
How Has This Been Tested?
logs:
GET on job_id now shows mlflow_run_id:
Merge criteria:
Summary by CodeRabbit
Bug Fixes
Chores