Skip to content

fix: reuse system metrics in nodes on mlflow.start_run#647

Open
Calychas wants to merge 3 commits intoGalileo-Galilei:masterfrom
Calychas:nested-system-metrics-fix
Open

fix: reuse system metrics in nodes on mlflow.start_run#647
Calychas wants to merge 3 commits intoGalileo-Galilei:masterfrom
Calychas:nested-system-metrics-fix

Conversation

@Calychas
Copy link
Contributor

@Calychas Calychas commented Apr 12, 2025

Description

With the current measures to make kedro-mlflow work with thread safety in MLFlow, technically a call to mlflow.start_run() is happening on every node with the same run_id. While having logging of system metrics enabled this means, a new SystemMetricMonitor will be started for every node with the same run_id (source code). The pipeline will run slower and slower, until it stops progressing (I couldn't run a pipeline with ~200 nodes, stopped ~50)

Development notes

One line change - set parameter log_system_metrics=False in mlflow.start_run() for before_node_run hook.

Added test that checks whether the SystemMetricMonitor is the same after running a node.

Checklist

  • Read the contributing guidelines
  • Open this PR as a 'Draft Pull Request' if it is work-in-progress
  • Update the documentation to reflect the code changes
  • Add a description of this change and add your name to the list of supporting contributions in the CHANGELOG.md file. Please respect Keep a Changelog guidelines.
  • Add tests to cover your changes

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@Galileo-Galilei
Copy link
Owner

Thanks for the bug report PR and sorry for the late reply! Is it good to go? Can i merge it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants