Skip to content

Conversation

@superdosh
Copy link
Contributor

Implements first two bullets of #68

@github-actions
Copy link

github-actions bot commented Aug 22, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@superdosh superdosh marked this pull request as ready for review August 22, 2025 19:43
@superdosh superdosh requested a review from a team as a code owner August 22, 2025 19:43
@superdosh superdosh requested a review from bkorycki August 22, 2025 19:44
pass
def log_artifact(self, current_run_id: str):
"""Log the dataset to MLflow as an artifact for the given `current_run_id`."""
mlflow.log_artifact(str(self.local_path()), run_id=current_run_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only log the local path? Can we also log the URL if it's a DVC input? Or the input is an artifact of a previous run?

Copy link
Contributor Author

@superdosh superdosh Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkorycki str(self.local_path()) is to provide it with the location of the file. The whole file is "logged" to mlflow, so it's visible/browse-able in that interface.

One thing we're losing is the details of the origin of the file, I could keep that as tags on the run maybe? Like an input tag with a nice string that represents the input?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it. Yes, I think it would be nice to keep track of the provenance of these data files. Maybe log_input makes most sense for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mlflow.log_input is just kind of annoying to deal with. (We had to have all that weird metadata stuff.) I feel like we could be much more human-friendly with tags?

I think we could log all the input params that we use to load the file. That might be simplest. Let me mock it up in a commit!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated; results in something like this:
image
Have a look @bkorycki

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks:)

@superdosh superdosh requested a review from bkorycki August 27, 2025 17:45
@superdosh superdosh merged commit dc5400f into main Aug 27, 2025
3 checks passed
@superdosh superdosh deleted the data-artifact-mgmt branch August 27, 2025 19:05
@github-actions github-actions bot locked and limited conversation to collaborators Aug 27, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants