-
Notifications
You must be signed in to change notification settings - Fork 0
Refactor input handling to log inputs as artifacts for the current run. #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
src/modelplane/utils/input.py
Outdated
| pass | ||
| def log_artifact(self, current_run_id: str): | ||
| """Log the dataset to MLflow as an artifact for the given `current_run_id`.""" | ||
| mlflow.log_artifact(str(self.local_path()), run_id=current_run_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only log the local path? Can we also log the URL if it's a DVC input? Or the input is an artifact of a previous run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkorycki str(self.local_path()) is to provide it with the location of the file. The whole file is "logged" to mlflow, so it's visible/browse-able in that interface.
One thing we're losing is the details of the origin of the file, I could keep that as tags on the run maybe? Like an input tag with a nice string that represents the input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it. Yes, I think it would be nice to keep track of the provenance of these data files. Maybe log_input makes most sense for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mlflow.log_input is just kind of annoying to deal with. (We had to have all that weird metadata stuff.) I feel like we could be much more human-friendly with tags?
I think we could log all the input params that we use to load the file. That might be simplest. Let me mock it up in a commit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated; results in something like this:

Have a look @bkorycki
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks:)
…force MLflow active run requirement
Implements first two bullets of #68