Made model checkpoints more informative #107
Merged
avantikalal merged 5 commits into main, Mar 4, 2025
Conversation
Collaborator
Sorry if I missed it, is there a test that checks whether the genomic intervals are stored or not?
Collaborator
Author
I will add tests. Question: do you think we should store intervals of length …
Collaborator
|
Hmm that’s tricky. Maybe |
Collaborator
Author
@suragnair new commit:
suragnair approved these changes, Mar 4, 2025
This was referenced Mar 5, 2025
- Earlier, model checkpoints contained only the chromosomes used for training and validation. We now store the full genomic intervals.
- Added an option to write a checkpoint in LightningModel.test_on_dataset. If selected, test dataset parameters and test-set performance metrics will also be written to the checkpoint.
- Stored train, val, and test dataset parameters as nested dictionaries under model.data_params for readability.
- Demonstrated these changes in Tutorial 3.
This allows much better reproducibility as all models will be distributed along with their train/val/test intervals and their per-task performance. Users can then make sure they use the model only on tasks with good performance, and don't evaluate it on regions that overlap with the train/val intervals.
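To make the reproducibility benefit concrete, here is a minimal sketch of how a user might check a candidate region against the stored intervals before evaluating. The nested dictionary layout and the `intervals` key are assumptions for illustration, not the library's actual checkpoint schema.

```python
def overlaps(query, intervals):
    """Return True if query (chrom, start, end) overlaps any stored interval."""
    chrom, start, end = query
    # Half-open interval overlap: same chromosome, and the ranges intersect.
    return any(c == chrom and start < e and end > s for c, s, e in intervals)

# Hypothetical data_params layout, mimicking the nested dicts described above.
data_params = {
    "train": {"intervals": [("chr1", 0, 100_000), ("chr2", 0, 50_000)]},
    "val":   {"intervals": [("chr3", 0, 100_000)]},
    "test":  {"intervals": [("chr4", 0, 100_000)]},
}

leaky = ("chr1", 50_000, 51_000)     # falls inside a training interval
held_out = ("chr4", 10_000, 11_000)  # only in the test set

print(overlaps(leaky, data_params["train"]["intervals"]))     # True
print(overlaps(held_out, data_params["train"]["intervals"]))  # False
```

A region flagged as overlapping the train or val intervals should be excluded from any held-out evaluation.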