made model checkpoints more informative #107

Merged
avantikalal merged 5 commits into main from store-intervals
Mar 4, 2025
@avantikalal
Collaborator

avantikalal commented Feb 26, 2025

  1. Earlier, model checkpoints contained only the chromosomes used for training and validation. We now store the full genomic intervals.

  2. Added an option to write a checkpoint in LightningModel.test_on_dataset. If selected, the test dataset parameters and test set performance metrics are also written to the checkpoint.

  3. Stored train, val and test dataset parameters as nested dictionaries under model.data_params for readability.

  4. Demonstrated these changes in Tutorial 3.

This improves reproducibility, since every model will be distributed along with its train/val/test intervals and its per-task performance. Users can then make sure they use the model only on tasks where it performs well, and don't evaluate it on regions that overlap with the train/val intervals.
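
A minimal sketch of the overlap check described above, in plain Python. The layout of `data_params` here (a `"train"` / `"val"` sub-dict holding an `"intervals"` list) is an assumption made for illustration, and `Interval`, `overlaps` and `held_out` are hypothetical helpers, not part of the library. Intervals are assumed to be 0-based and half-open.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def held_out(candidates: List[Interval], seen: List[Interval]) -> List[Interval]:
    """Keep only candidates that overlap none of the intervals in `seen`."""
    return [c for c in candidates if not any(overlaps(c, s) for s in seen)]


# Hypothetical stand-in for the nested dictionaries under model.data_params;
# the real keys may differ.
data_params = {
    "train": {"intervals": [Interval("chr1", 0, 500), Interval("chr1", 1000, 1500)]},
    "val": {"intervals": [Interval("chr2", 200, 700)]},
}

seen = data_params["train"]["intervals"] + data_params["val"]["intervals"]
candidates = [
    Interval("chr1", 400, 900),   # overlaps the first training interval
    Interval("chr1", 500, 1000),  # only touches training intervals at the edges
    Interval("chr2", 700, 900),   # starts where the validation interval ends
    Interval("chr3", 0, 100),     # chromosome not seen at all
]
safe = held_out(candidates, seen)
```

With half-open coordinates, intervals that merely touch at a boundary are not counted as overlapping, so only the first candidate is dropped. The pairwise scan is quadratic; for large interval sets a sorted or tree-based lookup would be a better fit.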

@suragnair
Collaborator

Sorry if I missed it, is there a test that checks whether the genomic intervals are stored or not?

@avantikalal
Collaborator Author

> Sorry if I missed it, is there a test that checks whether the genomic intervals are stored or not?

I will add tests. Question: do you think we should store intervals of length seq_len, or of length padded_seq_len (which includes padding to allow for jitter, etc.)?

@suragnair
Collaborator

Hmm that’s tricky. Maybe seq_len and then just store the scalar padded_seq_len?
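
A rough sketch of this suggestion: keep each interval at length seq_len, store padded_seq_len as a single integer, and rebuild the padded extent only when it is needed. It assumes the padding is split evenly on both sides of the interval, which the thread does not state; `Interval` and `pad_interval` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def pad_interval(iv: Interval, padded_seq_len: int) -> Interval:
    """Expand a seq_len interval symmetrically to padded_seq_len.

    Assumes equal padding on each side, so the difference between the
    two lengths must be non-negative and even.
    """
    seq_len = iv.end - iv.start
    extra = padded_seq_len - seq_len
    if extra < 0 or extra % 2 != 0:
        raise ValueError("padded_seq_len must exceed seq_len by an even amount")
    half = extra // 2
    return Interval(iv.chrom, iv.start - half, iv.end + half)


stored = Interval("chr1", 100, 300)  # length seq_len = 200
padded = pad_interval(stored, padded_seq_len=206)
print(padded)  # Interval(chrom='chr1', start=97, end=303)
```

Storing the shorter interval plus one scalar keeps the checkpoint unambiguous about which bases the model was meant to see, while still allowing the jittered extent to be recovered.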

@avantikalal
Collaborator Author

@suragnair new commit:

  • updated the dataset classes so that self.intervals will have length seq_len
  • added tests in test_dataset.py to check that the intervals are correctly stored in the dataset
  • added tests in test_lightning.py to check that the intervals are correctly copied to the model and that val and test performance are also stored in the model.
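
The kind of check these tests perform could look roughly like the sketch below. It does not reproduce the tests in test_dataset.py or test_lightning.py; `ToyDataset` and `collect_data_params` are hypothetical stand-ins, and the dictionary keys are assumptions.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


class ToyDataset:
    """Hypothetical stand-in for a dataset class; not the library's API."""

    def __init__(self, intervals: List[Interval], seq_len: int, max_jitter: int):
        self.seq_len = seq_len
        # Assumed relation: jitter padding is added on both sides.
        self.padded_seq_len = seq_len + 2 * max_jitter
        self.intervals = list(intervals)


def collect_data_params(**splits: ToyDataset) -> Dict[str, dict]:
    """Copy each split's parameters into its own nested dictionary."""
    return {
        name: {
            "seq_len": ds.seq_len,
            "padded_seq_len": ds.padded_seq_len,
            "intervals": list(ds.intervals),
        }
        for name, ds in splits.items()
    }


def test_intervals_have_seq_len():
    ds = ToyDataset([Interval("chr1", 0, 200)], seq_len=200, max_jitter=3)
    assert all(iv.end - iv.start == ds.seq_len for iv in ds.intervals)
    assert ds.padded_seq_len == 206


def test_intervals_copied_per_split():
    train = ToyDataset([Interval("chr1", 0, 200)], seq_len=200, max_jitter=3)
    val = ToyDataset([Interval("chr2", 50, 250)], seq_len=200, max_jitter=0)
    params = collect_data_params(train=train, val=val)
    assert set(params) == {"train", "val"}
    assert params["train"]["intervals"] == train.intervals
    assert params["val"]["padded_seq_len"] == 200
```

Both functions follow the pytest naming convention, so they would be collected automatically if placed in a test module.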

@avantikalal merged commit b5bd01a into main Mar 4, 2025
2 checks passed
@avantikalal deleted the store-intervals branch March 4, 2025 18:45
@avantikalal restored the store-intervals branch March 5, 2025 08:06
@avantikalal deleted the store-intervals branch March 5, 2025 21:15
2 participants