Commit 521abc7

added links to readme training on new datasets
1 parent 3161641 commit 521abc7

1 file changed: +3 -3 lines changed

README.md

Lines changed: 3 additions & 3 deletions
@@ -15,9 +15,9 @@ Nonetheless, all the important stuff we would need, like model checkpoints, shou
 [Training splits](https://github.com/jyaacoub/MutDTA/tree/main/splits) can be found on the GitHub page as well as all my most recent code.
 
 # Training on new datasets or new models
-1. For training on existing data you would use the train_folds.sh script, depending on your comfort with editing the existing python scripts it might be a bit difficult to set up. But you just need to define a new "model_opt" in src/utils/loader.py, and add that model key to the list of options in src/utils/config.py.
-2. If you make any changes to the input model features this would make things a lot harder since this is essentially building a new dataset with those features and would need to add instructions on how to set that up for protein features, protein edges, and ligand features.
-3. For new datasets this is more challenging since you basically need to build a new Dataset subclass (inherited from the BaseDataset class) - see PlatinumDataset for a good example on this (it is the cleanest of the 3 dataset classes I have).
+1. For training on existing data you would use the [train_folds.sh](https://github.com/jyaacoub/MutDTA/blob/main/SBATCH/train_folds.sh) script, depending on your comfort with editing the existing python scripts it might be a bit difficult to set up. But you just need to define a new "model_opt" in [src/utils/loader.py](https://github.com/jyaacoub/MutDTA/blob/99921771e31349d8a9564be6aec9fdab35ce0ae6/src/utils/loader.py#L146), and add that model key to the list of options in [src/utils/config.py](https://github.com/jyaacoub/MutDTA/blob/99921771e31349d8a9564be6aec9fdab35ce0ae6/src/utils/config.py#L25).
+2. If you make any changes to the input model features this would make things a lot harder since this is essentially building a new dataset with those features and would need to add instructions on how to set that up for [protein features](https://github.com/jyaacoub/MutDTA/blob/99921771e31349d8a9564be6aec9fdab35ce0ae6/src/data_prep/feature_extraction/protein.py#L59)[protein edges](https://github.com/jyaacoub/MutDTA/blob/99921771e31349d8a9564be6aec9fdab35ce0ae6/src/data_prep/feature_extraction/protein_edges.py#L10), and [ligand features](https://github.com/jyaacoub/MutDTA/blob/main/src/data_prep/feature_extraction/ligand.py).
+3. For entirely new datasets this is more challenging since you basically need to build a new Dataset subclass (inherited from the BaseDataset class) - see [PlatinumDataset](https://github.com/jyaacoub/MutDTA/blob/99921771e31349d8a9564be6aec9fdab35ce0ae6/src/data_prep/datasets.py#L1019) for a good example on this (it is the cleanest of the 3 dataset classes I have).
 
 # GitHub issues
