added links to readme training on new datasets

jyaacoub · web-flow · commit 521abc78174f · 2025-02-17T15:58:25.000-08:00
diff --git a/README.md b/README.md
@@ -15,9 +15,9 @@ Nonetheless, all the important stuff we would need, like model checkpoints, shou
 [Training splits](https://github.com/jyaacoub/MutDTA/tree/main/splits) can be found on the GitHub page as well as all my most recent code.
 
 # Training on new datasets or new models
-1. For training on existing data you would use the train_folds.sh script, depending on your comfort with editing the existing python scripts it might be a bit difficult to set up. But you just need to define a new "model_opt" in src/utils/loader.py, and add that model key to the list of options in src/utils/config.py.
-2. If you make any changes to the input model features this would make things a lot harder since this is essentially building a new dataset with those features and would need to add instructions on how to set that up for protein features, protein edges, and ligand features.
-3. For new datasets this is more challenging since you basically need to build a new Dataset subclass (inherited from the BaseDataset class) - see PlatinumDataset for a good example on this (it is the cleanest of the 3 dataset classes I have).
+1. For training on existing data you would use the [train_folds.sh](https://github.com/jyaacoub/MutDTA/blob/main/SBATCH/train_folds.sh) script. Depending on your comfort with editing the existing Python scripts, it might be a bit difficult to set up, but you just need to define a new "model_opt" in [src/utils/loader.py](https://github.com/jyaacoub/MutDTA/blob/99921771e31349d8a9564be6aec9fdab35ce0ae6/src/utils/loader.py#L146) and add that model key to the list of options in [src/utils/config.py](https://github.com/jyaacoub/MutDTA/blob/99921771e31349d8a9564be6aec9fdab35ce0ae6/src/utils/config.py#L25).
+2. If you make any changes to the input model features, things get a lot harder, since this essentially means building a new dataset with those features, and instructions would need to be added on how to set that up for [protein features](https://github.com/jyaacoub/MutDTA/blob/99921771e31349d8a9564be6aec9fdab35ce0ae6/src/data_prep/feature_extraction/protein.py#L59), [protein edges](https://github.com/jyaacoub/MutDTA/blob/99921771e31349d8a9564be6aec9fdab35ce0ae6/src/data_prep/feature_extraction/protein_edges.py#L10), and [ligand features](https://github.com/jyaacoub/MutDTA/blob/main/src/data_prep/feature_extraction/ligand.py).
+3. For entirely new datasets this is more challenging, since you basically need to build a new Dataset subclass (inherited from the BaseDataset class). See [PlatinumDataset](https://github.com/jyaacoub/MutDTA/blob/99921771e31349d8a9564be6aec9fdab35ce0ae6/src/data_prep/datasets.py#L1019) for a good example (it is the cleanest of the 3 dataset classes I have).
 
 # GitHub issues
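The two-part change described in step 1 of the README (a new "model_opt" branch in `src/utils/loader.py` plus a matching key in the options list in `src/utils/config.py`) can be pictured with the minimal sketch below. It does not reproduce MutDTA's code: `MODEL_OPT`, `init_model`, `ToyModel`, and the key `MY_NEW_MODEL` are hypothetical names chosen for illustration, and the real loader likely takes more arguments (feature and edge options, for example).

```python
from typing import List

# Hypothetical stand-in for the list of model keys in src/utils/config.py.
# The real variable name and its entries may differ.
MODEL_OPT: List[str] = ["DG", "EDI", "MY_NEW_MODEL"]  # step 2: new key registered here


class ToyModel:
    """Placeholder for a real model class (e.g. a PyTorch module)."""

    def __init__(self, name: str, dropout: float) -> None:
        self.name = name
        self.dropout = dropout


def init_model(model: str, dropout: float = 0.2) -> ToyModel:
    """Hypothetical stand-in for the model_opt dispatch in src/utils/loader.py."""
    # Reject keys that were never added to the config list.
    if model not in MODEL_OPT:
        raise ValueError(f"unknown model option {model!r}; choose from {MODEL_OPT}")
    if model in ("DG", "EDI"):
        # Existing options (placeholders here).
        return ToyModel(model, dropout)
    if model == "MY_NEW_MODEL":
        # Step 1: the branch that builds the new architecture.
        return ToyModel("MY_NEW_MODEL", dropout)
    # Key is listed in MODEL_OPT but no loader branch was written for it.
    raise NotImplementedError(f"no loader branch for {model!r}")


m = init_model("MY_NEW_MODEL", dropout=0.1)
print(m.name, m.dropout)  # MY_NEW_MODEL 0.1
```

Keeping the allowed keys in one list and validating against it up front means a typo in a training script fails immediately with a clear message rather than partway through a run.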
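For step 3 of the README (building a new Dataset subclass on top of BaseDataset), the general shape of the pattern is sketched below. This is an assumption-laden illustration rather than MutDTA's API: `BaseDatasetSketch`, its `pre_process` hook, `MyNewDataset`, and the `(protein, ligand, affinity)` row layout are all hypothetical, and the real BaseDataset (with PlatinumDataset as its reference subclass) has its own hooks and feature handling.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, Optional[float]]  # hypothetical: (protein sequence, ligand SMILES, affinity)


class BaseDatasetSketch(ABC):
    """Hypothetical stand-in for BaseDataset; the real class's hooks differ."""

    def __init__(self) -> None:
        # Subclasses supply their own preprocessing; the base class handles indexing.
        self.rows: List[Row] = self.pre_process()

    @abstractmethod
    def pre_process(self) -> List[Row]:
        """Load raw data and return cleaned (protein, ligand, affinity) rows."""

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        prot, lig, y = self.rows[idx]
        return {"prot_seq": prot, "lig_smiles": lig, "y": y}


class MyNewDataset(BaseDatasetSketch):
    """Hypothetical new dataset built from in-memory records."""

    def __init__(self, records: List[Row]) -> None:
        self.records = records  # must be set before the base __init__ calls pre_process
        super().__init__()

    def pre_process(self) -> List[Row]:
        # Drop entries with no measured affinity.
        return [r for r in self.records if r[2] is not None]


ds = MyNewDataset([("MKTAYIAK", "CCO", 6.5), ("MSDNE", "c1ccccc1", None)])
print(len(ds), ds[0]["y"])  # 1 6.5
```

Marking the preprocessing hook as abstract means Python refuses to instantiate a subclass that forgets to implement it, which is one way to keep dataset-specific parsing out of the shared base class.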