This repository contains all code used for our experiments conducted with the rnaglib
task suite. For full documentation please refer to rnaglib.org and github.com/cgoliver/rnaglib.
This repository provides the code needed to reproduce the three main experiments reported in our preprint: the number of layers ablation study, the splitting strategy ablation study, and the representation ablation study.
- Prior to running any of the three experiments detailed below, you need to create the datasets. To do so, run `python create_datasets.py`. This will create the datasets relevant to each task and store them in a folder named `roots`.
- To reproduce the number of layers ablation study, run `python run_exp_nb_layers.py`. Once this is done, JSON files containing the training data will be stored in `results`. You can then run `python plotting_scripts/make_plot_nb_layers.py`, which will create (if the default parameters are kept) a file named `plotting_scripts/nb_layers_ablation_2.5D.pdf` corresponding to Figure 4a of our preprint. Note that you can tune some parameters, for instance `representation`, to visualize the impact of the number of layers with a representation other than 2.5D. In this case, you need to change the parameters accordingly in both `run_exp_nb_layers.py` and `make_plot_nb_layers.py`.
- To reproduce the splitting strategy ablation study, run `python run_exp_splitting.py`, which will train models and dump the relevant JSONs in `results`. To reproduce the associated plot, run `python plotting_scripts/make_plot_splitting.py`. This will create a file named `plotting_scripts/splitting_ablation.pdf` reproducing Figure 3a of our preprint.
- To reproduce the representation ablation study, run `python run_exp_representations.py`, which will train models and dump the relevant JSONs in `results`. To reproduce the associated plot, run `python plotting_scripts/make_plot_representation.py`. This will create a file named `plotting_scripts/representation_ablation.pdf` reproducing Figure 4b of our preprint.
- To reproduce the benchmark table, run `python run_exp_splitting.py`, which will train each model with its default splitting, the 2.5D representation, and its best hyperparameters. Then run `python plotting_scripts/make_table_benchmark.py`. This will create a file named `plotting_scripts/final_benchmark.pdf` reproducing Table 2 of our preprint, alongside a CSV file, `plotting_scripts/final_benchmark.csv`.
Once a model has been trained under given conditions, it will not be retrained when a later script needs the same run, unless you set the `retrain` parameter to `True`. Experiments whose training runs partially overlap therefore do not waste time repeating them.
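The skip-unless-`retrain` behaviour described above can be pictured with a minimal sketch. This is not the repository's actual implementation: the helper name `run_or_load`, its arguments, and the one-JSON-file-per-run layout are assumptions made purely for illustration.

```python
import json
from pathlib import Path


def run_or_load(result_path, train_fn, retrain=False):
    """Hypothetical helper: reuse a cached result JSON unless retrain is True.

    result_path: where the run's JSON would be stored (assumed layout).
    train_fn:    zero-argument callable returning a JSON-serializable dict.
    """
    result_path = Path(result_path)
    if result_path.exists() and not retrain:
        # A previous script already produced this run: load it instead of training.
        with result_path.open() as f:
            return json.load(f)

    # No cached result (or retrain requested): train and store the outcome.
    results = train_fn()
    result_path.parent.mkdir(parents=True, exist_ok=True)
    with result_path.open("w") as f:
        json.dump(results, f)
    return results
```

Under this scheme, two experiment scripts that request the same run share one training, and passing `retrain=True` forces a fresh one.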
- To reproduce the timing results, run `python timing.py`.
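For convenience, the commands listed above could be chained from a single driver script. The sketch below is an assumption rather than part of the repository: the script paths are the ones named in this README, while the names `PIPELINE`, `build_commands`, and `run_pipeline`, the chosen ordering, and the dry-run default are illustrative. It is meant to be run from the repository root.

```python
import subprocess
import sys

# Script paths as named in this README; the ordering is an assumption,
# except that create_datasets.py has to come first.
PIPELINE = [
    "create_datasets.py",                          # builds the datasets in `roots`
    "run_exp_nb_layers.py",
    "plotting_scripts/make_plot_nb_layers.py",
    "run_exp_splitting.py",
    "plotting_scripts/make_plot_splitting.py",
    "run_exp_representations.py",
    "plotting_scripts/make_plot_representation.py",
    "plotting_scripts/make_table_benchmark.py",    # README pairs this with run_exp_splitting.py
    "timing.py",
]


def build_commands(python=sys.executable):
    """Return one argument list per script, in pipeline order."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` only prints the commands; `run_pipeline(dry_run=False)` would execute them in sequence, which may take a long time since several steps train models.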