This repository contains the code used for hyperparameter tuning (arch_LR_optimization); the database analysis with the molecular fragmentation algorithm described in the manuscript, provided as a standalone package (fragmentation); and the code for training the models on each database and running predictions with the trained models (selected_models).
The environment used to develop the models and run all the trainings reported in the manuscript is described in original_conda_reqs.txt. It was created by cloning an environment from the specific HPC resources we used and then installing the missing libraries. Because the cloned environment contained libraries tuned to those HPC resources, it is difficult to reproduce elsewhere. After several attempts, we developed a minimal version of the environment that is much easier to install and set up. In all our tests, models trained with the minimal environment achieved results similar to those obtained with the original environment; the only notable difference was the time the trainings took to complete.
To set up the minimal environment for running the predictions or training the models:
First, create the environment using:
conda create -p path/to/my/env --file conda_reqs.txt
or, if you prefer to refer to your conda environments by name:
conda create -n my_env_name --file conda_reqs.txt
Next, activate the environment (conda activate path/to/my/env, or conda activate my_env_name) and install the pip dependencies:
python -m pip install -r pip_reqs.txt
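After installation, it can be useful to confirm that the active environment actually matches the pinned pip requirements. The script below is a minimal, hypothetical sketch (check_env and parse_pins are illustrative names, not part of this repository); it assumes pip_reqs.txt pins packages as name==version, and it skips any line without such a pin.

```python
"""Hypothetical helper (not part of this repository): compare exact pins in a
requirements file against the packages installed in the active environment."""
from importlib import metadata
from pathlib import Path


def parse_pins(path):
    """Return {name: version} for lines of the form 'name==version'.

    Assumption: requirements use exact '==' pins. Comments, environment
    markers (after ';') and extras ('[...]') are stripped; other lines are skipped.
    """
    pins = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.split("[", 1)[0].strip()] = version.strip()
    return pins


def check_env(path):
    """Return (name, wanted, installed) tuples for mismatches; installed is None if missing."""
    problems = []
    for name, wanted in parse_pins(path).items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems.append((name, wanted, installed))
    return problems


if __name__ == "__main__":
    import sys

    req_file = sys.argv[1] if len(sys.argv) > 1 else "pip_reqs.txt"
    for name, wanted, installed in check_env(req_file):
        print(f"{name}: wanted {wanted}, found {installed}")
```

Run it from the repository root inside the activated environment (for example, python check_env.py pip_reqs.txt); no output means every exact pin was found at the expected version.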
The databases have been uploaded to Zenodo under the DOI: