This repository contains all code and datasets needed to reproduce the results described in the article "Compositionally constrained sites drive long branch attraction" and to run the CAT-PMSF pipeline introduced there on arbitrary datasets.
Structure of the repository:
- datasets: empirical datasets analyzed in the article
- datasets/simulation: simulated datasets and the scripts to generate them with ELynx
- scripts: scripts to perform data transformation and analysis at various steps of the CAT-PMSF pipeline
- step1_iqtree_lg: results of the 1st step of the CAT-PMSF pipeline applied to the empirical datasets
- step1_iqtree_lg/simulation: correct (good) and incorrect (bad) topologies of the simulations
- step2_pb: results of the 2nd step of the CAT-PMSF pipeline applied to the empirical datasets
- step2_pb/simulation: results of the 2nd step of the CAT-PMSF pipeline applied to the simulated trees
- step3_iqtree: results of the 3rd step of the CAT-PMSF pipeline applied to the empirical datasets
- step3_iqtree/simulation: results of the 3rd step of the CAT-PMSF pipeline applied to the simulated trees
- homo: results of the tests for compositional heterogeneity across lineages with Homo v2.1
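
To confirm that a local checkout matches the layout listed above, a small check along the following lines may help. This is only a sketch: the directory names are taken from the list above, while the `LAYOUT` mapping and the `missing_directories` helper are hypothetical names introduced for illustration and are not part of the repository's `scripts` directory.

```python
from pathlib import Path

# Directory names as listed in this README; the descriptions are paraphrased.
# LAYOUT and missing_directories are illustrative names, not repository code.
LAYOUT = {
    "datasets": "empirical datasets",
    "datasets/simulation": "simulated datasets and ELynx scripts",
    "scripts": "data transformation and analysis scripts",
    "step1_iqtree_lg": "step 1 results (empirical datasets)",
    "step1_iqtree_lg/simulation": "correct and incorrect simulation topologies",
    "step2_pb": "step 2 results (empirical datasets)",
    "step2_pb/simulation": "step 2 results (simulations)",
    "step3_iqtree": "step 3 results (empirical datasets)",
    "step3_iqtree/simulation": "step 3 results (simulations)",
    "homo": "compositional heterogeneity tests (Homo v2.1)",
}


def missing_directories(root="."):
    """Return the listed directories that do not exist under root."""
    base = Path(root)
    return [name for name in LAYOUT if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_directories()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All listed directories are present.")
```

Run from the repository root, the script reports any listed directory that is absent, which can be useful after a partial download or a sparse checkout.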