This repository contains the computational workflows and analysis code supporting the paper titled:
Evolutionary Insights into Elongation Factor G Using AlphaFold and Ancestral Analysis.
**Comput Biol Med 2025 Vol. 191 Pages 110188; Accession Number: 40222265 DOI: 10.1016/j.compbiomed.2025.110188
This project integrates structural prediction, phylogenetic analysis, and functional validation to explore EF-G conformational evolution. Specifically, it covers:
- Identification of EF-G conformations (con1 vs. con2) using AlphaFold predictions.
- Clustering of multiple sequence alignments (MSAs) based on AlphaFold-predicted structures.
- Phylogenetic reconstruction and ancestral sequence inference.
- Functional and biophysical characterization of two reconstructed ancestral EF-G proteins.
-
myenv.yml
→ computing environment. -
EF-G_folding.ipynb
→ Main notebook for folding, structural and sequence-based clustering, and visualization. -
SEQ/wt.fasta
→ Wild-type EF-G query sequence. -
SEQ/
→ will contain initial ColabFold outputs and parent.a3malignment files generated via MMseqs2. -
Cluster/
→ will Contain clustered.a3mfiles from AFCluster, with subfolders for predicted structures and analysis. -
Cluster/folding/
→ will contain PDB files generated from the clustered.a3mfiles. -
Cluster/folding/sel_pdb/
→ will contain PDB files that passed the pLDDT confidence threshold. -
Cluster/folding/sel_pdb/1st_1-286_RMSD/
→ will contain structure clustering and PSE-based selection data and further analysis. -
a3m_file→ example a3m files of con1 and con2 for WT -
WT_list1/2.txt→ example lists of a3m after structure clustering -
combine_All_MSA→ folder with .fasta files to demonstrate cell #17. -
combine_→ example lists of a3m after structure clustering -
one .pse→ example pse file to identify con1 and con2 for WT -
two .treefile and one .state file→ example treefiles and ancestor sequence file. -
one node_JSD.csv→ example to repproduce figure 5.
-
ClusterMSA_min3.py
→ Modified sequence clustering script using a DBSCAN-based method. -
utils.py
→ Supporting utility functionsthese two attibutions are adapted from:
H. Wayment-Steele et al., Nature 625, 832–839 (2024), doi:10.1038/s41586-023-06832-9. https://github.com/HWaymentSteele/AF_Cluster
- myenv.yml
- requirement.txt
- AlphaFold / ColabFold — for structure prediction
- IQ-TREE 2 — for maximum likelihood phylogenetic reconstruction
- PyMOL — for structural visualization and comparison
- Biopython, NumPy, Matplotlib — for parsing, analysis, and plotting
Most cells function as modular units, capable of carrying out specific tasks independently.
To reproduce the computing environment:
conda env create -f myenv.yml
conda activate myenv To reproduce the full analysis pipeline:
- Set your working directory (
default_path). - Start from cell #2 in
EF-G_folding.ipynb, adjusting the path to your local ColabFold installation. - Run cell #3 to generate the
Cluster/directory. - Follow through cells #2–8 to reproduce Figures 1–2.
Cells #9–16 generate Figure 3, and #18–22 produce Figure 5. - Cell #23 (standalone) uses Logomaker for Figure 4.
- Cell #17 produces Supplementary Figure S4A.
If you use this code or find it helpful, please cite:
Shawonur Rahaman,Jacob H. Steele,Yi Zeng,Shoujun Xu,Yuhong Wang (2025).
Evolutionary Insights into Elongation Factor G Using AlphaFold and Ancestral Analysis.
Comput Biol Med 2025 Vol. 191 Pages 110188; Accession Number: 40222265 DOI: 10.1016/j.compbiomed.2025.110188
For questions or collaboration inquiries, please contact:
Yuhong Wang
📧 ywang60@uh.edu