A C, C++, and Python project covering AlphaFold2 docking analysis, source code, blogs, data availability, and references.

chenxingqiang/alphafold-notebooks

A reference "AlphaFold2 Codec" that covers everything about AlphaFold2.


Learning Source Availability

Papers

PPT

  • My public talk on the AlphaFold2 paper, by Xingqiang Chen (.key/.pptx in the AF2-PPT folder).
  • Sergey Ovchinnikov's talk on AF2 (slides, .pptx in the AF2-PPT folder).

Learning by Code

📓 AlphaFold2 Algorithm Notebooks (32 Complete!)

We provide 32 Jupyter Notebooks covering every algorithm from the AlphaFold2 supplementary materials. Each notebook includes:

  • Algorithm pseudocode/image reference
  • Source code location mapping
  • NumPy implementation (see the minimal sketch after this list)
  • Executable test cases with verification
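
For a taste of the notebook format, below is a rough NumPy sketch in the spirit of Algorithms 4 and 5 (relpos / one_hot): pairwise residue-index offsets are assigned to the nearest of 2·32+1 relative-position bins and one-hot encoded. The learned linear projection applied in the real model is omitted, and the function signatures are illustrative, not taken from the notebooks verbatim.

```python
import numpy as np

def one_hot(x, v_bins):
    """Algorithm 5 (sketch): encode each value by its nearest bin centre."""
    idx = np.argmin(np.abs(x[..., None] - v_bins), axis=-1)
    return np.eye(len(v_bins))[idx]

def relpos(residue_index, v_max=32):
    """Algorithm 4 (sketch): relative-position features for every residue pair.
    The real model additionally applies a learned linear layer to this encoding."""
    v_bins = np.arange(-v_max, v_max + 1)                  # 65 bins: -32 .. 32
    d = residue_index[:, None] - residue_index[None, :]    # pairwise index offsets
    return one_hot(d, v_bins)                              # (N_res, N_res, 65)

feats = relpos(np.arange(10))
print(feats.shape)  # (10, 10, 65)
```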

👉 Full Algorithm Index

Quick Links by Category

| Category | Algorithms | Notebooks |
|---|---|---|
| Data Preprocessing | MSA Block Deletion | Algorithm 1 |
| Embedding | Input Embedder, relpos, one_hot | Alg 3, Alg 4, Alg 5 |
| Evoformer | Stack, MSA Attention, Triangle Ops | Alg 6-15 |
| Templates | Pair Stack, Pointwise Attention | Alg 16, Alg 17 |
| Extra MSA | Stack, Global Attention | Alg 18, Alg 19 |
| Structure Module | IPA, Backbone, Atom Coords | Alg 20-25 |
| Losses | FAPE, Torsion, pLDDT | Alg 26-29 |
| Recycling | Inference, Training, Embedder | Alg 30, Alg 31, Alg 32 |
| Main Pipeline | Full Inference | Algorithm 2 |
📋 Complete Algorithm List (Click to Expand)

| # | Algorithm | Notebook Link |
|---|---|---|
| 1 | MSA Block Deletion | algorithm-1-MSABlockDeletion.ipynb |
| 2 | Inference | algorithm-2-Inference.ipynb |
| 3 | Input Embedder | algorithm-3-InputEmbedder.ipynb |
| 4 | relpos | algorithm-4-relpos.ipynb |
| 5 | one_hot | algorithm-5-one_hot.ipynb |
| 6 | Evoformer Stack | algorithm-6-EvoformerStack.ipynb |
| 7 | MSA Row Attention with Pair Bias | algorithm-7-MSARowAttentionWithPairBias.ipynb |
| 8 | MSA Column Attention | algorithm-8-MSAColumnAttention.ipynb |
| 9 | MSA Transition | algorithm-9-MSATransition.ipynb |
| 10 | Outer Product Mean | algorithm-10-OuterProductMean.ipynb |
| 11 | Triangle Multiplication (Outgoing) | algorithm-11-TriangleMultiplicationOutgoing.ipynb |
| 12 | Triangle Multiplication (Incoming) | algorithm-12-TriangleMultiplicationIncoming.ipynb |
| 13 | Triangle Attention (Starting Node) | algorithm-13-TriangleAttentionStartingNode.ipynb |
| 14 | Triangle Attention (Ending Node) | algorithm-14-TriangleAttentionEndingNode.ipynb |
| 15 | Pair Transition | algorithm-15-PairTransition.ipynb |
| 16 | Template Pair Stack | algorithm-16-TemplatePairStack.ipynb |
| 17 | Template Pointwise Attention | algorithm-17-TemplatePointwiseAttention.ipynb |
| 18 | Extra MSA Stack | algorithm-18-ExtraMsaStack.ipynb |
| 19 | MSA Column Global Attention | algorithm-19-MSAColumnGlobalAttention.ipynb |
| 20 | Structure Module | algorithm-20-StructureModule.ipynb |
| 21 | Rigid from 3 Points | algorithm-21-rigidFrom3Points.ipynb |
| 22 | Invariant Point Attention | algorithm-22-InvariantPointAttention.ipynb |
| 23 | Backbone Update | algorithm-23-BackboneUpdate.ipynb |
| 24 | Compute All Atom Coordinates | algorithm-24-computeAllAtomCoordinates.ipynb |
| 25 | makeRotX | algorithm-25-makeRotX.ipynb |
| 26 | Rename Symmetric Ground Truth Atoms | algorithm-26-renameSymmetricGroundTruthAtoms.ipynb |
| 27 | Torsion Angle Loss | algorithm-27-torsionAngleLoss.ipynb |
| 28 | Compute FAPE | algorithm-28-computeFAPE.ipynb |
| 29 | Predict Per-Residue LDDT | algorithm-29-predictPerResidueLDDT.ipynb |
| 30 | Recycling (Inference) | algorithm-30-RecyclingInference.ipynb |
| 31 | Recycling (Training) | algorithm-31-RecyclingTraining.ipynb |
| 32 | Recycling Embedder | algorithm-32-RecyclingEmbedder.ipynb |
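
As another illustration of the style these notebooks follow, here is a rough NumPy sketch of Algorithm 21 (rigidFrom3Points), which builds a backbone frame from three atom positions (typically N, Cα, C) by Gram-Schmidt orthogonalization. This is a simplified single-residue version without batching; see the notebook for the full treatment.

```python
import numpy as np

def rigid_from_3_points(x1, x2, x3):
    """Construct a rigid frame (R, t) from three points; x2 becomes the origin."""
    v1 = x3 - x2
    v2 = x1 - x2
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - e1 * np.dot(e1, v2)          # remove the component along e1
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)                  # completes a right-handed basis
    R = np.stack([e1, e2, e3], axis=-1)    # columns are the basis vectors
    return R, x2

# Quick check: R should be a proper rotation (orthonormal, det = +1).
R, t = rigid_from_3_points(np.array([1.0, 0.0, 0.0]),
                           np.array([0.0, 0.0, 0.0]),
                           np.array([0.0, 1.0, 0.0]))
assert np.allclose(R @ R.T, np.eye(3), atol=1e-6)
assert np.isclose(np.linalg.det(R), 1.0)
```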

📓 AlphaFold3 Algorithm Notebooks (NEW!)

We now include AlphaFold3 algorithm notebooks! AF3 introduces significant architectural changes including diffusion-based structure prediction.

👉 AlphaFold3 Algorithm Index

Key AF3 Components

| Category | Key Algorithms | Notebooks |
|---|---|---|
| Input | MSA Features, Templates, Atom Features | Alg 1-4 |
| MSA Module | Outer Product, MSA Attention | Alg 5-7 |
| Pairformer | Triangle Ops, Single Attention | Alg 8-14 |
| Diffusion | Diffusion Module, AdaLN, Transformer | Alg 15, Alg 16 |
| Confidence | Distogram, Confidence, LDDT | Alg 20-23 |
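
The diffusion module replaces AF2's structure module with a denoising model over raw atom coordinates. As a very loose illustration of the sampling idea only (not AF3's actual network, noise schedule, or augmentation scheme), here is a generic Euler denoising step; `denoiser` is a hypothetical stand-in for the trained diffusion module.

```python
import numpy as np

def euler_denoise_step(x_t, sigma_t, sigma_next, denoiser):
    """One generic Euler step of a denoising-diffusion sampler.

    x_t        : (n_atoms, 3) noisy coordinates at noise level sigma_t
    sigma_next : next (smaller) noise level
    denoiser   : callable(x, sigma) -> estimate of the clean coordinates
                 (hypothetical stand-in for the trained diffusion module)
    """
    x0_hat = denoiser(x_t, sigma_t)            # network's estimate of the clean structure
    d = (x_t - x0_hat) / sigma_t               # direction toward that estimate
    return x_t + (sigma_next - sigma_t) * d    # step to the lower noise level

# Toy usage with a stand-in "denoiser" that shrinks coordinates toward the origin.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 3)) * 10.0
sigmas = np.linspace(10.0, 0.1, 50)
for s, s_next in zip(sigmas[:-1], sigmas[1:]):
    x = euler_denoise_step(x, s, s_next, lambda xx, ss: 0.9 * xx)
```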

AF3 Source Code Submodules

# Official AlphaFold3
AF3-Ref-src/alphafold3-official/

# PyTorch Implementation (lucidrains)
AF3-Ref-src/alphafold3-pytorch/

# Architecture Walkthrough
AF3-Ref-src/alphafold3-walkthrough/

📓 Boltz Algorithm Notebooks (NEW!)

We now include Boltz algorithm notebooks! Boltz is a family of models for biomolecular interaction prediction:

  • Boltz-1: the first fully open-source model to approach AlphaFold3 accuracy
  • Boltz-2: adds binding affinity prediction, approaching FEP accuracy while running roughly 1000x faster

👉 Boltz Algorithm Index

Key Boltz Components

| Category | Key Algorithms | Notebooks |
|---|---|---|
| Input Processing | Input Embedder, Atom Encoder, RelPos | Alg 1-3 |
| MSA Module | MSA Module, Outer Product, Pair Averaging | Alg 4-6 |
| Pairformer | Pairformer, Triangle Ops, Attention | Alg 7-11 |
| Diffusion | Diffusion Module, Transformer, Fourier | Alg 12-15 |
| Confidence & Affinity | Confidence, Distogram, Affinity (Boltz-2) | Alg 16-18 |
| Loss Functions | Diffusion Loss, Confidence Loss | Alg 19-20 |

Boltz Source Code Submodule

# Official Boltz Repository
Boltz-Ref-src/boltz-official/

Papers:

📓 Boltz-2 Specific Notebooks (NEW!)

Boltz-2 introduces binding affinity prediction: the first deep-learning model to approach FEP accuracy while being roughly 1000x faster.

👉 Boltz-2 Algorithm Index

Boltz-2 New Features

| Category | Key Algorithms | Notebooks |
|---|---|---|
| Affinity Prediction | Affinity Module, Gaussian Smearing | Alg 1-2 |
| Contact Guidance | Contact Conditioning | Alg 3 |
| Enhanced v2 Modules | Input v2, Template v2, Diffusion v2 | Alg 5-7 |
| Improved Confidence | Confidence v2, B-Factor | Alg 8, 10 |
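
The "Gaussian Smearing" entry above refers to expanding scalar pairwise distances into smooth radial-basis features before they are fed to the affinity module. A minimal NumPy sketch of that general technique follows; the bin range, count, and width are illustrative defaults, not Boltz-2's actual hyperparameters.

```python
import numpy as np

def gaussian_smearing(distances, d_min=0.0, d_max=20.0, num_bins=32):
    """Expand pairwise distances into Gaussian radial-basis features.

    distances : array of shape (...,) of scalar distances (e.g. in Angstroms)
    returns   : array of shape (..., num_bins)
    """
    centers = np.linspace(d_min, d_max, num_bins)   # RBF centres
    width = (d_max - d_min) / (num_bins - 1)        # bin spacing used as width
    diff = distances[..., None] - centers           # broadcast over bins
    return np.exp(-0.5 * (diff / width) ** 2)

# Example: featurize a small random distance matrix.
coords = np.random.rand(5, 3) * 10
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
feats = gaussian_smearing(dists)   # shape (5, 5, 32)
```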

Boltz-2 Submodules

# Official Repository (contains both Boltz-1 and Boltz-2)
Boltz-Ref-src/boltz-official/

# Boltzina - Virtual Screening with Boltz-2
Boltz-Ref-src/boltzina/

Practice: Modeling Tests with AF2

MD + AlphaFold2

Blogs

References

Reference papers

📦 AlphaFold2 Reference Source Code (Submodules)

# Official AlphaFold (DeepMind)
AF2-Ref-src/alphafold-official/

# OpenFold (PyTorch implementation)
AF2-Ref-src/openfold/

# ColabFold (Colab-friendly version)
AF2-Ref-src/colabfold/

# MMseqs2 (Sequence search)
AF2-Ref-src/mmseqs2/

# HH-suite (Template search)
AF2-Ref-src/hh-suite/

# trRosetta2 (Predecessor model)
AF2-Ref-src/trRosetta2/

# ESM (Facebook protein language model)
AF2-Ref-src/esm/

# UniRep (Protein representations)
AF2-Ref-src/unirep/

# SeqVec (Sequence embeddings)
AF2-Ref-src/seqvec/

To initialize submodules after cloning:

git submodule update --init --recursive

Data availability

All input data are freely available from public sources.

Structures from the PDB were used for training and as templates (https://www.wwpdb.org/ftp/pdb-ftp-sites; for the associated sequence data and 40% sequence clustering see also https://ftp.wwpdb.org/pub/pdb/derived_data/ and https://cdn.rcsb.org/resources/sequence/clusters/bc-40.out).

Training used a version of the PDB downloaded 28/08/2019, while CASP14 template search used a version downloaded 14/05/2020. Template search also used the PDB70 database, downloaded 13/05/2020 (https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/).

We show experimental structures from the PDB with accessions 6Y4F, 6YJ1, 6VR4, 6SK0, 6FES, 6W6W, 6T1Z, and 7JTL.

For MSA lookup at both training and prediction time, we used UniRef90 v2020_01 (https://ftp.ebi.ac.uk/pub/databases/uniprot/previous_releases/release-2020_01/uniref/), BFD (https://bfd.mmseqs.com), Uniclust30 v2018_08 (https://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/), and MGnify clusters v2018_12 (https://ftp.ebi.ac.uk/pub/databases/metagenomics/peptide_database/2018_12/). Uniclust30 v2018_08 was further used as input for constructing a distillation structure dataset.

Code and software availability

Source code

Source code for the AlphaFold model, trained weights, and an inference script is available under an open-source license at https://github.com/deepmind/alphafold.

Neural networks

Neural networks were developed with JAX (https://github.com/google/jax) and Haiku (https://github.com/deepmind/dm-haiku).

MSA search

For MSA search on UniRef90, MGnify clusters, and reduced BFD we used jackhmmer, and for template search on the PDB SEQRES we used hmmsearch, both from HMMER v3.3 (http://eddylab.org/software/hmmer/).

For template search against PDB70, we used HHsearch from HH-suite v3.0-beta.3 14/07/2017 (https://github.com/soedinglab/hh-suite). For constrained relaxation of structures, we used OpenMM v7.3.1 (https://github.com/openmm/openmm) with the Amber99sb force field.
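
For reference, here is a minimal sketch of an Amber99sb energy minimization with a recent OpenMM release (modern `openmm` namespace; the v7.3.1 release cited above used `simtk.openmm`). This is a simplified illustration only, not AlphaFold's actual relaxation protocol, which adds missing hydrogens and harmonic position restraints; the input filename is hypothetical.

```python
# Simplified sketch: plain energy minimization with OpenMM + Amber99sb.
from openmm import LangevinIntegrator, unit
from openmm.app import PDBFile, ForceField, Simulation, NoCutoff

pdb = PDBFile("predicted_model.pdb")         # hypothetical AF2 output (with hydrogens)
forcefield = ForceField("amber99sb.xml")     # Amber99sb parameters shipped with OpenMM
system = forcefield.createSystem(pdb.topology, nonbondedMethod=NoCutoff)

integrator = LangevinIntegrator(300 * unit.kelvin,
                                1.0 / unit.picosecond,
                                0.002 * unit.picoseconds)
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)
simulation.minimizeEnergy(maxIterations=100)  # the real pipeline restrains heavy atoms

state = simulation.context.getState(getPositions=True, getEnergy=True)
print(state.getPotentialEnergy())
```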

Docking analysis

Docking analysis on DGAT used

Data analysis

Data analysis used

Structure analysis

Structure analysis used PyMOL v2.3.0 (https://github.com/schrodinger/pymol-open-source).
