Computational Protein Binding Affinity Calculation

sarkarsrijon/ipae

How to Use AlphaFold PAE Data in Python

  1. Run AlphaFold Predictions
    Submit your protein/peptide sequences to the AlphaFold Server.
  2. Download Results
    After completion, download the results ZIP file from the server.
  3. Extract PAE Data
    • Open any full_data_n.json file from the results.
    • Locate the "pae" key and copy the entire array (including all nested brackets).
  4. Paste into Python
    • Choose the provided Python analysis script that fits your purpose and find the line where the PAE matrix is defined, e.g.:
      pae = np.array([...])
    • Replace the [...] with the array you copied from the JSON file.
  5. Set Chain Lengths
    • Define chain_lengths as a list of amino acid sequence lengths for each peptide/protein chain:
      chain_lengths = [length1, length2, ...]
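
As an alternative to copying the array by hand, steps 3 to 5 can be sketched in code. This is a minimal sketch, not one of the repository's scripts: the helper names `load_pae` and `check_chain_lengths` are hypothetical, and it assumes only that the JSON file holds the matrix under the "pae" key as described above.

```python
import json

import numpy as np


def load_pae(json_path):
    """Read the "pae" array from a full_data_n.json file (hypothetical helper)."""
    with open(json_path) as handle:
        data = json.load(handle)
    return np.array(data["pae"], dtype=float)


def check_chain_lengths(pae, chain_lengths):
    """Check that the PAE matrix is square and matches the summed chain lengths."""
    n_total = sum(chain_lengths)
    if pae.shape != (n_total, n_total):
        raise ValueError(
            f"PAE shape {pae.shape} does not match chain_lengths summing to {n_total}"
        )
    return True
```

Calling `check_chain_lengths(pae, chain_lengths)` before any analysis catches a mistyped chain length early, since the side of the PAE matrix should equal the sum of the chain lengths.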

This workflow makes it easy to analyze inter-chain PAE values from AlphaFold output for further inference. The interface Predicted Aligned Error, also referred to as iPAE or PAE_i, is used as a surrogate for protein-protein interaction (PPI) binding affinity. Thanks to Brian Coventry for showing me the n = 2 case. I extended it to dimer-ligand interactions, for specific utility, and to n peptides, for generalizability.

The underlying principle is to exclude each peptide's self-interaction contribution when calculating the interface PAE metric. We therefore consider only the off-diagonal blocks of the full residue-residue PAE matrix, i.e. the blocks that pair residues from different peptide chains. Below is an example for a three-peptide interaction; the n = 2 and larger cases can be visualized the same way.

      A      B      C
A    A→A    A→B    A→C
B    B→A    B→B    B→C
C    C→A    C→B    C→C
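
The block structure above can be sketched in code as follows. This is an illustrative sketch rather than the repository's own script: the function names are hypothetical, and the choice of aggregate (the mean of the per-block mean PAE over all off-diagonal blocks) is an assumption made here for the example.

```python
import numpy as np


def chain_slices(chain_lengths):
    """Return one slice per chain, covering that chain's rows/columns in the PAE matrix."""
    bounds = np.concatenate(([0], np.cumsum(chain_lengths)))
    return [slice(int(start), int(stop)) for start, stop in zip(bounds[:-1], bounds[1:])]


def interface_pae(pae, chain_lengths):
    """Average the mean PAE of every off-diagonal (inter-chain) block.

    Assumption: each block X→Y with X != Y contributes its mean with equal
    weight; diagonal blocks (X→X) are excluded as self-interaction.
    """
    pae = np.asarray(pae, dtype=float)
    slices = chain_slices(chain_lengths)
    block_means = [
        pae[rows, cols].mean()
        for i, rows in enumerate(slices)
        for j, cols in enumerate(slices)
        if i != j
    ]
    return float(np.mean(block_means))
```

For n = 2 this reduces to averaging the A→B and B→A block means. Weighting blocks equally keeps a long chain from dominating the result; averaging over all off-diagonal elements instead would be an equally reasonable alternative with different weighting.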

The values under the chain_pair_pae_min key in summary_confidences_n.json give the lowest PAE the model predicts for each pair of chains being modeled, i.e. what is likely achievable in the best case. The iPAE calculation above, by contrast, boils down to a single aggregate measure. The metrics serve different purposes and should be used judiciously.
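
To compare the two kinds of metric side by side, the pairwise minima can be pulled out of the parsed summary file. The sketch below assumes that chain_pair_pae_min is stored as a square nested list indexed by chain in both directions; the function name `inter_chain_pae_min` is hypothetical.

```python
def inter_chain_pae_min(summary):
    """Map each ordered pair of distinct chains (i, j) to its minimum PAE.

    `summary` is the dictionary parsed from the summary JSON file. Assumption:
    summary["chain_pair_pae_min"] is a square list of lists with one row and
    one column per chain.
    """
    matrix = summary["chain_pair_pae_min"]
    n_chains = len(matrix)
    return {
        (i, j): matrix[i][j]
        for i in range(n_chains)
        for j in range(n_chains)
        if i != j  # skip same-chain entries
    }
```

The result keeps one value per chain pair and direction, whereas the iPAE aggregate collapses all inter-chain blocks into a single number.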

iPAE and PAE values can be biased, for example by short sequence lengths, but final values below 15 (PAE is reported in ångströms) are usually the ones considered. Finally, these measures are only approximations of binding affinity, not exact substitutes for it.
