- Run AlphaFold Predictions
  Submit your protein/peptide sequences to the AlphaFold Server.
- Download Results
  After completion, download the results ZIP file from the server.
- Extract PAE Data
  - Open any `full_data_n.json` file from the results.
  - Locate the `"pae"` key and copy the entire array (including all nested brackets).
- Paste into Python
  - Choose the provided Python analysis script that fits your purpose and find the line where the PAE matrix is defined, e.g. `pae = np.array([...])`.
  - Replace the `[...]` with the array you copied from the JSON file.
- Set Chain Lengths
  - Define `chain_lengths` as a list of amino acid sequence lengths, one per peptide/protein chain: `chain_lengths = [length1, length2, ...]`
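As an alternative to copying the array by hand, the `"pae"` key can be read straight from the JSON file. The following is only a minimal sketch and not part of the provided scripts: the helper names `load_pae` and `check_chain_lengths` are hypothetical, and the check assumes that every row and column of the matrix corresponds to one position counted in `chain_lengths`.

```python
import json

import numpy as np


def load_pae(json_path):
    """Read the "pae" array from a full-data JSON file (hypothetical helper)."""
    with open(json_path) as fh:
        data = json.load(fh)
    pae = np.array(data["pae"], dtype=float)
    if pae.ndim != 2 or pae.shape[0] != pae.shape[1]:
        raise ValueError(f"expected a square PAE matrix, got shape {pae.shape}")
    return pae


def check_chain_lengths(pae, chain_lengths):
    """Raise if the chain lengths do not add up to the matrix dimension.

    Assumption: one matrix row/column per position counted in chain_lengths.
    """
    total = sum(chain_lengths)
    if total != pae.shape[0]:
        raise ValueError(
            f"chain_lengths sum to {total}, but the PAE matrix is "
            f"{pae.shape[0]} x {pae.shape[1]}"
        )


# Usage (the path is a placeholder for one of your downloaded files):
# pae = load_pae("full_data_0.json")
# check_chain_lengths(pae, chain_lengths)
```

Running the length check before any analysis catches a mistyped chain length early, since a wrong length would otherwise silently shift the block boundaries.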
This workflow lets you easily analyze inter-chain PAE values from AlphaFold PAE data for further inference. Interface Predicted Alignment Error, also referred to as iPAE or PAE_i, is used as a surrogate for PPI binding affinities. Thanks to Brian Coventry for showing me the n = 2 case. I extended it to cases of dimer-ligand interaction and n peptides for specific utility and generalizability, respectively.
The underlying principle is to exclude each chain's self-interaction when calculating the interface PAE metric. We therefore consider only the off-diagonal blocks of the full residue-residue PAE matrix, i.e. the blocks that pair residues from different peptide chains. Here is an example of a three-peptide interaction; the n = 2 and larger cases can be visualized in the same way.
| | A | B | C |
|---|---|---|---|
| A | A→A | A→B | A→C |
| B | B→A | B→B | B→C |
| C | C→A | C→B | C→C |
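The block picture above can be sketched in code as follows. This is an illustration under stated assumptions rather than the exact code of the provided scripts: the function name `interchain_pae` is hypothetical, and the choice of aggregate (the mean of each off-diagonal block, plus the mean over all off-diagonal entries pooled together) is an assumption made here for clarity.

```python
import numpy as np


def interchain_pae(pae, chain_lengths):
    """Return (block_means, overall) for the off-diagonal PAE blocks.

    block_means[i, j] is the mean PAE of block i→j (NaN on the diagonal).
    overall is the mean over every entry outside the diagonal blocks.
    The aggregation by mean is an assumption of this sketch.
    """
    pae = np.asarray(pae, dtype=float)
    bounds = np.concatenate(([0], np.cumsum(chain_lengths)))
    if pae.ndim != 2 or pae.shape[0] != pae.shape[1] or bounds[-1] != pae.shape[0]:
        raise ValueError("chain_lengths must sum to the size of a square PAE matrix")

    n = len(chain_lengths)
    block_means = np.full((n, n), np.nan)
    off_diagonal = np.ones(pae.shape, dtype=bool)
    for i in range(n):
        rows = slice(bounds[i], bounds[i + 1])
        off_diagonal[rows, rows] = False  # mask out the self block i→i
        for j in range(n):
            if i != j:
                cols = slice(bounds[j], bounds[j + 1])
                block_means[i, j] = pae[rows, cols].mean()
    return block_means, pae[off_diagonal].mean()


# Toy example with made-up numbers: chain A has 2 residues, chain B has 1.
toy_pae = [[0, 1, 10],
           [1, 0, 20],
           [30, 40, 0]]
block_means, overall = interchain_pae(toy_pae, [2, 1])
# block A→B averages 10 and 20; block B→A averages 30 and 40.
```

For n = 2 the two off-diagonal blocks always contain the same number of entries, so the pooled mean equals the average of the two block means. For larger n, blocks of different sizes contribute in proportion to their size, which may or may not be the weighting you want.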
The `chain_pair_pae_min` values in `summary_confidences_n.json` give, for each pair of the sequences being modeled, the lowest PAE the model predicts, i.e. what it is likely to achieve in the best case. The iPAE calculations above, in contrast, boil down to a single aggregate measure. The various metrics should be used judiciously, as they serve different purposes.
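To put the two kinds of metric side by side, the inter-chain entries of `chain_pair_pae_min` can be pulled out as below. This is a small sketch that assumes the key holds an n × n nested list indexed by chain order. The function name `interchain_pae_min`, the default chain labels, and the numbers in the example are all illustrative.

```python
def interchain_pae_min(summary, chain_ids=None):
    """Map each ordered pair of distinct chains to its chain_pair_pae_min value.

    summary is the parsed summary JSON (a dict); the n x n layout of
    "chain_pair_pae_min" is an assumption of this sketch.
    """
    matrix = summary["chain_pair_pae_min"]
    n = len(matrix)
    if chain_ids is None:
        # Default labels A, B, C, ... (illustrative only).
        chain_ids = [chr(ord("A") + k) for k in range(n)]
    return {
        (chain_ids[i], chain_ids[j]): matrix[i][j]
        for i in range(n)
        for j in range(n)
        if i != j  # skip the self pairs on the diagonal
    }


# Made-up values, standing in for json.load() on a summary file.
example_summary = {"chain_pair_pae_min": [[0.8, 5.0], [6.0, 0.9]]}
pair_minima = interchain_pae_min(example_summary)
```

Each entry is a best-case value for one chain pair, so it will generally be lower than an aggregate computed over the whole corresponding block.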
iPAE or PAE values can be influenced or biased, for example by short sequence length, but final values below 15 are usually the ones considered. Lastly, these measures are only approximations of binding affinities, not exact substitutes for them.