Skip to content

Commit ccf738a

Browse files
Bonus on interface energetics with prot-on, haddock3-score and alascan (#706)
* bonus on interface energetics with prot-on, haddock3-score and alascan * Update index.md * add final image * upd link to zip file * final round of correcting typos --------- Co-authored-by: Alexandre Bonvin <[email protected]>
1 parent 776c36e commit ccf738a

File tree

4 files changed

+307
-11
lines changed

4 files changed

+307
-11
lines changed
410 KB
Loading

education/HADDOCK3/HADDOCK3-antibody-antigen/index.md

Lines changed: 307 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -185,7 +185,7 @@ __If running as part of a BioExcel workshop or summerschool see the instructions
185185
_Note_ that you can also download and unzip this archive directly from the Linux command line:
186186

187187
<a class="prompt prompt-cmd">
188-
wget https://surfdrive.surf.nl/files/index.php/s/R7VHGQM9nx8QuQn/download -O HADDOCK3-antibody-antigen.zip<br>
188+
wget https://surfdrive.surf.nl/files/index.php/s/MuC8YogNPj9Ac31/download -O HADDOCK3-antibody-antigen.zip<br>
189189
unzip HADDOCK3-antibody-antigen.zip
190190
</a>
191191

@@ -932,7 +932,7 @@ In the above workflow we see in three modules a *tolerance* parameter defined. U
932932

933933
*__Note__* that, in contrast to HADDOCK2.X, we have much more flexibility in defining our workflow.
934934
As an example, we could use this flexibility by introducing a clustering step after the initial rigid-body docking stage, selecting a given number of models per cluster and refining all of those.
935-
For an example of this strategy see the BONUS 3 section about ensemble docking.
935+
For an example of this strategy see the 4 section about ensemble docking.
936936

937937

938938
<hr>
@@ -1251,10 +1251,11 @@ In this execution mode the HADDOCK3 job should be submitted to the batch system
12511251
## Analysis of docking results
12521252

12531253
In case something went wrong with the docking (or simply if you do not want to wait for the results) you can find the following precalculated runs in the `runs` directory:
1254-
- `run1`: run created using the unbound antibody.
1255-
- `run1-af2`: run created using the Alphafold-multimer antibody (see BONUS 2).
1256-
- `run1-abb`: run created using the Immunebuilder antibody (see BONUS 2).
1257-
- `run1-ens`: run created using an ensemble of antibody models (see BONUS 3).
1254+
- `run1`: docking run created using the unbound antibody.
1255+
- `run1-af2`: docking run created using the Alphafold-multimer antibody (see 3).
1256+
- `run1-abb`: docking run created using the Immunebuilder antibody (see 3).
1257+
- `run1-ens`: docking run created using an ensemble of antibody models (see 4).
1258+
- `run-scoring`: scoring run created using various models obtained at the previous stages (see 6).
12581259

12591260

12601261
Once your run has completed - inspect the content of the resulting directory.
@@ -1613,7 +1614,302 @@ HADDOCK3's intrinsic flexibility can be used to improve the performance of antib
16131614
<hr>
16141615
<hr>
16151616

1616-
## BONUS 1: Does the antibody bind to a known interface of interleukin? ARCTIC-3D analysis
1617+
## BONUS 1: Dissecting the interface energetics: what is the impact of a single mutation?
1618+
1619+
Mutations at the binding interfaces can have widely varying effects on binding affinity - some may be negligible, while others can significantly strengthen or weaken the interaction. Exploring these mutations helps identify critical amino acids for redesigning structurally characterized protein-protein interfaces, which paves the way for developing protein-based therapeutics to deal with a diverse range of diseases.
1620+
To pinpoint such amino acids positions, the residues across the protein interaction surfaces are either randomly or strategically mutated. Scanning mutations in this manner is experimentally costly. Therefore, computational methods have been developed to estimate the impact of an interfacial mutation on protein-protein interactions.
1621+
These computational methods come in two main flavours. One involves rigorous free energy calculations, and, while highly accurate, these methods are computationally expensive. The other category includes faster, approximate approaches that predict changes in binding energy using statistical potentials, machine learning, empirical scoring functions etc. Though less precise, these faster methods are practical for large-scale screening and early-stage analysis. In this bonus exercise, we will take a look at two quick ways of estimating the effect of a single mutation in the interface.
1622+
1623+
1624+
### PROT-ON and haddock3-scoring to inspect a single mutation
1625+
1626+
PROT-ON (Structure-based detection of designer mutations in PROTein-protein interface mutatiONs) is a tool and [online server](http://proton.tools.ibg.edu.tr:8001/about) that scans all possible interfacial mutations and **predicts ΔΔG score** by using EvoEF1 (active in both on the web server and stand-alone versions) or FoldX (active only in the stand-alone version) with the aim of finding the most mutable positions. The original publication describing PROT-ON can be found [here](https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2023.1063971/full).
1627+
1628+
Here we will use PROT-ON to analyse the interface of our antibody-antigen complex. For that, we will use the provided matched reference structure (`4G6M-matched.pdb`) in which both chains of the antibody have the same chainID (A), which allows us to analyse all interface residues of the antibody at once.
1629+
1630+
<a class="prompt prompt-info">
1631+
Connect to the PROT-ON server page (link above) and fill in the following fields:
1632+
</a>
1633+
1634+
<a class="prompt prompt-info">
1635+
Specify your run name* --> 4G6M_matched
1636+
</a>
1637+
1638+
<a class="prompt prompt-info">
1639+
Choose / Upload your protein complex* --> Select the provided _4G6M-matched.pdb_ file
1640+
</a>
1641+
1642+
<a class="prompt prompt-info">
1643+
Which dimer chains should be analyzed* --> Select chain A for the 1st molecule and B for the 2nd
1644+
</a>
1645+
<a class="prompt prompt-info">
1646+
Pick the monomer for mutational scanning* --> Select the first molecule - the antibody (toggle the switch ON under the chain A)
1647+
</a>
1648+
1649+
<a class="prompt prompt-info">
1650+
Click on the Submit button
1651+
</a>
1652+
1653+
Your run should complete in 5-10 minutes. Once finished, you will be presented with a result page summarising the most depleting (ones that decrease the binding affinity) and most enriching (ones that increase the binding affinity) mutations.
1654+
1655+
<a class="prompt prompt-question">
1656+
Which possible mutation would you propose to improve the binding affinity of the antibody?
1657+
</a>
1658+
1659+
<details style="background-color:#DAE4E7">
1660+
<summary style="bold">
1661+
<b><i>See answer</i></b> <i class="material-icons">expand_more</i>
1662+
</summary>
1663+
The most enriching mutation is S150W with a -3.69 ΔΔG score.
1664+
</details>
1665+
<br>
1666+
1667+
<a class="prompt prompt-question">
1668+
Inspect the proposed amino acid in PyMol. Can you rationalise why it might increase the affinity?
1669+
</a>
1670+
1671+
With HADDOCK3, it is possible to take a step further. To perform the mutation, simply rename the desired residue and score such model - HADDOCK will take care of the topology regardless of the side chain differences and energy minimisation of the model. To do so, first either edit _4G6M-matched.pdb_ in your favourite text editor and save this new file as _4G6M_matched_S150W.pdb_, or use the command line:
1672+
<a class="prompt prompt-cmd">
1673+
sed 's/SER\ A\ 150/TRP\ A\ 150/g' 4G6M_matched.pdb > 4G6M_matched_S150W.pdb
1674+
</a>
1675+
1676+
Next, score the mutant using the command-line tool `haddock3-score`.
1677+
This tool performs a short workflow composed of the `topoaa` and `emscoring` modules. Use flag `--outputpdb` to save energy-minimized model:
1678+
<a class="prompt prompt-cmd">
1679+
haddock3-score 4G6M_matched_S150W.pdb \-\-outputpdb
1680+
</a>
1681+
1682+
<a class="prompt prompt-question">
1683+
Use _haddock3-score_ to calculate the score of the 4G6M-matched.pdb. Do you see a difference between wild-type and mutant scores?
1684+
Might such single-residue mutation affect the binding affinity?
1685+
</a>
1686+
1687+
<a class="prompt prompt-info">
1688+
Inspect the energy-minimized mutant model (4G6M_matched_S150W_hs.pdb) visually.
1689+
Can you rationalise why such a mutation might increase the affinity?
1690+
</a>
1691+
1692+
1693+
<details style="background-color:#DAE4E7">
1694+
<summary style="bold">
1695+
<b><i>Zoom in on the mutated residue</i></b> <i class="material-icons">expand_more</i>
1696+
</summary>
1697+
<figure style="text-align: center;">
1698+
<img width="100%" src="/education/HADDOCK3/HADDOCK3-antibody-antigen/mutant-stacking.png">
1699+
<center>
1700+
<i>TRP150 is stacking with TYR24 </i>
1701+
</center>
1702+
</figure>
1703+
<br>
1704+
</details>
1705+
<br>
1706+
1707+
1708+
### Alanine Scanning module
1709+
1710+
Another way of exploring interface energetics is by using the `alascan` module of HADDOCK3. `alascan` stands for "Alanine Scanning module".
1711+
1712+
This module is capable of mutating interface residues to Alanine and calculating the **Δ HADDOCK score** between the wild-type and mutant, thus providing a measure of the impact of each individual mutation. It is possible to scan all interface residues one by one or limit this scanning to a selected by user set of residues. By default, the mutation to Alanine is performed, as its side chain is just a methyl group, so side chain perturbations are minimal, as well as possible secondary structure changes. It is possible to perform the mutation to any other amino acid type - at your own risk, as such mutations may introduce structural uncertainty.
1713+
1714+
**Important**: 1/ `alascan` calculates the difference between wild-type score vs mutant score, i.e. positive `Δscore` indicative of the enriched (stronger) binding and negative `Δscore` is indicative of the depleted (weaker) binding; 2/ Inside `alascan`, a short energy minimization of an input structure is performed, i.e. there's no need to include an additional refinement module prior to `alascan`.
1715+
1716+
Here is an example of the workflow to scan interface energetics:
1717+
{% highlight toml %}
1718+
# ====================================================================
1719+
# Scanning interface residues with haddock3
1720+
# ====================================================================
1721+
1722+
# directory in which the scoring will be done
1723+
run_dir = "run-energetics-alascan"
1724+
1725+
# compute mode
1726+
mode = "local"
1727+
ncores = 50
1728+
1729+
# Post-processing to generate statistics and plots
1730+
postprocess = true
1731+
clean = true
1732+
1733+
molecules = [
1734+
"pdbs/4G6M_matched.pdb",
1735+
]
1736+
1737+
# ====================================================================
1738+
# Parameters for each stage are defined below
1739+
# ====================================================================
1740+
[topoaa]
1741+
1742+
[alascan]
1743+
# mutate each interface residue to Alanine
1744+
scan_residue = 'ALA'
1745+
# generate plot of delta score and its components per each mutation
1746+
plot = true
1747+
1748+
# ====================================================================
1749+
{% endhighlight %}
1750+
1751+
A scoring scenario configuration file is provided in the `workflows/` directory as `interaction-energetics.cfg`, and precomputed results are in `runs/run-energetics-alascan`.
1752+
The output folder contains, among others, a directory titled `1_alascan` with a file `scan_4G6M_matched_haddock.tsv` that lists each mutation, corresponding score and individual terms:
1753+
<pre>
1754+
##########################################################
1755+
# `alascan` results for 4G6M_matched_haddock.pdb
1756+
#
1757+
# native score = -145.5891
1758+
#
1759+
# z_score is calculated with respect to the other residues
1760+
##########################################################
1761+
chain res ori_resname end_resname score vdw elec desolv bsa delta_score delta_vdw delta_elec delta_desolv delta_bsa z_score
1762+
A 212 LYS ALA -136.33 -66.16 -367.66 3.37 1660.53 -9.26 2.52 -75.12 3.24 37.57 -0.48
1763+
A 103 ASP ALA -129.64 -59.93 -365.23 3.34 1677.97 -15.95 -3.71 -77.56 3.27 20.13 -1.41
1764+
A 54 TRP ALA -138.18 -58.34 -435.53 7.27 1690.80 -7.41 -5.30 -7.26 -0.66 7.30 -0.22
1765+
A 32 SER ALA -143.66 -60.55 -447.37 6.36 1691.72 -1.93 -3.09 4.59 0.24 6.38 0.55
1766+
A 58 ASP ALA -121.65 -63.49 -306.77 3.20 1639.20 -23.94 -0.15 -136.01 3.41 58.90 -2.52
1767+
A 33 GLY ALA -148.50 -61.56 -473.22 7.71 1693.18 2.91 -2.08 30.43 -1.10 4.92 1.22
1768+
...
1769+
</pre>
1770+
1771+
<a class="prompt prompt-question">
1772+
Can you identify the most enriching/depleting mutation of each chain?
1773+
Take a look at _scan_clt_-.tsv_ and open its visualisation _scan_clt_-.html_ in the web browser.
1774+
</a>
1775+
1776+
You can use an additional script `/scripts/get-alascan-extrema.sh` to check your answer:
1777+
<a class="prompt prompt-cmd">
1778+
bash scripts/get-alascan-extrema.sh run-energetics-alascan/1_alascan/scan_4G6M_matched_haddock.tsv
1779+
</a>
1780+
1781+
Mutation of the residue ASP58 turned out to be the most depleting within chain A.
1782+
Let us visualise it in PyMol to analyse its contribution to the binding:
1783+
<a class="prompt prompt-pymol">
1784+
File menu -> Open -> 4G6M_matched.pdb
1785+
</a>
1786+
1787+
Display ASP58 as sticks and colour it by atom:
1788+
<a class="prompt prompt-pymol">
1789+
util.cbc <br>
1790+
select asp58, (resi 58 and chain A) <br>
1791+
show sticks, asp58 <br>
1792+
util.cbao asp58
1793+
</a>
1794+
1795+
Now visualise its neighbouring residues:
1796+
<a class="prompt prompt-pymol">
1797+
select asp58_neighbour_atoms_4A, (resi 58 and chain A) around 4 and chain B <br>
1798+
select asp58_neighbour_residues, byres asp58_neighbour_atoms_4A
1799+
show sticks, asp58_neighbour_residues <br>
1800+
util.cbao asp58_neighbour_residues <br>
1801+
</a>
1802+
1803+
1804+
Let us display contacts using [show contacts plugin](https://pymolwiki.org/index.php/Show_contacts):
1805+
<figure style="text-align: center;">
1806+
<img width="100%" src="/education/HADDOCK3/HADDOCK3-antibody-antigen/asp58_contacts.png">
1807+
<center>
1808+
<i>ASP58 makes h-bonds with two neighbouring residues</i>
1809+
</center>
1810+
</figure>
1811+
1812+
We can see one hydrogen bond between ASP58 and LYS98, and two hydrogen bonds ASP58 and LYS94.
1813+
Mutating ASP58 to ALA should result in the disappearance of those h-bonds, and the overall depletion of the binding.
1814+
This is reflected by the high negative value (-136.01) of `delta_elec` in either of .tsv files.
1815+
1816+
Let us test several mutations to confirm our hypothesis.
1817+
Here is an example of the workflow to perform such mutations and save generated models:
1818+
1819+
{% highlight ini %}
1820+
# ====================================================================
1821+
# Mutating selected interface residue with haddock3
1822+
# ====================================================================
1823+
1824+
# directory in which the scoring will be done
1825+
run_dir = "run-energetics-mutations"
1826+
1827+
# compute mode
1828+
mode = "local"
1829+
ncores = 50
1830+
1831+
# Post-processing to generate statistics and plots
1832+
postprocess = true
1833+
clean = true
1834+
1835+
molecules = ["pdbs/4G6M_matched.pdb"]
1836+
1837+
# ====================================================================
1838+
# Parameters for each stage are defined below
1839+
# ====================================================================
1840+
[topoaa]
1841+
1842+
[alascan]
1843+
# mutate residue 58 of chain A to Arginine
1844+
scan_residue = "ARG"
1845+
resdic_A = [58]
1846+
# save energy-minimised mutant model
1847+
output_mutants= true
1848+
1849+
[alascan]
1850+
scan_residue = "GLY"
1851+
resdic_A = [58]
1852+
output_mutants= true
1853+
1854+
[alascan]
1855+
scan_residue = "TRP"
1856+
resdic_A = [58]
1857+
output_mutants= true
1858+
1859+
{% endhighlight %}
1860+
1861+
Configuration file for this scenario can be found in `workflows/single-residue-mutations.cfg`, precomputed results are in `run-residue-mutations`. The output folder contains, among others, an energy-minimised mutant model `1_alascan/4G6M_matched_haddock-A_D58R.pdb.gz`, and tables `.tsv` with energetics.
1862+
1863+
<a class="prompt prompt-question">
1864+
Take a look at the scores of the mutants. Which mutation depletes binding the most?
1865+
</a>
1866+
1867+
<a class="prompt prompt-question">
1868+
Inspect the mutant vs wild-type complex. Can you see the difference at the interface level?
1869+
</a>
1870+
1871+
<details style="background-color:#DAE4E7">
1872+
<summary style="bold">
1873+
<b><i>See the overlay of the mutant onto the wild-type structure </i></b> <i class="material-icons">expand_more</i>
1874+
</summary>
1875+
<figure style="text-align: center;">
1876+
<img width="100%" src="/education/HADDOCK3/HADDOCK3-antibody-antigen/mutant-ref-overlay-alascan.png">
1877+
<center>
1878+
<i>wild-type residue ASP58 is displayed in pink, and mutant residue AGR58 is displayed in orange</i>
1879+
</center>
1880+
</figure>
1881+
<br>
1882+
</details>
1883+
<br>
1884+
1885+
1886+
<a class="prompt prompt-question">
1887+
Compare values obtained with [alascan] to the corresponding values obtained with PROT-ON. Are they different? If yes, can you think of a reason why?
1888+
</a>
1889+
1890+
<details style="background-color:#DAE4E7">
1891+
<summary style="bold">
1892+
<b><i>See answer</i></b> <i class="material-icons">expand_more</i>
1893+
</summary>
1894+
The values themselves are expected to differ, because [alascan] calculates ΔHADDOCK score, while PROT-ON predicts ΔΔG.
1895+
Moreover, both tools are making predictions using different methods, so it is normal to have different results.
1896+
However, if both tools consistently identify the same mutations as binding enriching or depleting - this may signal that selected residues indeed play a key role in binding affinity.
1897+
</details>
1898+
<br>
1899+
1900+
Now let us consider how sensitive this kind of analysis is to the quality of the docking model.
1901+
Instead of using the crystal structure, repeat this analysis using the best model of the top-ranked cluster or the best model with the lowest LRMSD value.
1902+
1903+
<a class="prompt prompt-question">
1904+
Consider the most binding-enrishing/-depleting mutations predicted based on your favourite docking model.
1905+
How different are those compared to the mutations, predicted based on the crystal structure?
1906+
</a>
1907+
1908+
<hr>
1909+
<hr>
1910+
1911+
1912+
## BONUS 2: Does the antibody bind to a known interface of interleukin? ARCTIC-3D analysis
16171913

16181914
Gevokizumab is a highly specific antibody that targets an allosteric site of Interleukin-1β (IL-1β) in PDB file *4G6M*, thus reducing its binding affinity for its substrate, interleukin-1 receptor type I (IL-1RI). Canakinumab, another antibody binding to IL-1β, has a different mode of action, as it competes directly with the binding site of IL-1RI (in PDB file *4G6J*). For more details please check [this article](https://www.sciencedirect.com/science/article/abs/pii/S0022283612007863?via%3Dihub){:target="_blank"}.
16191915

@@ -1662,7 +1958,7 @@ How do the results change? Are gevokizumab or canakinumab PDB files being cluste
16621958
<hr>
16631959
<hr>
16641960

1665-
## BONUS 2: How good are AI-based models of antibody for docking?
1961+
## BONUS 3: How good are AI-based models of antibody for docking?
16661962

16671963
The release of [AlphaFold2 in late 2020](https://www.nature.com/articles/s41586-021-03819-2) has brought structure prediction methods to a new frontier, providing accurate models for the majority of known proteins. This revolution did not spare antibodies, with [Alphafold2-multimer](https://github.com/sokrypton/ColabFold){:target="_blank"} and other prediction methods (most notably [ABodyBuilder2](https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabpred/abodybuilder2/){:target="_blank"}, from the ImmuneBuilder suite) performing nicely on the variable regions.
16681964

@@ -1869,7 +2165,7 @@ Using the Alphafold2 structure in this case is not the best option, as the input
18692165
<hr>
18702166
<hr>
18712167

1872-
## BONUS 3: Ensemble docking using a combination of exprimental and AI-predicted antibody structures
2168+
## BONUS 4: Ensemble docking using a combination of exprimental and AI-predicted antibody structures
18732169

18742170

18752171
Instead of running haddock3 using a specific input structure of the antibody, we can also use an ensemble of all available models.
@@ -2086,7 +2382,7 @@ Using the information in the _traceback_ directory, try to figure out which of t
20862382
<hr>
20872383
<hr>
20882384

2089-
## BONUS 4: Antibody-antigen complex structure prediction from sequence using AlphaFold2
2385+
## BONUS 5: Antibody-antigen complex structure prediction from sequence using AlphaFold2
20902386

20912387

20922388
With the advent of Artificial Intelligence (AI) and AlphaFold2, we can also try to predict directly the full antibody-antigen complex using AlphaFold2.
@@ -2384,7 +2680,7 @@ Try to reproduce the previous steps and examine the quality of the various gener
23842680
<hr>
23852681

23862682

2387-
## BONUS 5: Running a scoring scenario
2683+
## BONUS 6: Running a scoring scenario
23882684

23892685
This section demonstrates the use of HADDOCK3 to score the various models obtained at the previous stages (ensemble docking and AlphaFold predictions)
23902686
and observe if the HADDOCK scoring function is able to detect the quality of the models.
282 KB
Loading
279 KB
Loading

0 commit comments

Comments
 (0)