Skip to content
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
194 changes: 184 additions & 10 deletions education/HADDOCK3/HADDOCK3-antibody-antigen/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -932,7 +932,7 @@ In the above workflow we see in three modules a *tolerance* parameter defined. U

*__Note__* that, in contrast to HADDOCK2.X, we have much more flexibility in defining our workflow.
As an example, we could use this flexibility by introducing a clustering step after the initial rigid-body docking stage, selecting a given number of models per cluster and refining all of those.
For an example of this strategy see the BONUS 3 section about ensemble docking.
For an example of this strategy see the 4 section about ensemble docking.


<hr>
Expand Down Expand Up @@ -1251,10 +1251,11 @@ In this execution mode the HADDOCK3 job should be submitted to the batch system
## Analysis of docking results

In case something went wrong with the docking (or simply if you do not want to wait for the results) you can find the following precalculated runs in the `runs` directory:
- `run1`: run created using the unbound antibody.
- `run1-af2`: run created using the Alphafold-multimer antibody (see BONUS 2).
- `run1-abb`: run created using the Immunebuilder antibody (see BONUS 2).
- `run1-ens`: run created using an ensemble of antibody models (see BONUS 3).
- `run1`: docking run created using the unbound antibody.
- `run1-af2`: docking run created using the Alphafold-multimer antibody (see 3).
- `run1-abb`: docking run created using the Immunebuilder antibody (see 3).
- `run1-ens`: docking run created using an ensemble of antibody models (see 4).
- `run-scoring`: scoring run created using various models obtained at the previous stages (see 6).


Once your run has completed - inspect the content of the resulting directory.
Expand Down Expand Up @@ -1613,7 +1614,180 @@ HADDOCK3's intrinsic flexibility can be used to improve the performance of antib
<hr>
<hr>

## BONUS 1: Does the antibody bind to a known interface of interleukin? ARCTIC-3D analysis
## 1: Dissecting the interface energetics: what is the impact of a single mutation?

Mutations at the binding interfaces can have widely varying effects on binding affinity - some may be negligible, while others can significantly strengthen or weaken the interaction. Exploring these mutations helps identify critical amino acids for redesigning a structurally characterized protein-protein interfaces, which paves the way for developing protein-based therapeutics to deal with diverse range of diseases.
To pinpoint such amino acids positions, the residues across the protein interaction surfaces are either randomly or strategically mutated. Scanning mutations in this manner is experimentally costly. Therefore, computational methods have been developed to estimate the impact of an interfacial mutation on protein-protein interactions.
These computational methods come it two main flavour. One involves rigorous free energy calculations, and, while highly accurate, these methods are computationally expensive. The other category includes faster, approximate approaches that predicts changes in binding energy using statistical potentials, machine learning, empirical scoring functions etc. Though less precise, these faster methods are practical for large-scale screening and early-stage analysis. In this bonus excersice, we will take a look at two quick ways of estimating the effect of a single mutation in the interface.


### PROT-ON and haddock3-scoring to inspect a single mutation

PROT-ON (Structure-based detection of designer mutations in PROTein-protein interface mutatiONs) is a tool and [online server](http://proton.tools.ibg.edu.tr:8001/about) that scans all possible interfacial mutations and **pedicts ΔΔG score** by using EvoEF1 (active in both on the web server and stand-alone versions) or FoldX (active only in the stand-alone version) with the aim of finding the most mutable positions. The original publication describing PROT-ON can be found [here](https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2023.1063971/full).

Here we will use PROT-ON to analyse the interface of our antibody-antigen complex. For that, we will use the provided matched reference structure (`4G6M-matched.pdb`) in which both chains of the antibody have the same chainID (A), which allows us to analyse all interface residues of the antibody at once.

<a class="prompt prompt-info">
Connect to the PROT-ON server page (link above) and fill in the following fields
</a>

<a class="prompt prompt-info">
Specify your run name* --> 4G6M_matched
</a>

<a class="prompt prompt-info">
Choose / Upload your protein complex* --> select the provided _4G6M-matched.pdb_ file
</a>

<a class="prompt prompt-info">
Which dimer chains should be analyzed* --> select chain A for the 1st molecule and B for the 2nd
</a>
<a class="prompt prompt-info">
Pick the monomer for mutational scanning* --> select the first molecule - the antibody (toggle the switch ON under the chain A)
</a>

<a class="prompt prompt-info">
Click on Submit button
</a>

You run should complete in 5-10 minutes. Once finished, you will be presented with a result page summarising the most depleting (ones that lower the binding affinity) and most enriching (ones that increases the binding affinity) mutations.

<a class="prompt prompt-question">
Which possible mutation would you propose to improve the binding affinity of the antibody?
</a>

<details style="background-color:#DAE4E7">
<summary style="bold">
<b><i>See answer</i></b> <i class="material-icons">expand_more</i>
</summary>
The most enriching mutation is S150W with -3.69 ΔΔG score.
</details>
<br>

<a class="prompt prompt-question">
Inspect the proposed amino acid in PyMol. Can you rationalise why it might increase the affinity?
</a>

With HADDOCK3 it is possible to take a step further. To perform the mutation, simply rename the desired residue and score such model - HADDOCK will take care of the topology regardless on the side chain differences and energy minimisation of the model. To do so, first either edit _4G6M-matched.pdb_ in your favourite text editor and save this new file as _4G6M_matched_S150W.pdb_, or use command line:
<a class="prompt prompt-cmd">
sed 's/SER\ A\ 150/TRP\ A\ 150/g' 4G6M_matched.pdb > 4G6M_matched_S150W.pdb
</a>

Next, score the mutant using command-line tool `haddock3-score`.
This tool performs a short workflow composed of the `topoaa` and `emscoring` modules. Use flag `--outputpdb` to save energy-minimized model:
<a class="prompt prompt-cmd">
haddock3-score 4G6M_matched_S150W.pdb --outputpdb
</a>

<a class="prompt prompt-question">
Use `haddock3-score` to calculate score of the _4G6M-matched.pdb_. Do you see a difference between wild type and mutant scores? 4G6M_matched.pdb = -147.0143 4G6M_matched_S150W.pdb = -167.4492
Might such single-residue mutation affect the binding affinity?
</a>

<a class="prompt prompt-info">
Inspect the energy-minimized mutant model (4G6M_matched_S150W_hs.pdb) visually.
Can you rationalise why such mutation might increase the affinity?
</a>

### Alanine Scanning module

Another way of exploring interface energetics is by using the `alascan` module of HADDOCK3. `alascan` stands for "Alanine Scanning module".

This module is capable of mutating interface residues to Alanine and calculating the **Δ HADDOCK score** between the wild type and mutant, thus providing a measure of the impact of each individual mutation. It is posible to scan all interface residues one by one, or limit this scanning to a selected by user set of residues. By default, the mutation to Alanine is performed, as its side chain is just a methyl group, so side chain perturbations are minimal, as well as possible secondary strcuture changes. It is possible to perform the mutation to any other amino acid type - at your own risk, as such mutations may introduce structural uncertainty.

**Important**: 1/ `alascan` calculates the difference between wild type score vs mutant score, i.e. positive `Δscore` indicative of the enriched (stronder) binding and negative `Δscore` is indicative of the depleated (weaker) binding; 2/ Inside `alascan`, a short energy minimization of an input strcuture is performed, i.e. there's no need to inclued an additionla refinement module prior to `alascan`.

Here is an example of the workflow to scan interface energetics:
{% highlight toml %}
# ====================================================================
# Scanning interface residues with haddock3
# ====================================================================

# directory in which the scoring will be done
run_dir = "run-energetics-alascan"

# compute mode
mode = "local"
ncores = 50

# Post-processing to generate statistics and plots
postprocess = true
clean = true

molecules = [
"pdbs/4G6M_matched.pdb",
]

# ====================================================================
# Parameters for each stage are defined below
# ====================================================================
[topoaa]

[alascan]
# mutate each interface residue to Alanine
scan_residue = 'ALA'
# generate plot of delta score and its components per each mutation
plot = true

# ====================================================================
{% endhighlight %}

! candidate for replacing this paragraph:
A scoring scenario configuration file is provided in the `workflows/` directory as `interaction-energetics.cfg`, precomputed results are in `runs/run-energetics`.
However, this pre-made workflow contains an extra refinement step, so the score of such refined model does not match the score of our input strcuture.

The output folder `run-energetics-alascan` contain, among others, a directory titled `1_alascan` with a file `scan_4G6M_matched_haddock.tsv` that lists each mutation, corresponding score and individual terms:
<pre>
##########################################################
# `alascan` results for 4G6M_matched_haddock.pdb
#
# native score = -145.5891
#
# z_score is calculated with respect to the other residues
##########################################################
chain res ori_resname end_resname score vdw elec desolv bsa delta_score delta_vdw delta_elec delta_desolv delta_bsa z_score
A 212 LYS ALA -136.33 -66.16 -367.66 3.37 1660.53 -9.26 2.52 -75.12 3.24 37.57 -0.48
A 103 ASP ALA -129.64 -59.93 -365.23 3.34 1677.97 -15.95 -3.71 -77.56 3.27 20.13 -1.41
A 54 TRP ALA -138.18 -58.34 -435.53 7.27 1690.80 -7.41 -5.30 -7.26 -0.66 7.30 -0.22
A 32 SER ALA -143.66 -60.55 -447.37 6.36 1691.72 -1.93 -3.09 4.59 0.24 6.38 0.55
A 58 ASP ALA -121.65 -63.49 -306.77 3.20 1639.20 -23.94 -0.15 -136.01 3.41 58.90 -2.52
A 33 GLY ALA -148.50 -61.56 -473.22 7.71 1693.18 2.91 -2.08 30.43 -1.10 4.92 1.22
...
</pre>

You can use an additional script `/scripts/get-alascan-extrema.sh` to quickly find the most enriching and depleting mutations of each chain:
<a class="prompt prompt-cmd">
bash scripts/get-alascan-extrema.sh run-energetics-alascan/1_alascan/scan_4G6M_matched_haddock.tsv
</a>

Alternatively, you can examine tsv file visually:
<a class="prompt prompt-info">
Take a look at `scan_clt_-.html` in the browser to get a visual impression of the values in tsv file.
</a>

<a class="prompt prompt-question">
Compare values obtained with `alascan` to the corresponding values obtained with PROT-ON. Are they different? if yes, can you think of a reason why?
</a>

<details style="background-color:#DAE4E7">
<summary style="bold">
<b><i>See answer</i></b> <i class="material-icons">expand_more</i>
</summary>
The values themselves are expected to differ, because `alascan` calculates ΔHADDOCK score, while PROT-ON predicts ΔΔG. Moreover, both tools are making predictions using different methods, so it is normal to have different results. However, if both tools consistently identify the same mutations as binding enreaching or depleting - this may signal that mutated residues indeed play a key role in binding affinity.
</details>
<br>

Now let us consider how sensitive is this kind of analysis on the quality of the docking model. Instead of using the crystal structure, repeat this analysis using the best model of the top-ranked cluster or the best model with the lowest LRMSD value.

<a class="prompt prompt-question">
Consider the most binding-enrishing/-depleating mutations predicted based on the docking model. How different are those compared to the mutations, predicted based on the crystal structure?
</a>

<hr>
<hr>


## BONUS 2: Does the antibody bind to a known interface of interleukin? ARCTIC-3D analysis

Gevokizumab is a highly specific antibody that targets an allosteric site of Interleukin-1β (IL-1β) in PDB file *4G6M*, thus reducing its binding affinity for its substrate, interleukin-1 receptor type I (IL-1RI). Canakinumab, another antibody binding to IL-1β, has a different mode of action, as it competes directly with the binding site of IL-1RI (in PDB file *4G6J*). For more details please check [this article](https://www.sciencedirect.com/science/article/abs/pii/S0022283612007863?via%3Dihub){:target="_blank"}.

Expand Down Expand Up @@ -1662,7 +1836,7 @@ How do the results change? Are gevokizumab or canakinumab PDB files being cluste
<hr>
<hr>

## BONUS 2: How good are AI-based models of antibody for docking?
## BONUS 3: How good are AI-based models of antibody for docking?

The release of [AlphaFold2 in late 2020](https://www.nature.com/articles/s41586-021-03819-2) has brought structure prediction methods to a new frontier, providing accurate models for the majority of known proteins. This revolution did not spare antibodies, with [Alphafold2-multimer](https://github.com/sokrypton/ColabFold){:target="_blank"} and other prediction methods (most notably [ABodyBuilder2](https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabpred/abodybuilder2/){:target="_blank"}, from the ImmuneBuilder suite) performing nicely on the variable regions.

Expand Down Expand Up @@ -1869,7 +2043,7 @@ Using the Alphafold2 structure in this case is not the best option, as the input
<hr>
<hr>

## BONUS 3: Ensemble docking using a combination of exprimental and AI-predicted antibody structures
## BONUS 4: Ensemble docking using a combination of exprimental and AI-predicted antibody structures


Instead of running haddock3 using a specific input structure of the antibody, we can also use an ensemble of all available models.
Expand Down Expand Up @@ -2086,7 +2260,7 @@ Using the information in the _traceback_ directory, try to figure out which of t
<hr>
<hr>

## BONUS 4: Antibody-antigen complex structure prediction from sequence using AlphaFold2
## BONUS 5: Antibody-antigen complex structure prediction from sequence using AlphaFold2


With the advent of Artificial Intelligence (AI) and AlphaFold2, we can also try to predict directly the full antibody-antigen complex using AlphaFold2.
Expand Down Expand Up @@ -2384,7 +2558,7 @@ Try to reproduce the previous steps and examine the quality of the various gener
<hr>


## BONUS 5: Running a scoring scenario
## BONUS 6: Running a scoring scenario

This section demonstrates the use of HADDOCK3 to score the various models obtained at the previous stages (ensemble docking and AlphaFold predictions)
and observe if the HADDOCK scoring function is able to detect the quality of the models.
Expand Down