Home

Overview

PROPTIMUS RAPHAN stands for Per-residue optimisation of protein structures: Rapid alternative to optimisation with constrained alpha carbons. This approach divides a protein structure into overlapping substructures, allowing each to be optimised independently. As a result, the computation time is linear with respect to the size of the structure. Our approach can achieve results comparable to the overall optimisation of the structure with constrained alpha carbons in significantly less time. PROPTIMUS RAPHAN employs GFN-FF, a force field of nearly quantum-mechanical accuracy. This force field is generic, physics-based, and suitable for large molecular systems.

Method

The PROPTIMUS RAPHAN method employs the Cover approach, a divide-and-conquer algorithm that divides the protein into overlapping residual substructures, one for each residue. Each substructure is then optimised separately. The time required to optimise one substructure is approximately constant, as the substructures are similar in size. This leads to linear computational complexity with respect to the number of atoms.
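The linear scaling argument above can be written out as a rough cost model. This is only a sketch: the symbols $I$ (number of iterations), $A_k$ (set of non-converged residues in iteration $k$), $t_{r,k}$ (time to optimise the substructure of residue $r$ in iteration $k$) and $t_{\max}$ (an upper bound on that time) are introduced here for illustration and do not appear in the original description. The model also assumes that neither $I$ nor $t_{\max}$ grows with the size of the protein.

```latex
T_{\mathrm{total}}
  = \sum_{k=1}^{I} \sum_{r \in A_k} t_{r,k}
  \le I \cdot N_{\mathrm{res}} \cdot t_{\max},
\qquad
N_{\mathrm{res}} \propto N_{\mathrm{atoms}}
\;\Rightarrow\;
T_{\mathrm{total}} = O(N_{\mathrm{atoms}})
```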

Before the PROPTIMUS RAPHAN computation begins, a set of optimised atoms is defined for each protein residue. PROPTIMUS RAPHAN is an iterative method. Each iteration consists of the following steps:

1. Construction of residual substructures: A residual substructure is created for each non-converged residue. The residual substructure includes all atoms that are within 6 Å of at least one optimised atom of this residue. Moreover, a minimum set of other atoms is included to ensure that only bonds between two carbon atoms are cut. Atoms of each residual substructure that do not belong to the optimised atoms of the specific residue are divided into two groups:

Flexible atoms are all atoms that are closer than 4 Å to at least one optimised atom and are not alpha carbons.
Constrained atoms are the remaining atoms of the residual substructure that belong to neither the optimised nor the flexible atoms.

2. Constrained optimisation of substructures: All residual substructures are optimised. During this process, the coordinates of all constrained atoms are held fixed. Flexible atoms are free to move during optimisation, but they are not used in the construction of the partially optimised structure in the following step. The number of optimisation steps is limited to prevent extensive changes in atom coordinates. Specifically, the number of optimisation steps is equal to the number of optimised atoms and the number of iterations.

3. Construction of protein from optimised atoms: Optimised atoms are taken from each optimised substructure and used to construct a new partially optimised protein structure.
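The distance-based grouping of atoms described in step 1 can be sketched with NumPy as follows. This is a minimal illustration, not the PROPTIMUS RAPHAN implementation: the function name `classify_atoms` and its parameters are hypothetical, and the sketch deliberately omits the extra atoms that are added so that only carbon-carbon bonds are cut.

```python
import numpy as np


def classify_atoms(coords, optimised_idx, is_alpha_carbon,
                   substructure_radius=6.0, flexible_radius=4.0):
    """Group atoms around one residue (hypothetical helper, distances in Å).

    Returns three boolean masks over all atoms: optimised, flexible, constrained.
    Atoms outside the substructure are False in all three masks.
    """
    coords = np.asarray(coords, dtype=float)
    optimised_idx = np.asarray(optimised_idx, dtype=int)
    is_alpha_carbon = np.asarray(is_alpha_carbon, dtype=bool)

    # Distance from every atom to its nearest optimised atom of this residue.
    diffs = coords[:, None, :] - coords[optimised_idx][None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)

    optimised = np.zeros(len(coords), dtype=bool)
    optimised[optimised_idx] = True

    # Substructure: atoms within 6 Å of at least one optimised atom.
    in_substructure = nearest <= substructure_radius
    # Flexible: closer than 4 Å to an optimised atom and not an alpha carbon.
    flexible = (in_substructure & ~optimised
                & (nearest < flexible_radius) & ~is_alpha_carbon)
    # Constrained: everything else in the substructure.
    constrained = in_substructure & ~optimised & ~flexible
    return optimised, flexible, constrained


# Toy example with six atoms on a line; atoms 0 and 1 are the optimised atoms.
coords = [[0, 0, 0], [1, 0, 0], [3, 0, 0], [3.5, 0, 0], [5, 0, 0], [8, 0, 0]]
is_ca = [False, False, False, True, False, False]
optimised, flexible, constrained = classify_atoms(coords, [0, 1], is_ca)
```

In the toy example, atom 2 is flexible, atoms 3 (an alpha carbon) and 4 (exactly 4 Å away) are constrained, and atom 5 lies outside the substructure.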

**Figure 1:** Scheme of the PROPTIMUS RAPHAN method.

An advantage of PROPTIMUS RAPHAN is the ability to continuously exclude already converged residues (i.e., residues whose position no longer changes during optimisation) from the iterative process. This feature significantly speeds up the entire calculation.
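The exclusion of converged residues can be illustrated with a small driver loop. This is a sketch under stated assumptions: `run_iterations`, the `optimise_residue` callback, the tolerance `tol` and the "largest coordinate change below `tol`" convergence test are all hypothetical choices made for this example, and the callback stands in for the constrained substructure optimisation.

```python
import numpy as np


def run_iterations(coords, residue_atoms, optimise_residue, tol=1e-3, max_iter=50):
    """Iterate per-residue optimisation, skipping residues that have converged.

    residue_atoms maps a residue id to the indices of its optimised atoms.
    optimise_residue(coords, idx, iteration) returns new coordinates for idx.
    """
    coords = np.array(coords, dtype=float)
    active = set(residue_atoms)      # residues that have not converged yet
    iterations_done = 0
    for iteration in range(1, max_iter + 1):
        if not active:
            break
        new_coords = coords.copy()
        for res in sorted(active):
            idx = residue_atoms[res]
            # Every substructure is optimised from the same input structure.
            new_coords[idx] = optimise_residue(coords, idx, iteration)
        for res in list(active):
            idx = residue_atoms[res]
            # Assumed criterion: a residue has converged once its atoms stop moving.
            if np.abs(new_coords[idx] - coords[idx]).max() < tol:
                active.discard(res)
        coords = new_coords
        iterations_done = iteration
    return coords, iterations_done


# Toy stand-in for the optimiser: move atoms halfway towards a known target.
target = np.zeros((2, 3))
calls = []


def toy_optimiser(coords, idx, iteration):
    calls.append(tuple(idx))
    return coords[idx] + 0.5 * (target[idx] - coords[idx])


start = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
final, n_iter = run_iterations(start, {"A": [0], "B": [1]}, toy_optimiser)
```

In this toy run, residue B is already at its target and is dropped after the first iteration, so all later iterations optimise residue A only.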

Our methodology can be used with any force field.

Comparison of protein structures

For testing purposes, we investigated how closely the protein structures optimised by PROPTIMUS RAPHAN GFN-FF align with the nearest local minimum on the GFN-FFCα potential energy hypersurface. To put the PROPTIMUS RAPHAN GFN-FF results in context, we also compared the original structures with structures optimised using PROPTIMUS RAPHAN GFN-FF + GFN-FFCα. The structures are compared using the mean absolute deviation (MAD) of atom positions, bond lengths, bond angles and dihedral angles. The results of these structural comparisons are presented in Table 1.
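The MAD metrics can be sketched as follows. The exact definitions used for Table 1 are not spelled out here, so this example makes two assumptions: the MAD of atom positions is taken as the mean Euclidean distance between corresponding atoms of two structures that already share a coordinate frame (no superposition), and angular differences are wrapped into the interval [-180°, 180°) before averaging. The function names are hypothetical.

```python
import numpy as np


def mad_positions(coords_a, coords_b):
    """Mean distance between corresponding atoms (same units as the input)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    return float(np.linalg.norm(a - b, axis=1).mean())


def mad_angles(angles_a_deg, angles_b_deg):
    """Mean absolute angular difference in degrees, respecting periodicity."""
    a = np.asarray(angles_a_deg, dtype=float)
    b = np.asarray(angles_b_deg, dtype=float)
    # Wrap differences so that 179° and -179° count as 2° apart, not 358°.
    diff = (a - b + 180.0) % 360.0 - 180.0
    return float(np.abs(diff).mean())


# Two atoms, one of which moved by 1 Å.
pos_mad = mad_positions([[0, 0, 0], [1, 0, 0]], [[0, 0, 1], [1, 0, 0]])
# Two dihedral angles that cross the ±180° boundary.
ang_mad = mad_angles([179.0, -170.0], [-179.0, 170.0])
```

The same `mad_angles` helper would apply to φ, ψ, ω and χ₁ once those dihedrals have been extracted from both structures; bond lengths can be compared with a plain mean of absolute differences.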

The testing dataset contained selected protein structures from AlphaFold DB. In particular, the sample came from the fourth version (v4) of predictions for Swiss-Prot sequences. The dataset contained 480 structures (SET_ORIG). Three different optimisation procedures, shown in Figure 2, were then applied to SET_ORIG:

Optimisation using the GFN-FFCα method. The resulting set of structures is denoted SET_Cα.
Optimisation using the PROPTIMUS RAPHAN GFN-FF method. The resulting set of structures is denoted SET_RAPHAN.
Optimisation using the PROPTIMUS RAPHAN GFN-FF method followed by optimisation using the GFN-FFCα method. The resulting set of structures is denoted SET_RAPHAN+Cα.

**Figure 2:** Schematic diagram describing the relationships between datasets. Black lines represent optimisation. Blue dotted lines represent which datasets were compared.

**Table 1:** Comparison of SET_RAPHAN+Cα with the other sets of structures (SET_Cα, SET_RAPHAN and SET_ORIG). The mean absolute deviations (MAD) of atom positions, bond lengths, bond angles, all dihedral angles and four characteristic dihedral angles (φ, ψ, ω, χ₁) are included.

| | SET_Cα vs. SET_RAPHAN+Cα | SET_RAPHAN vs. SET_RAPHAN+Cα | SET_ORIG vs. SET_RAPHAN+Cα |
|---|---|---|---|
| MAD atomic position [Å] | 0.056 | 0.033 | 0.412 |
| MAD bond length [pm] | 0.113 | 0.064 | 4.893 |
| MAD bond angle [°] | 0.154 | 0.081 | 2.102 |
| MAD dihedral angle [°] | 2.174 | 0.735 | 12.526 |
| MAD φ [°] | 1.339 | 0.881 | 10.302 |
| MAD ψ [°] | 1.246 | 0.816 | 9.247 |
| MAD ω [°] | 0.839 | 0.596 | 5.778 |
| MAD χ₁ [°] | 0.993 | 0.590 | 8.018 |

We compare SET_RAPHAN+Cα with SET_Cα. The extra GFN-FFCα step applied after PROPTIMUS RAPHAN GFN-FF eliminates the error introduced by the approximate nature of PROPTIMUS RAPHAN GFN-FF, so that the comparison is made between actual local minima on the potential energy hypersurface of the GFN-FFCα method. The fact that structures from SET_Cα and SET_RAPHAN+Cα are slightly different demonstrates that PROPTIMUS RAPHAN GFN-FF can indeed converge to a slightly different local minimum on the potential energy hypersurface than GFN-FFCα does.

For visual comparison, Figure 3 shows the structure that has the worst MAD of atomic positions between SET_RAPHAN and SET_RAPHAN+Cα among structures with fewer than a thousand atoms.

**Figure 3:** The structure with the worst MAD of atomic positions among the structures with fewer than a thousand atoms in the testing dataset (UniProtKB AC P83224), optimised by PROPTIMUS RAPHAN GFN-FF (blue) and PROPTIMUS RAPHAN GFN-FF + GFN-FFCα (red). The MAD of atomic positions for this structure is 0.075 Å. While most atoms are in good agreement, the atoms of residues CYS9 and GLY8 differ (red circle). The [Mol*](https://academic.oup.com/nar/article/49/W1/W431/6270780?login=false) visualisation software was used to create the figure.

Comparison of optimisation times

For each structure, the computational times of the PROPTIMUS RAPHAN GFN-FF and GFN-FFCα optimisations were compared. This comparison is illustrated in Figure 4. In the case of GFN-FFCα, the computation time increases quadratically with the number of atoms. In contrast, the computation time of PROPTIMUS RAPHAN GFN-FF increases approximately linearly with the number of atoms. The high variability in PROPTIMUS RAPHAN GFN-FF calculation times arises because the total time is determined not directly by the number of atoms, but by the number and size of the residual substructures.
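One way to check such scaling behaviour on measured timings is to fit a straight line in log-log space: the slope estimates the exponent p in t ∝ Nᵖ. The sketch below uses a hypothetical helper `scaling_exponent` and purely synthetic timings generated from ideal linear and quadratic laws; they are not measurements from this work.

```python
import numpy as np


def scaling_exponent(n_atoms, seconds):
    """Estimate p in t ~ N**p from a least-squares fit of log(t) against log(N)."""
    log_n = np.log(np.asarray(n_atoms, dtype=float))
    log_t = np.log(np.asarray(seconds, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_t, 1)
    return float(slope)


# Synthetic timings (NOT measured data): one linear and one quadratic law.
n_atoms = np.array([1000, 2000, 4000, 8000])
linear_times = 0.01 * n_atoms
quadratic_times = 1e-5 * n_atoms ** 2

p_linear = scaling_exponent(n_atoms, linear_times)
p_quadratic = scaling_exponent(n_atoms, quadratic_times)
```

With real timings the fitted exponent will be noisy, particularly for a method whose run time depends on the number and size of substructures rather than on the atom count alone.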

**Figure 4:** Comparison of computational times of PROPTIMUS RAPHAN GFN-FF and GFN-FFCα optimisations.
