Allow template to have indel compared to the original sequence #538

Open
shunsakuraba wants to merge 1 commit into jwohlwend:main from shunsakuraba:fix-template-alignment

@shunsakuraba
Thank you for providing this great software. While reading the source code and tweaking the templating mechanism for my own purposes, I found two problems in the current Boltz: (1) how it uses Bio.Align and (2) how it featurizes the template.

The first problem is that the alignment to the base structure (a cif or, more recently, pdb file) is forced to be gapless.

aligner = Align.PairwiseAligner(scoring="blastp")
aligner.mode = "local"
aligner.open_gap_score = -1000
aligner.extend_gap_score = -1000

The open_gap_score and extend_gap_score parameters are set to an extreme penalty, so no gap is ever allowed.

Furthermore, the program iterates over the alignment object, which in Biopython loops over every possible alignment, so it is not a suitable way to get the best alignment.

for result in aligner.align(query, template):
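
A minimal sketch of the alternative, assuming Biopython is installed: keep the gap scores that scoring="blastp" provides and take only the first alignment returned instead of looping over all of them. The two sequences are made-up examples (the template lacks the residues MNP), and this is an illustration rather than the exact code in this PR.

```python
from Bio import Align

# Made-up sequences for illustration: the template lacks "MNP".
query = "ACDEFGHIKLMNPQRSTVWY"
template = "ACDEFGHIKLQRSTVWY"

aligner = Align.PairwiseAligner(scoring="blastp")
aligner.mode = "local"
# No override of open_gap_score / extend_gap_score, so gaps stay affordable.

# Index the result instead of iterating over every returned alignment.
best = aligner.align(query, template)[0]

# best.aligned holds the gapless blocks as paired (start, end) ranges:
# the first entry indexes `query`, the second indexes `template`.
blocks = [
    ((int(q_start), int(q_end)), (int(t_start), int(t_end)))
    for (q_start, q_end), (t_start, t_end) in zip(*best.aligned)
]
print(best.score, blocks)
```

Each block is a gapless stretch, so an alignment containing an indel shows up as more than one block.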

Additionally, the featurizer does not correctly convert the tokens into the template map. The current code works only when the alignment is a single chunk (which was always the case because of the gaplessness):

offset = template.template_st - template.query_st
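
To illustrate why a single offset breaks once the alignment has more than one chunk, here is a small pure-Python sketch. The function names and the block layout, a list of ((query_start, query_end), (template_start, template_end)) pairs, are hypothetical and chosen for illustration only; they are not the featurizer's actual data structures.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: ((query_start, query_end), (template_start, template_end))
Block = Tuple[Tuple[int, int], Tuple[int, int]]


def map_with_single_offset(query_st: int, query_en: int, template_st: int) -> Dict[int, int]:
    """Single-offset mapping: only correct for a single gapless chunk."""
    offset = template_st - query_st
    return {q: q + offset for q in range(query_st, query_en)}


def map_with_blocks(blocks: List[Block]) -> Dict[int, int]:
    """Per-block mapping: each gapless block gets its own offset."""
    mapping: Dict[int, int] = {}
    for (q_start, q_end), (t_start, t_end) in blocks:
        if q_end - q_start != t_end - t_start:
            raise ValueError("aligned blocks must have equal length")
        offset = t_start - q_start
        for q in range(q_start, q_end):
            mapping[q] = q + offset
    return mapping


# Query residues 10..12 have no counterpart in the template (a 3-residue indel).
blocks = [((0, 10), (0, 10)), ((13, 20), (10, 17))]
single = map_with_single_offset(0, 20, 0)
per_block = map_with_blocks(blocks)

print(single[13], per_block[13])  # single offset points past the indel incorrectly
print(10 in per_block)            # unaligned query residues get no template entry
```

With the single offset, query residue 13 maps to template residue 13, whereas the per-block mapping sends it to template residue 10 and leaves query residues 10 to 12 unmapped.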

As a result, template structures currently cannot contain indels, which significantly hampers the capability of templating.

This PR aims to fix these problems.

Some concerns remain about the template:

  1. I haven't dug into the code to check whether this change requires retraining the template-related weights. From a quick glance it seems fine, but I'd like to hear the opinion of experts.
  2. When a template has low sequence identity, Bio.Align.PairwiseAligner may not be able to align the two sequences properly. For reference, AF3 circumvents this problem by running hmmbuild and hmmsearch and feeding the output alignment in directly. We may need a similar input to specify the feature-template matching directly.
  3. When a cif file does not contain the proper full sequence (e.g. a pdb without SEQRES converted by maxit), missing loops may result in an improper template. For example, if the template is ACDEFG---HIKLMN, where --- is a missing loop, and the input sequence is ACDEFGH, the program may try to align the final residue as well, which results in a skewed structure (especially when using force: true). I am not sure what the "correct" behavior should be here.

Previously the template mechanism only allowed gapless alignments.
Gapped alignments were effectively prohibited by a large negative
gap penalty. This commit fixes the problem.