Allow template to have indel compared to the original sequence #538
Open
shunsakuraba wants to merge 1 commit into jwohlwend:main
Previously, the template mechanism only supported gapless alignments. Gapped alignments were effectively prohibited by a very large gap penalty. This commit fixes the problem.
Thank you for providing this great software. While reading the source code and tweaking the templating mechanism for my own purposes, I found that the current boltz has problems in (1) how it uses Bio.Align and (2) how it featurizes the template.
The first problem is that the alignment to the base structure (a cif file or, more recently, a pdb file) is forced to be gapless.
boltz/src/boltz/data/parse/schema.py
Lines 524 to 527 in c9b6af1
The parameters open_gap_score and extend_gap_score are set to an extreme penalty and therefore do not allow any gaps. Furthermore, the program iterates over the alignment object, which in Biopython loops over every possible alignment and is therefore not a suitable way to get the best alignment.
boltz/src/boltz/data/parse/schema.py
Line 530 in c9b6af1
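To illustrate the effect of the gap penalty, here is a minimal sketch of a local alignment written from scratch in plain Python. It is not the boltz code and does not use Biopython. The function names, the linear gap model, the scoring values and the toy sequences are all assumptions chosen for illustration. With an extreme gap penalty only the longest ungapped chunk survives, whereas a moderate penalty lets the alignment span the insertion.

```python
def local_align(query, template, match=2.0, mismatch=-1.0, gap=-1.0):
    """Smith-Waterman style local alignment with a linear gap penalty.

    Returns (score, pairs), where pairs lists (query_idx, template_idx)
    for every aligned column that is not a gap.
    """
    n, m = len(query), len(template)
    # score[i][j] = best local score of an alignment ending at query[i-1], template[j-1]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(
                0.0,
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in the template
                score[i][j - 1] + gap,      # gap in the query
            )
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Trace back from the best cell until the local alignment starts.
    pairs = []
    i, j = best_cell
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if query[i - 1] == template[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


def count_chunks(pairs):
    """Count contiguous runs of aligned residues (gapless chunks)."""
    chunks, prev = 0, None
    for q, t in pairs:
        if prev is None or (q, t) != (prev[0] + 1, prev[1] + 1):
            chunks += 1
        prev = (q, t)
    return chunks


# Toy example (assumption): the template carries a three-residue insertion.
query = "ACDEFGHIKL"
template = "ACDEFGWWWHIKL"

_, gapless_pairs = local_align(query, template, gap=-1000.0)
_, gapped_pairs = local_align(query, template, gap=-1.0)
print(len(gapless_pairs), count_chunks(gapless_pairs))
print(len(gapped_pairs), count_chunks(gapped_pairs))
```

In this toy case the extreme penalty covers only the first six query residues in a single chunk, while the moderate penalty covers all ten residues in two chunks.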
Additionally, the featurizer does not correctly convert the tokens into the template map. The current code works only when the alignment consists of a single chunk (which was always the case, because the alignment was forced to be gapless).
boltz/src/boltz/data/feature/featurizerv2.py
Line 1802 in c9b6af1
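As a rough sketch of what a multi-chunk mapping involves, the snippet below builds a query-index to template-index map from a list of aligned chunks. It is not the featurizer code from boltz. The function name build_template_map and the half-open (start, end) chunk layout are assumptions made for this example.

```python
def build_template_map(query_chunks, template_chunks):
    """Map each aligned query index to its template index.

    Each chunk is a half-open (start, end) range; chunk k of the query is
    aligned, residue by residue, to chunk k of the template.
    """
    if len(query_chunks) != len(template_chunks):
        raise ValueError("query and template must have the same number of chunks")
    mapping = {}
    for (q_start, q_end), (t_start, t_end) in zip(query_chunks, template_chunks):
        if q_end - q_start != t_end - t_start:
            raise ValueError("aligned chunks must have equal length")
        # The offset is recomputed per chunk; it changes after every indel.
        for k in range(q_end - q_start):
            mapping[q_start + k] = t_start + k
    return mapping


# Toy alignment (assumption): the template has a three-residue insertion
# after query residue 5, so the second chunk is shifted by three.
query_chunks = [(0, 6), (6, 10)]
template_chunks = [(0, 6), (9, 13)]
mapping = build_template_map(query_chunks, template_chunks)
print(mapping[5], mapping[6])
```

A single global offset taken from the first chunk would map query residue 6 to template residue 6, whereas the per-chunk map sends it to template residue 9.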
As a result, template structures currently cannot contain indels, which significantly hampers the capability of templating.
This PR aims to fix these problems.
Some concerns remain about the template:
Bio.Align.PairwiseAlignment may not be able to properly align two sequences. As a reference, AF3 circumvents the problem by using hmmbuild and hmmsearch and directly feeding in the output alignment. We may need a similar input to directly specify the feature-template matching.
If the template is ACDEFG---HIKLMN, where --- is a missing loop, and the input sequence is ACDEFGH, the program may try to align the final residue, which results in a skewed structure (especially when using force: true). But I am not sure what the "correct" behavior should be here.
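The missing-loop concern can be made concrete with a small sketch. It only tracks residue positions for the example strings above and does not call boltz or Biopython. The helper name observed_residues and the use of "-" for unresolved residues are assumptions.

```python
def observed_residues(template_with_gaps):
    """Return the resolved sequence and each residue's position in the full template."""
    seq, positions = [], []
    for pos, aa in enumerate(template_with_gaps):
        if aa != "-":  # "-" marks a residue missing from the structure
            seq.append(aa)
            positions.append(pos)
    return "".join(seq), positions


template = "ACDEFG---HIKLMN"
query = "ACDEFGH"

seq, positions = observed_residues(template)
# The query is a prefix of the resolved sequence, so a gapless alignment
# maps query residue i to resolved residue i.
assert seq.startswith(query)

last = len(query) - 1
jump = positions[last] - positions[last - 1]
print(positions[last - 1], positions[last], jump)
```

Here the last two query residues, G and H, are adjacent in the input but land on template positions 5 and 9, on opposite sides of the missing loop.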