Skip to content

Biopython > 1.79 sometimes breaks alignChains #2205

@jamesmkrieger

Description

@jamesmkrieger

Description of the bug.

When I run alignChains on the attached pdbs with Python 3.11 with Biopython > 1.79, I get the following error. With Biopython 1.79, the same code runs without any problems. This also means I can't use Python 3.12 as it doesn't support Biopython 1.79

ref_model.pdb
centered_model.pdb

Error log from code run shown below

@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain A from ref_model (len=1132) with Chain A from centered_model:
@>      Mapped: 1132 residues match with 100% sequence identity and 99% overlap.
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain B from ref_model (len=381) with Chain A from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain B from ref_model (len=381) with Chain A from centered_model:
@>      Failed to match chains (seqid=33%, overlap=32%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain B from ref_model (len=381) with Chain A from centered_model:
@>      Failed to match chains (seqid=12%, overlap=6%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain E from ref_model (len=709) with Chain A from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain E from ref_model (len=709) with Chain A from centered_model:
@>      Failed to match chains (seqid=34%, overlap=58%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain E from ref_model (len=709) with Chain A from centered_model:
@>      Failed to match chains (seqid=3%, overlap=6%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain F from ref_model (len=21) with Chain A from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain F from ref_model (len=21) with Chain A from centered_model:
@>      Failed to match chains (seqid=40%, overlap=2%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain F from ref_model (len=21) with Chain A from centered_model:
@>      Failed to match chains (seqid=12%, overlap=1%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain A from ref_model (len=1132) with Chain B from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain A from ref_model (len=1132) with Chain B from centered_model:
@>      Failed to match chains (seqid=34%, overlap=29%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain A from ref_model (len=1132) with Chain B from centered_model:
@>      Failed to match chains (seqid=11%, overlap=6%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain B from ref_model (len=381) with Chain B from centered_model:
@>      Mapped: 346 residues match with 100% sequence identity and 99% overlap.
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain E from ref_model (len=709) with Chain B from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain E from ref_model (len=709) with Chain B from centered_model:
@>      Failed to match chains (seqid=29%, overlap=46%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain E from ref_model (len=709) with Chain B from centered_model:
@>      Failed to match chains (seqid=5%, overlap=15%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain F from ref_model (len=21) with Chain B from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain F from ref_model (len=21) with Chain B from centered_model:
@>      Failed to match chains (seqid=28%, overlap=5%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain F from ref_model (len=21) with Chain B from centered_model:
@>      Failed to match chains (seqid=6%, overlap=5%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain A from ref_model (len=1132) with Chain E from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain A from ref_model (len=1132) with Chain E from centered_model:
@>      Failed to match chains (seqid=34%, overlap=58%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain A from ref_model (len=1132) with Chain E from centered_model:
@>      Failed to match chains (seqid=5%, overlap=7%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain B from ref_model (len=381) with Chain E from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain B from ref_model (len=381) with Chain E from centered_model:
@>      Failed to match chains (seqid=32%, overlap=49%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain B from ref_model (len=381) with Chain E from centered_model:
@>      Failed to match chains (seqid=7%, overlap=14%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain E from ref_model (len=709) with Chain E from centered_model:
@>      Mapped: 709 residues match with 100% sequence identity and 100% overlap.
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain F from ref_model (len=21) with Chain E from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain F from ref_model (len=21) with Chain E from centered_model:
@>      Failed to match chains (seqid=50%, overlap=2%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain F from ref_model (len=21) with Chain E from centered_model:
@>      Failed to match chains (seqid=0%, overlap=2%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain A from ref_model (len=1132) with Chain F from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain A from ref_model (len=1132) with Chain F from centered_model:
@>      Failed to match chains (seqid=38%, overlap=6%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain A from ref_model (len=1132) with Chain F from centered_model:
@>      Failed to match chains (seqid=7%, overlap=5%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain B from ref_model (len=381) with Chain F from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain B from ref_model (len=381) with Chain F from centered_model:
@>      Failed to match chains (seqid=46%, overlap=21%).
@> Trying to map atoms based on CEalign:
@>   Comparing Chain B from ref_model (len=381) with Chain F from centered_model:
@>      Failed to match chains (seqid=4%, overlap=13%).
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain E from ref_model (len=709) with Chain F from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain E from ref_model (len=709) with Chain F from centered_model:
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[4], line 1
----> 1 prody_ref_amap = prody.alignChains(prody_ref_model, prody_model,
      2                                    overlap=15, matchFunc=prody.sameChid)

File ~/scipion3/software/em/prody-github/ProDy/prody/proteins/compare.py:1721, in alignChains(atoms, target, match_func, **kwargs)
   1715 def alignChains(atoms, target, match_func=bestMatch, **kwargs):
   1716     """Aligns chains of *atoms* to those of *target* using :func:`.mapOntoChains` 
   1717     and :func:`.combineAtomMaps`. Please check out those two functions for details 
   1718     about the parameters.
   1719     """
-> 1721     mappings = mapOntoChains(atoms, target, match_func, **kwargs)
   1722     m, n = mappings.shape
   1723     if m > n:

File ~/scipion3/software/em/prody-github/ProDy/prody/proteins/compare.py:1225, in mapOntoChains(atoms, ref, match_func, **kwargs)
   1222             continue
   1224         simple_target = target_chain if isinstance(target_chain, SimpleChain) else SimpleChain(target_chain, False)
-> 1225         mappings[i, j] = mapChainOntoChain(simple_target, simple_chain, **kwargs)
   1227 return mappings

File ~/scipion3/software/em/prody-github/ProDy/prody/proteins/compare.py:1044, in mapChainOntoChain(mobile, target, **kwargs)
   1042     result = getCEAlignMapping(simple_target, simple_mobile)
   1043 elif method == 'seq':
-> 1044     result = getAlignedMapping(simple_target, simple_mobile)
   1045 else:
   1046     if isinstance(alignment, dict):

File ~/scipion3/software/em/prody-github/ProDy/prody/proteins/compare.py:1390, in getAlignedMapping(target, chain, alignment)
   1388 b = that[i]
   1389 if a not in gap_chars:
-> 1390     ares = next(aiter)
   1391     amatch.append(ares.getResidue())
   1392     if b not in gap_chars:

StopIteration: 

What is your setup?

The same problem exists and is fixed by python and biopython version on MacOS and Linux

How did you install ProDy?

Pip install of ProDy 2.6.1 from PyPI using a uv virtual environment on Linux and pip install -Ue with a github version on my scipion branch and on tag v2.6.1 in a conda environment on MacOS all show this behaviour.

What did your code look like?

Python 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:26:40) [Clang 14.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.9.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `--theme`, or the `%colors` magic to change IPython's themes and colors.
Cmd click to launch VS Code Native REPL

In [1]: import prody

In [2]: prody_model = prody.parsePDB("centered_model.pdb")
@> 2280 atoms and 1 coordinate set(s) were parsed in 0.07s.

In [3]: prody_ref_model = prody.parsePDB("ref_model.pdb")
@> 2243 atoms and 1 coordinate set(s) were parsed in 0.07s.

In [4]: prody_ref_amap = prody.alignChains(prody_ref_model, prody_model,
   ...:                                    overlap=15, matchFunc=prody.sameChid)

Anything else?

I don't think there is anything else.

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions