Confusing error message about turning on CCD download #140

@mattwthompson

Is your feature request related to a problem?

From a relatively fresh install:

In [1]: from openff.pablo import topology_from_pdb

In [2]: topology_from_pdb("1CSA.pdb")
---------------------------------------------------------------------------
PdbResidueMatchError                      Traceback (most recent call last)
Cell In[2], line 1
----> 1 topology_from_pdb("1CSA.pdb")

File ~/mamba/envs/toolkit-2136/lib/python3.13/site-packages/openff/pablo/_pdb.py:178, in topology_from_pdb(file, residue_library, additional_definitions, format, use_canonical_names)
    175 else:
    176     data = PdbData.from_file(file, format=format)  # type: ignore
--> 178 matches = data.get_successful_matches(
    179     residue_library,
    180     list(additional_definitions),
    181 )
    183 topology = _build_topology(
    184     matches=matches,
    185     data=data,
    186     use_canonical_names=use_canonical_names,
    187 )
    189 _check_all_conects(topology, data)

File ~/mamba/envs/toolkit-2136/lib/python3.13/site-packages/openff/pablo/_pdb_data.py:1777, in PdbData.get_successful_matches(self, residue_library, additional_definitions)
   1775         return residues + additional_matches
   1776     else:
-> 1777         raise create_pdb_residue_match_error(
   1778             data=self,
   1779             errors=errors,
   1780             additional_definitions=additional_definitions,
   1781             additional_matches=additional_matches,
   1782             unmatched_pdb_idcs=unmatched_atoms,
   1783             residue_library=residue_library,
   1784         )
   1786 raise create_pdb_residue_match_error(
   1787     data=self,
   1788     errors=errors,
   1789     additional_definitions=additional_definitions,
   1790     residue_library=residue_library,
   1791 )

PdbResidueMatchError: some residues could not be identified
A topology cannot be created without chemical information for every
atom and bond. The following residues present in PDB file
1CSA.pdb
could not be identified from the provided chemical library:
  A:DAL#0 (l3-12): No residue definitions

  A:MVA#3 (l57-75): No residue definitions

  A:BMT#4 (l76-105): No residue definitions

  A:ABA#10 (l186-198): No residue definitions

Some missing residues are likely to be in the CCD; you can download
them automatically by setting `residue_library.auto_download = True`
or manually with the get_from_ccd method.

This error message is close to perfect: it tells me what's missing and suggests a way to resolve it. However, I don't know where to start interacting with a `residue_library` argument that I didn't pass in. I know it's a `CcdCache`, but a new user would need to do some digging to figure this out.

Describe the solution you'd like

I found this block of code which is more or less what a user would want to copy-paste in this context:

Even files that include ligands can load automatically, as long as the ligands have all their atoms and are named in the "standard" PDB way (according to the CCD) - we just need to tell Pablo that it can implicitly access the internet:
```{python}
from openff.pablo import topology_from_pdb, STD_CCD_CACHE
STD_CCD_CACHE.auto_download = True
topology = topology_from_pdb("1c9h_prepared.pdb")
topology.visualize()
```
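To make the request concrete, here is a rough sketch of what an extended hint could look like. The helper name `ccd_download_hint` is hypothetical and not part of Pablo, and the wording assumes that the default `residue_library` is `STD_CCD_CACHE`, as the snippet above implies. It only builds a string; it does not touch Pablo itself.

```python
def ccd_download_hint(missing_residues):
    """Build a hint that names STD_CCD_CACHE explicitly.

    Hypothetical helper for illustration only; not Pablo's actual code.
    Assumes the default residue library is STD_CCD_CACHE.
    """
    # De-duplicate and sort so the message is stable between runs.
    names = ", ".join(sorted(set(missing_residues)))
    return (
        f"Some missing residues ({names}) are likely to be in the CCD. "
        "If you did not pass a residue_library, the default is STD_CCD_CACHE; "
        "to download missing definitions automatically, run:\n"
        "\n"
        "    from openff.pablo import STD_CCD_CACHE\n"
        "    STD_CCD_CACHE.auto_download = True\n"
    )


# Residue names taken from the traceback above.
print(ccd_download_hint(["DAL", "MVA", "BMT", "ABA"]))
```

The point is that the message names the object to modify and where to import it from, so a user who never passed `residue_library` can copy-paste the fix directly.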

Describe alternatives you've considered

Additional context
