Add additional fingerprints #66

@j-adamczyk

We need to implement additional fingerprints from multiple sources, as we want scikit-fingerprints to be the only required Python library for computing molecular fingerprints.

RDKit hashed fingerprints:

  • Avalon - ref
  • E-state - ref
  • FPCP - ref
  • MHFP - potentially; benchmark against the already implemented one first, ref
  • Layered - ref
  • Pattern - ref
  • Pharmacophore - ref
  • Physicochemical property fingerprints - ref
  • RDKit (substructure) - ref
  • SECFP - ref

RDKit descriptor fingerprints:

Check other libraries and software for fingerprints and descriptors, and add them to the lists below:

Other descriptor-based fingerprints:

  • Electroshape descriptors - ref 1, ref 2
  • CATS descriptors - ref 1, ref 2, ref 3, ref 4; the last one includes an interesting way to use 3D distances; rejected, since this is basically a simplified 2-point pharmacophore
  • CheckMol (FP3) - ref 1, ref 2, ref 3, ref 4; rejected, since it's basically covered by Laggner's SMARTS patterns
  • Laggner - ref 1, ref 2, SMARTS Patterns for Functional Group Classification by Christian Laggner; also known as CDK Substructure Fingerprint in ref 3
  • Ghose-Crippen - ref 1, take just the SMARTS patterns
  • Klekota-Roth - ref 1, ref 2
  • Lingo - ref
  • Mordred - ref 1, ref 2
  • PubChem - ref 1, ref 2; note that we need to implement the actual calculation of this fingerprint, not just connect to the PUG REST API, since that API is extremely unreliable
  • Scaffold Keys - ref 1, ref 2; rejected, not in any peer-reviewed publication
  • SHED - ref 1, ref 2; rejected, basically a very simplified 2-point pharmacophore fingerprint
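
To make the Lingo entry above more concrete: the idea is to treat a SMILES string as text and count its overlapping substrings of fixed length. Below is a minimal sketch, assuming a substring length of 4 and skipping any SMILES preprocessing (ring-closure digits, two-letter halogens). The function names `lingo_counts` and `multiset_tanimoto` are hypothetical, and the similarity is a plain multiset Tanimoto used for illustration, not necessarily the formula from the referenced paper.

```python
from collections import Counter


def lingo_counts(smiles: str, q: int = 4) -> Counter:
    """Count overlapping length-q substrings of a SMILES string.

    Assumption: no preprocessing is applied to the SMILES string.
    Strings shorter than q yield an empty Counter.
    """
    return Counter(smiles[i : i + q] for i in range(len(smiles) - q + 1))


def multiset_tanimoto(a: Counter, b: Counter) -> float:
    """Multiset Tanimoto: sum of min counts over sum of max counts."""
    intersection = sum((a & b).values())  # per-key minimum
    union = sum((a | b).values())  # per-key maximum
    return intersection / union if union else 0.0


butanol = lingo_counts("CCCCO")
pentanol = lingo_counts("CCCCCO")
print(dict(butanol))
print(dict(pentanol))
print(multiset_tanimoto(butanol, pentanol))
```

Since the counts are a sparse mapping from substring to count, a fixed-length vector would still need a hashing or vocabulary step on top of this.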

Other fingerprints:

  • 4PT - ref 1; this could be implemented by expanding the existing pharmacophore fingerprint to N-point pharmacophores; rejected, since this is currently unsupported by RDKit and would have an incredibly high computational cost
  • AABBA - ref 1, ref 2; they call this a kernel, but it's really an expanded autocorrelation fingerprint
  • Atom triplets - ref 1; this is very similar to atom pairs, just using triplets of atoms, but we need to implement it from scratch, based on the RDKit implementation (including atom invariants)
  • BCL2D - ref 1; rejected, since it's basically a worse ECFP4
  • CSFP - ref 1, ref 2
  • FragFP - ref 1, ref 2; rejected, since it uses a custom search pattern engine, and translating its patterns to SMARTS is not really feasible
  • Graph signature - ref 1, ref 2; originally proposed in ref 3 and ref 4
  • Magpie Fingerprint - ref, see also supplementary material
  • MAP4 chiral - ref, small modification of existing MAP4
  • MDFP - ref; rejected, relies heavily on Conda-only packages
  • Mold2 - ref
  • MolPrint2D - ref; rejected, since it's basically a worse ECFP4
  • MP-MFP - ref
  • MXFP - ref 1, ref 2
  • NC-MFP - ref, RDKit code is provided, but it requires quite a bit of refactoring
  • Spectrophore - ref
  • Toxicophore - ref 1, ref 2, SMARTS queries are in supporting materials; note that this is not a unique definition, and e.g. OChem ToxAlerts offers a (much larger) alternative
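
On the computational cost mentioned for 4PT above: a rough way to see the growth is to count how many N-point feature combinations a single molecule produces, and how many pairwise distances each one needs. This is only a back-of-the-envelope sketch, assuming every unordered subset of pharmacophoric feature points is enumerated; the function names and the example of 20 feature points are hypothetical, not taken from any reference.

```python
from math import comb


def n_point_subsets(num_features: int, n_points: int) -> int:
    """Number of unordered N-point combinations of feature points."""
    return comb(num_features, n_points)


def distances_per_subset(n_points: int) -> int:
    """Pairwise distances needed to describe one N-point combination."""
    return comb(n_points, 2)


# Hypothetical molecule with 20 pharmacophoric feature points.
for n_points in (2, 3, 4):
    subsets = n_point_subsets(20, n_points)
    total_distances = subsets * distances_per_subset(n_points)
    print(n_points, subsets, total_distances)
```

The number of distinct bit positions grows as well, since each combination is also described by its feature types and binned distances, which this sketch does not count.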
