[FEATURE REQUEST]: Performance Improvement of Tanimoto-Kernel #694

@LukasHebing

Description

Motivation

I tried using the Tanimoto kernel for BO on a larger dataset, similar to this example:
https://experimental-design.github.io/bofire/docs/tutorials/benchmarks/006-Bayesian_optimization_over_molecules.html

Adapting the example to a larger dataset increases the computation time drastically. I found that most of the time is spent computing the fingerprint vectors in utils.chemoinformatics.py:

def smiles2fingerprints(
    smiles: List[str],
    bond_radius: int = 5,
    n_bits: int = 2048,
) -> np.ndarray:
    """Transforms a list of smiles to an array of morgan fingerprints.

    Args:
        smiles (List[str]): List of smiles
        bond_radius (int, optional): Bond radius to use. Defaults to 5.
        n_bits (int, optional): Number of bits. Defaults to 2048.

    Returns:
        np.ndarray: Numpy array holding the fingerprints

    """
    rdkit_mols = [smiles2mol(m) for m in smiles]
    fps = [
        AllChem.GetMorganFingerprintAsBitVect(mol, radius=bond_radius, nBits=n_bits)  # type: ignore
        for mol in rdkit_mols
    ]

    return np.asarray(fps)

In this function, the main compute time goes into np.asarray(fps).

Two things could be done to reduce the overall computation time:

  • The function is called ad hoc on every model training and evaluation, so the same molecules are converted again and again. The bit vectors could be computed once up front and stored somewhere, probably in a lookup table (?). Or we could define a new input class BitVector?
  • The conversion to an np.ndarray of floats seems to be a problem. We could either keep the rdkit format and pass the BitVector objects through to the kernel, or use another dtype (bool?). Is this possible with botorch?
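
To make the lookup-table idea concrete, here is a minimal sketch. It is not BoFire code: `FingerprintCache` and `toy_featurizer` are hypothetical names, and the toy featurizer only stands in for the RDKit fingerprint call so that the example runs without RDKit. Each distinct SMILES string is converted once and stored as a compact `uint8` row; the cast to float happens only when the matrix for the kernel is assembled. Whether botorch could consume a non-float dtype directly stays an open question here.

```python
from typing import Callable, Dict, List, Sequence

import numpy as np


def toy_featurizer(smiles: str, n_bits: int = 16) -> List[int]:
    """Hypothetical stand-in for an RDKit Morgan fingerprint call.

    Sets bits deterministically from the characters of the string. It has no
    chemical meaning and only keeps this sketch free of an RDKit dependency.
    """
    bits = [0] * n_bits
    for i, ch in enumerate(smiles):
        bits[(ord(ch) * 31 + i) % n_bits] = 1
    return bits


class FingerprintCache:
    """Lookup table from SMILES string to a compact uint8 fingerprint row."""

    def __init__(self, featurizer: Callable[[str], Sequence[int]]) -> None:
        self._featurizer = featurizer
        self._table: Dict[str, np.ndarray] = {}
        self.misses = 0  # number of times the featurizer was actually called

    def get(self, smiles: List[str]) -> np.ndarray:
        """Return a float matrix with one fingerprint row per input string."""
        rows = []
        for s in smiles:
            if s not in self._table:
                # Compute each distinct molecule only once.
                self.misses += 1
                self._table[s] = np.asarray(self._featurizer(s), dtype=np.uint8)
            rows.append(self._table[s])
        # Cast to float only at the end, when the kernel input is assembled.
        return np.stack(rows).astype(np.float64)


cache = FingerprintCache(toy_featurizer)
X = cache.get(["CCO", "c1ccccc1", "CCO"])  # "CCO" is featurized only once
X_again = cache.get(["CCO"])               # served entirely from the table
```

In a real setting the featurizer argument would wrap the existing fingerprint computation, and the table could be filled once for the whole candidate set before the optimization loop starts.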

Any opinions, @jduerholt, @bertiqwerty?

Describe the solution you'd like to see implemented in BoFire.

As described above.

Describe any alternatives you've considered to the above solution.

No response

Is this related to an existing issue in BoFire or another repository? If so please include links to those issues here.

No response

Pull Request

None

Metadata

Labels

enhancement: New feature or request
