Labels: enhancement (New feature or request)
Description
Motivation
I tried using the Tanimoto kernel for BO on a larger dataset, similar to this example:
https://experimental-design.github.io/bofire/docs/tutorials/benchmarks/006-Bayesian_optimization_over_molecules.html
Adapting this example to a larger dataset drastically increases the computation time. I found that most of the computation is spent calculating the fingerprint vectors in `utils.chemoinformatics.py`:
```python
from typing import List

import numpy as np
from rdkit.Chem import AllChem

# smiles2mol is a helper defined elsewhere in the same module.


def smiles2fingerprints(
    smiles: List[str],
    bond_radius: int = 5,
    n_bits: int = 2048,
) -> np.ndarray:
    """Transforms a list of smiles to an array of morgan fingerprints.

    Args:
        smiles (List[str]): List of smiles
        bond_radius (int, optional): Bond radius to use. Defaults to 5.
        n_bits (int, optional): Number of bits. Defaults to 2048.

    Returns:
        np.ndarray: Numpy array holding the fingerprints
    """
    rdkit_mols = [smiles2mol(m) for m in smiles]
    fps = [
        AllChem.GetMorganFingerprintAsBitVect(mol, radius=bond_radius, nBits=n_bits)  # type: ignore
        for mol in rdkit_mols
    ]
    # Convert the list of RDKit bit vectors into one NumPy array.
    return np.asarray(fps)
```

In this function, most of the compute time is spent in `np.asarray(fps)`.
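One possible reason `np.asarray(fps)` is slow is that NumPy has to read each RDKit bit vector element by element through Python. A minimal sketch of a vectorized alternative, assuming each fingerprint is first exported as a `'0'`/`'1'` string (RDKit's `ExplicitBitVect.ToBitString()` does this): the strings are joined into one byte buffer and decoded in a single NumPy pass, with `bool` as the default dtype. The helper name `bitstrings_to_array` is hypothetical and not part of BoFire.

```python
from typing import List

import numpy as np


def bitstrings_to_array(bitstrings: List[str], dtype=np.bool_) -> np.ndarray:
    """Convert equal-length '0'/'1' strings into a 2-D array in one vectorized pass.

    Hypothetical helper: each string is assumed to come from something like
    RDKit's ``ExplicitBitVect.ToBitString()``.
    """
    if not bitstrings:
        return np.zeros((0, 0), dtype=dtype)
    n_bits = len(bitstrings[0])
    if any(len(s) != n_bits for s in bitstrings):
        raise ValueError("all bit strings must have the same length")
    # Join everything into one ASCII buffer and view it as raw uint8 bytes.
    raw = np.frombuffer("".join(bitstrings).encode("ascii"), dtype=np.uint8)
    # ord("0") == 48, so subtracting it maps '0' -> 0 and '1' -> 1.
    # Any other character wraps around to a value > 1 in uint8.
    bits = (raw - ord("0")).reshape(len(bitstrings), n_bits)
    if bits.max() > 1:
        raise ValueError("bit strings may only contain '0' and '1'")
    return bits.astype(dtype)
```

With RDKit, something like `bitstrings_to_array([fp.ToBitString() for fp in fps])` could stand in for `np.asarray(fps)`; `rdkit.DataStructs.ConvertToNumpyArray` is another option. Whether either is actually faster on a given dataset would have to be measured.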
Two things could be done to reduce the overall computation time:
- This function is called ad hoc during every model training and evaluation, which means the same molecules are featurized again and again. The bit vectors could be computed upfront and stored somewhere, probably as a lookup table (?). Alternatively, we could define a new input class, `BitVector`?
- The conversion to `np.ndarray` as `float` seems to be a problem. We could either keep the `rdkit` format and pass the `BitVector` objects through to the kernel, or use another dtype (`bool`?). Is this possible with BoTorch?
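The two ideas could be combined roughly as follows. This is a minimal sketch under stated assumptions, not BoFire or BoTorch API: the hypothetical `FingerprintCache` featurizes each unique SMILES once, stores it as `bool` (1 byte per bit instead of 8 for `float64`), and the hypothetical `tanimoto` function casts to `float64` only when the similarity matrix is computed. The `featurize` callable stands in for the `GetMorganFingerprintAsBitVect` call in `smiles2fingerprints`.

```python
from typing import Callable, Dict, List

import numpy as np


class FingerprintCache:
    """Hypothetical lookup table: featurize each SMILES once, reuse afterwards."""

    def __init__(self, featurize: Callable[[str], np.ndarray]):
        # featurize is assumed to return a 1-D 0/1 array of fixed length.
        self._featurize = featurize
        self._table: Dict[str, np.ndarray] = {}
        self.misses = 0  # how many times featurize was actually called

    def get(self, smiles: List[str]) -> np.ndarray:
        """Return a (len(smiles), n_bits) bool matrix; smiles must be non-empty."""
        rows = []
        for s in smiles:
            fp = self._table.get(s)
            if fp is None:
                # Compute once and store compactly as bool.
                fp = np.asarray(self._featurize(s), dtype=np.bool_)
                self._table[s] = fp
                self.misses += 1
            rows.append(fp)
        return np.stack(rows)


def tanimoto(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarity between the rows of two binary matrices."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    inter = a @ b.T  # number of bits set in both x and y
    na = a.sum(axis=1)[:, None]  # bits set in x
    nb = b.sum(axis=1)[None, :]  # bits set in y
    union = na + nb - inter
    # Two all-zero vectors would give 0/0; treating them as identical (1.0)
    # is an arbitrary choice made for this sketch.
    safe_union = np.where(union > 0, union, 1.0)
    return np.where(union > 0, inter / safe_union, 1.0)
```

Whether BoTorch kernels can consume `bool` tensors directly is not addressed here; the sketch simply keeps storage compact and casts at kernel-evaluation time.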
Any opinions, @jduerholt, @bertiqwerty?
Describe the solution you'd like to see implemented in BoFire.
As described above.
Describe any alternatives you've considered to the above solution.
No response
Is this related to an existing issue in BoFire or another repository? If so please include links to those issues here.
No response
Pull Request
None