Adaptive Chemical Embedding Model (ACE-Mol) is a task-specific chemical embedding model trained on a large collection of programmatically generated chemical motifs.
Here, we show how to use the pre-trained ACE-Mol model. Additionally, we show how to fine-tune it or retrain your own version of ACE-Mol from scratch.
Because ACE-Mol is text-based, direct prediction of floating-point values is unreliable; we therefore recommend embedding the inputs and computing logprobs for prediction tasks.
To install, create a conda environment from the provided requirements file:

conda create --name acemol --file requirements.txt python=3.12
The PretrainedACEMol helper class enables easy use of the pre-trained models from Hugging Face or of a local fine-tuned model stored in a .ckpt file.
from src.pretrained import PretrainedACEMol
# Load pre-trained model (hf or local .ckpt)
acemol = PretrainedACEMol()

PretrainedACEMol accepts a list of SMILES strings, the corresponding targets, and task descriptions (a single task description is enough if it is shared across all molecules).
molecules = [
'O=C(/C=C\\c1ccccc1)OCc1cncs1',
'CCC(C)C(CN(C)C)c1ccc(Cl)cc1Cl',
'CCOC(=O)CC(N)c1ccc(OC)cc1',
'CN(C)Cc1ccccc1O',
'COc1c(F)c(F)c(C(=O)Nc2ccccc2N2CCN(C(=O)C(C)C)CC2)c(F)c1F',
'O=C(COC(=O)c1ccccc1F)NCc1ccc2c(c1)OCO2'
]
task = 'is halogen group present'
targets = [0, 1, 0, 0, 1, 1]

We recommend using ACE-Mol as an embedding model; the embed method creates embeddings that exclude the actual target and prepares a dataframe for classification or regression via logprobs.
embedded = acemol.embed(molecules, task, targets)
# split into train and test
train, test = embedded[:3], embedded[3:]
# use regress method for regression.
predictions = acemol.classify(train, test)
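For regression, the workflow is identical except that the regress method replaces classify. The following is a minimal sketch reusing the acemol model and molecules defined above; the task description and target values are illustrative placeholders, not real measurements.

# Regression follows the same embed/split pattern as classification.
# The logS values below are illustrative placeholders only.
reg_task = 'aqueous solubility (logS)'
reg_targets = [-2.1, -3.4, -1.8, -0.9, -4.2, -2.7]

reg_embedded = acemol.embed(molecules, reg_task, reg_targets)
reg_train, reg_test = reg_embedded[:3], reg_embedded[3:]
reg_predictions = acemol.regress(reg_train, reg_test)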
We provide two additional scripts to re-train and fine-tune ACE-Mol, together with pre-training and toxicity datasets.
If you want to use your own dataset for fine-tuning, you will have to use the same format as the datasets above.
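If you are unsure of the expected layout, a quick way to check is to inspect the provided datasets and mirror their structure. A minimal sketch, assuming the data lives under the paths used by the training and fine-tuning commands below:

from pathlib import Path

# List the files shipped with the provided datasets so your own
# fine-tuning data can mirror the same directory layout and file format.
for split in ("train", "validation", "test"):
    for path in sorted((Path("data") / split).glob("*")):
        print(path)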
To re-train ACE-Mol from scratch:

python3 src/train.py \
-c "./configs/config.yaml" \
-e "./data/test" \
-t "./data/train" \
-v "./data/validation" \
-m "model_name"
To fine-tune the pre-trained model from Hugging Face:

python3 src/finetune.py \
-p "jablonkagroup/ACEMol" \
-c "./configs/finetune.yaml" \
-e "./data/test" \
-t "./data/train" \
-v "./data/validation" \
-m "model_name"
If you use ACE-Mol, please cite:

@article{prastalo2026learning,
title={Beyond Learning on Molecules by Weakly Supervising on Molecules},
author={Gordan Prastalo and Kevin Maik Jablonka},
journal={arXiv preprint arXiv:2602.04696},
year={2026}
}
