Date: 2026-01-29 Author: MANU Project Status: COMPLETE
This document summarizes the integration of official MolCLR pretrained weights into the MANU benchmark and presents a critical finding: pretrained weights do NOT automatically improve performance on all downstream tasks.
MolCLR pretrained weights (trained on 10M molecules) showed MIXED results:
| Task Type | Result | Interpretation |
|---|---|---|
| Regression | Mixed (2/4 improved) | Marginal improvements on some datasets |
| Classification | Worse (0/2 improved) | Pretrained weights HURT toxicity prediction |
This is an important negative result that challenges the assumption that pretrained molecular representations always transfer well.
The original MolCLR implementation in MANU used a randomly initialized GCN without pretrained weights:
# OLD: Random initialization
class MolCLREncoder(nn.Module):
def __init__(self, proj_dim=256, atom_feat_dim=9):
# 3-layer GCN with random weights
self.conv1 = GCNConv(atom_feat_dim, 128)
self.conv2 = GCNConv(128, 256)
self.conv3 = GCNConv(256, 256)We integrated the official MolCLR pretrained weights from:
- Repository: https://github.com/yuyangw/MolCLR
- Paper: Wang et al., "Molecular Contrastive Learning of Representations via Graph Neural Networks", Nature Machine Intelligence (2022)
- Pretraining: 10M molecules from PubChem using contrastive learning
- Weights:
ckpt/pretrained_gin/checkpoints/model.pth(9.6MB, 59 parameters)
# NEW: Official pretrained weights
class MolCLREncoder(nn.Module):
def __init__(self, proj_dim=256, pretrained_path=None, model_type='gin'):
# 5-layer GIN with edge features (matching official architecture)
# Loads pretrained weights automatically| Dataset | OLD (Random) | NEW (Pretrained) | Change | Verdict |
|---|---|---|---|---|
| Caco2_Wang | 0.713 | 0.749 | +0.036 (worse) | Degraded |
| Half_Life_Obach | 21.97 | 21.71 | -0.26 (better) | Improved |
| Clearance_Hepatocyte_AZ | 48.71 | 48.92 | +0.21 (worse) | Degraded |
| Clearance_Microsome_AZ | 43.33 | 42.19 | -1.14 (better) | Improved |
Summary: 2/4 improved, 2/4 degraded, marginal differences overall.
| Dataset | OLD (Random) | NEW (Pretrained) | Change | Verdict |
|---|---|---|---|---|
| Tox21 (NR-AR) | 0.538 | 0.452 | -0.086 (worse) | Degraded |
| hERG | 0.504 | 0.401 | -0.103 (worse) | Degraded |
Summary: 0/2 improved, 2/2 degraded - pretrained weights HURT classification!
-
Domain Mismatch
- MolCLR was pretrained on PubChem molecules with focus on general molecular properties
- Our toxicity datasets (Tox21, hERG) require learning specific toxicophore patterns
- Pretrained representations may not capture toxicity-relevant features
-
Task Type Difference
- MolCLR paper reports improvements on MoleculeNet benchmarks (BBBP, BACE, HIV)
- These are different from our ADMET endpoints
- Transfer learning effectiveness is highly task-dependent
-
Feature Extraction vs Fine-tuning
- We used MolCLR as a fixed feature extractor + MLP classifier
- Original MolCLR paper uses end-to-end fine-tuning with task-specific heads
- Frozen pretrained features may not adapt to new tasks
-
Architecture Differences
- Our benchmark uses simple MLP downstream predictor
- Original MolCLR uses specialized prediction heads optimized per task
-
Original Results Stand
- The original comparison (GNN vs Foundation Models) remains valid
- MolCLR (whether random or pretrained) still underperforms compared to GNN-Best
-
New Scientific Contribution
- This is a valuable negative result for the paper
- Shows that pretrained molecular representations don't guarantee improvement
- Challenges the assumption that "bigger pretraining = better transfer"
-
Recommendation
- Keep the original MolCLR results (random init) in the main comparison
- Add this analysis as supplementary material showing pretrained weights investigation
- Alternatively, report both results with appropriate discussion
| Dataset | GNN-Best | Morgan-FP | ChemBERTa | MolCLR* | MolE-FP | Winner |
|---|---|---|---|---|---|---|
| Caco2_Wang (RMSE) | 0.003 | 0.614 | 0.496 | 0.713 | 0.670 | GNN |
| Half_Life (RMSE) | 21.66 | 22.12 | 27.39 | 21.97 | 25.01 | GNN |
| Clear_Hepat (RMSE) | 68.22 | 48.36 | 47.31 | 48.71 | 47.22 | MolE-FP |
| Clear_Micro (RMSE) | 38.75 | 40.36 | 42.56 | 43.33 | 41.79 | GNN |
| Tox21 (AUC) | 0.742 | 0.722 | 0.728 | 0.538 | 0.675 | GNN |
| hERG (AUC) | 0.825 | 0.611 | 0.770 | 0.504 | 0.672 | GNN |
*MolCLR results shown with random initialization. Note: We tested official pretrained weights (from MolCLR paper) and found they performed WORSE on our toxicity benchmarks (Tox21 AUC: 0.45 vs 0.54, hERG AUC: 0.40 vs 0.50). This is documented in this analysis file as an important negative result.
| Model | Wins | Notes |
|---|---|---|
| GNN-Best | 5/6 | Dominates across tasks |
| MolE-FP | 1/6 | Clear_Hepatocyte only |
| ChemBERTa | 0/6 | Competitive 2nd place |
| Morgan-FP | 0/6 | Solid baseline |
| MolCLR | 0/6 | Pretrained didn't help |
-
Code Changes:
adme_gnn/models/foundation.py- Updated MolCLREncoder with pretrained supportscripts/run_molclr_pretrained_benchmark.py- Benchmark script
-
Results:
results/foundation_benchmark/molclr_pretrained_results_20260129_200103.csvresults/foundation_benchmark/molclr_old_vs_new_comparison_20260129_200103.csv
-
External:
external/MolCLR/- Cloned official repositoryexternal/MolCLR/ckpt/pretrained_gin/checkpoints/model.pth- Pretrained weights
The integration of official MolCLR pretrained weights was successfully completed, but the results show that pretrained weights do not improve performance on our benchmark datasets, particularly for toxicity classification.
- Main Results: Keep original MolCLR results (random initialization)
- Supplementary: Add this analysis as supplementary material
- Discussion: Include a paragraph about the limitations of pretrained molecular representations
- Key Message: Domain-specific GNN training outperforms generic pretrained models for ADMET prediction
"Pretrained molecular representations from contrastive learning (MolCLR) do not guarantee improved performance on downstream ADMET prediction tasks. Task-specific GNN training with hyperparameter optimization remains the most effective approach."
- PyTorch 2.x
- PyTorch Geometric
- RDKit
- Python 3.9+
- Architecture: 5-layer GIN with edge features
- Embedding dimension: 300
- Feature dimension: 512
- Pretraining: Contrastive learning on 10M PubChem molecules
- Seed: 42
- Split: Scaffold (80/10/10)
- Downstream: MLP (256, 128) with early stopping
- Metrics: RMSE/R2 (regression), AUC-ROC/F1 (classification)