
MolReasoner: Toward Effective and Interpretable Reasoning for Molecular LLMs

📖Paper | 🤗Datasets | 🤗Model Weights (Hugging Face) | 🤖Model Weights (ModelScope)

🌈Large Language Models (LLMs) have demonstrated remarkable performance across various domains, yet their capabilities in molecular reasoning remain insufficiently explored. Current approaches tend to rely heavily on general-purpose prompting, which lacks domain-specific molecular semantics, while fine-tuning strategies often struggle with interpretability and reasoning depth. To address these issues, we introduce MolReasoner, a two-stage framework designed to transition LLMs from memorization toward chemical reasoning. First, we propose Mol-SFT, which initializes the model's reasoning abilities via synthetic Chain-of-Thought (CoT) samples generated by GPT-4o and verified for chemical accuracy. Subsequently, Mol-RL applies reinforcement learning with specialized reward functions designed explicitly to align chemical structures with linguistic descriptions, thereby enhancing molecular reasoning capabilities. Our approach notably enhances interpretability, improving the model's molecular understanding and enabling better generalization. Extensive experiments demonstrate that MolReasoner outperforms existing methods, marking a significant shift from memorization-based outputs to robust chemical reasoning. An overview of the framework is shown below:

📢 News

  • 🚀 [08/05/2025] We release the MolReasoner paper.
  • 🚀 [08/05/2025] We upload MolReasoner checkpoints to Hugging Face.
  • 🚀 [08/04/2025] We upload MolReasoner checkpoints to ModelScope.
  • 🚀 [08/04/2025] We upload the MolReasoner training datasets to Hugging Face.
  • 🚀 [08/04/2025] We release the MolReasoner repository with our training, inference, and evaluation code.

🛠️ Setup

# 1. Install LLaMA-Factory from https://github.com/hiyouga/LLaMA-Factory repository.
# 2. Install Verl from https://github.com/volcengine/verl repository.
# 3. Install additional dependencies for both environments
pip3 install deepspeed
pip install --force-reinstall psutil==5.9.8
pip install -U "ray[data,train,tune,serve,default]"
pip install EFGs
pip install swanlab
pip install --upgrade boto3 botocore
pip install rdkit tensorboard
pip install python-Levenshtein
pip install selfies
pip install nltk

# 4. Configure NLTK data path
cp -r verl/nltk_data /root/nltk_data

# 5. Download the SciBERT model from Hugging Face:
# https://huggingface.co/allenai/scibert_scivocab_uncased or
# https://huggingface.co/Sihangli/3D-MoLM


# 6. Download the Qwen2.5-7B-Instruct model from Hugging Face:
# https://huggingface.co/Qwen/Qwen2.5-7B-Instruct

Mol-SFT

Data Preparation:

Please download the data from Hugging Face and place the corresponding SFT data under the LLaMA-Factory/data directory. The datasets are registered in LLaMA-Factory/data/dataset_info.json as follows:

{
  "text_based_de_novo_molecule_generation_train": {
    "file_name": "text_based_de_novo_molecule_generation_train.json"
  },
  "text_based_de_novo_molecule_generation_test": {
    "file_name": "text_based_de_novo_molecule_generation_test.json"
  },
  "molecule_captioning_train": {
    "file_name": "molecule_captioning_train.json"
  },
  "molecule_captioning_test": {
    "file_name": "molecule_captioning_test.json"
  }
}

Model Training

# 1. Molecule Captioning
bash LLaMA-Factory/train_molecule_captioning.sh
# 2. Text-based De Novo Molecule Generation
bash LLaMA-Factory/train_text_based_de_novo_molecule_generation.sh

Please remember to update the base model paths (Qwen2.5-7B-Instruct) in the following YAML files:

  • LLaMA-Factory/examples/train_full/train_molecule_captioning/sft.yml
  • LLaMA-Factory/examples/train_full/train_text_guided_molecule_generation/sft.yml

Make sure to modify the paths to the downloaded model and adjust the save paths as needed.
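For example, the relevant fields in each sft.yml look roughly like this (the paths below are placeholders to replace with your own; the key names follow LLaMA-Factory's standard SFT config format):

```yaml
### model
model_name_or_path: /path/to/Qwen2.5-7B-Instruct  # your downloaded base model

### output
output_dir: /path/to/save/mol-sft-checkpoint      # adjust the save path as needed
```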

Mol-RL

Data Preparation:

Please download the GRPO data from Hugging Face.

Model Training

# 1. Molecule Captioning
bash verl/examples/grpo_trainer/grpo_train_molecule_captioning.sh
# 2. Text-based De Novo Molecule Generation
bash verl/examples/grpo_trainer/grpo_train_text_based_de_novo_molecule_generation.sh

Please make sure to follow the notes provided in the two shell files and update the paths accordingly.

Additionally, during the Molecule Captioning training, make sure to replace the line primary_path = 'xxxx/scibert_scivocab_uncased' in the file verl/verl/utils/reward_score/chembl_mol2desc.py with your own model path.

For convenience, in the code:

  • desc2mol is set to refer to text_guided_molecule_generation, and
  • mol2desc is set to refer to molecule_captioning.
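As a rough illustration of how GRPO-style rewards are typically composed (the actual implementations, including the SciBERT-based semantic reward, live under verl/verl/utils/reward_score/), here is a minimal pure-Python sketch combining a format check with an exact-match answer score. The `<answer>` tag convention and the 0.1/0.9 weighting are illustrative assumptions, not the repository's exact logic:

```python
import re

def format_reward(response: str) -> float:
    """Reward responses that wrap their final output in <answer>...</answer> tags.
    (Hypothetical tag convention, for illustration only.)"""
    return 1.0 if re.search(r"<answer>.*?</answer>", response, re.DOTALL) else 0.0

def exact_match_reward(response: str, ground_truth: str) -> float:
    """Reward an exact match between the extracted answer and the reference."""
    m = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if m is None:
        return 0.0
    return 1.0 if m.group(1).strip() == ground_truth.strip() else 0.0

def compute_score(response: str, ground_truth: str) -> float:
    # Illustrative weighting: small format bonus plus the task reward.
    return 0.1 * format_reward(response) + 0.9 * exact_match_reward(response, ground_truth)
```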

Merge Model

Please refer to verl/scripts/model_merger.sh to merge the trained actor model with the Qwen2.5-7B-Instruct model.

Inference

Please refer to LLaMA-Factory/infer_molecule_captioning.sh and LLaMA-Factory/infer_text_based_de_novo_molecule_generation.sh. Make sure to replace the paths to the merged model and update the output paths for inference.

Evaluation

We have provided scripts to evaluate both tasks, located at:

  • verl/examples/data_preprocess/molecule/molecule_captioning/eval_molecule_captioning/eval_metrics.py
  • verl/examples/data_preprocess/molecule/text_guided_molecule_generation/eval_text_guided_molecule_generation/eval_metrics.py

Additionally, to demonstrate effectiveness, we include several baseline outputs alongside our method's metrics in the following directories:

  • verl/examples/data_preprocess/molecule/molecule_captioning/eval_molecule_captioning/saved_results

  • verl/examples/data_preprocess/molecule/text_guided_molecule_generation/eval_text_guided_molecule_generation/saved_results

We also provide baseline evaluation scripts for metric evaluation. The results they produce match those reported in our paper exactly.

python verl/examples/data_preprocess/molecule/molecule_captioning/eval_molecule_captioning/grpo_eval.py # MolReasoner Molecule Captioning Evaluation
python verl/examples/data_preprocess/molecule/molecule_captioning/eval_molecule_captioning/eval_mol_instruct.py # Mol-Instruction Molecule Captioning Evaluation
python verl/examples/data_preprocess/molecule/molecule_captioning/eval_molecule_captioning/eval_qwen2_5_7b.py # Qwen2.5-7B Molecule Captioning Evaluation
python verl/examples/data_preprocess/molecule/molecule_captioning/eval_molecule_captioning/eval_llama3_70b.py # Llama3-70B Molecule Captioning Evaluation
python verl/examples/data_preprocess/molecule/text_guided_molecule_generation/eval_text_guided_molecule_generation/grpo_eval.py # MolReasoner Text-based De Novo Molecule Generation Evaluation

Results

📈 Molecule Captioning Performance

MolReasoner outperforms all closed-source and open-source baselines across BLEU-2/4, METEOR, and ROUGE metrics, establishing a new state-of-the-art in the molecule captioning task.

| Method (Size) | BLEU-2 ↑ | BLEU-4 ↑ | METEOR ↑ | ROUGE-1 ↑ | ROUGE-2 ↑ | ROUGE-L ↑ |
|---|---|---|---|---|---|---|
| *Closed-Source Models* | | | | | | |
| GPT-4o (–) | 0.1194 | 0.0433 | 0.1651 | 0.2315 | 0.0738 | 0.1792 |
| GPT-4o-mini (–) | 0.1080 | 0.0400 | 0.1545 | 0.2310 | 0.0723 | 0.1776 |
| *Open-Source Models* | | | | | | |
| Qwen2.5-7B-Instruct (7B) | 0.0792 | 0.0258 | 0.2132 | 0.2091 | 0.0601 | 0.1483 |
| DeepSeek-R1-Distill-Qwen-7B (7B) | 0.1173 | 0.0469 | 0.1544 | 0.2209 | 0.0749 | 0.1693 |
| Llama3.1-8B-Instruct (8B) | 0.1670 | 0.0769 | 0.2164 | 0.2806 | 0.1182 | 0.2250 |
| Qwen3-8B (8B) | 0.0974 | 0.0289 | 0.1733 | 0.2067 | 0.0501 | 0.1567 |
| Llama3.1-70B-Instruct (70B) | 0.1466 | 0.0658 | 0.1832 | 0.2736 | 0.1072 | 0.2203 |
| Qwen2.5-72B-Instruct (72B) | 0.1519 | 0.0647 | 0.1949 | 0.2729 | 0.0948 | 0.2067 |
| Mol-Instruction (7B) | 0.0956 | 0.0667 | 0.1891 | 0.2801 | 0.1823 | 0.2582 |
| MolReasoner (Ours) (7B) | 0.4383 | 0.3220 | 0.4754 | 0.5530 | 0.3662 | 0.4821 |
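For reference, the ROUGE-L metric reported above is based on the longest common subsequence between generated and reference captions. A minimal pure-Python sketch (assuming whitespace tokenization and an unweighted F1, which may differ from the evaluation scripts' exact preprocessing):

```python
def lcs_len(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def rouge_l_f1(reference: str, hypothesis: str) -> float:
    """ROUGE-L F1 over whitespace tokens (beta=1; toolkits may weight recall more)."""
    ref, hyp = reference.split(), hypothesis.split()
    lcs = lcs_len(ref, hyp)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(hyp), lcs / len(ref)
    return 2 * p * r / (p + r)
```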

📈 Text-based de novo Molecule Generation Performance

MolReasoner surpasses both closed-source and open-source baselines across all metrics, achieving state-of-the-art performance in this molecule generation task.

| Method (Size) | BLEU ↑ | Exact ↑ | Levenshtein ↓ | RDK FTS ↑ | MACCS FTS ↑ | MORGAN FTS ↑ | Frag-J ↑ | Frag-R ↑ | FG-Match ↑ | VALIDITY ↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| *Closed-Source Models* | | | | | | | | | | |
| GPT-4o (–) | 0.1949 | 0.0045 | 49.3545 | 0.0926 | 0.2066 | 0.0836 | 0.1296 | 0.1777 | 0.3753 | 0.2916 |
| GPT-4o-mini (–) | 0.0522 | 0.0058 | 49.1371 | 0.0863 | 0.2032 | 0.0883 | 0.0987 | 0.1324 | 0.3898 | 0.1946 |
| *Open-Source Models* | | | | | | | | | | |
| Qwen2.5-7B-Instruct (7B) | 0.0002 | 0.0024 | 40.0076 | 0.0776 | 0.1585 | 0.0520 | 0.0773 | 0.1037 | 0.3601 | 0.2395 |
| DeepSeek-R1-Distill-Qwen-7B (7B) | 0.0000 | 0.0018 | 50.6957 | 0.0619 | 0.1327 | 0.0461 | 0.1101 | 0.1428 | 0.3847 | 0.0697 |
| Llama3.1-8B-Instruct (8B) | 0.0094 | 0.0027 | 40.2092 | 0.0556 | 0.1470 | 0.0470 | 0.0701 | 0.0918 | 0.3587 | 0.2319 |
| Qwen3-8B (8B) | 0.0000 | 0.0036 | 28.2564 | 0.3692 | 0.4733 | 0.3059 | 0.3406 | 0.3566 | 0.5280 | 0.0118 |
| Llama3.1-70B-Instruct (70B) | 0.0787 | 0.0055 | 44.1626 | 0.0824 | 0.2323 | 0.0785 | 0.1398 | 0.1963 | 0.3574 | 0.4641 |
| Qwen2.5-72B-Instruct (72B) | 0.0000 | 0.0048 | 18.0588 | 0.1584 | 0.3456 | 0.1432 | 0.1696 | 0.2300 | 0.3436 | 0.1134 |
| Mol-Instruction (7B) | 0.3049 | 0.0470 | 39.4268 | 0.2914 | 0.4427 | 0.2524 | 0.3333 | 0.4092 | 0.4324 | 0.9994 |
| MolReasoner (Ours) (7B) | 0.7841 | 0.0758 | 26.9255 | 0.4373 | 0.6759 | 0.3627 | 0.5213 | 0.6414 | 0.5390 | 0.9679 |
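The Levenshtein column above is the character-level edit distance between generated and reference SMILES strings (lower is better). A minimal pure-Python sketch for intuition (the evaluation scripts use the python-Levenshtein package, and the fingerprint metrics such as RDK/MACCS/Morgan FTS additionally require RDKit):

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance between two SMILES strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]
```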

🔭 Outlook

While MolReasoner makes significant strides toward interpretable and effective molecular reasoning—bridging CoT-style supervision with chemistry-aware reinforcement learning—there remains ample room for exploration. In future work, we plan to:

  • Broaden Task Coverage: Extend MolReasoner to additional molecular tasks (e.g. property prediction, retrosynthesis planning) and richer input modalities (e.g. 3D conformers, reaction schemes).
  • Enhance Reward Design: Incorporate experimentally grounded metrics—such as synthetic accessibility, reaction feasibility, and bioactivity scores—into our multi-level rewards to further align model outputs with real-world chemistry.
  • Scale and Efficiency: Investigate more efficient training strategies (e.g. off-policy RL, distillation) and adapt MolReasoner to larger LLM backbones without prohibitive compute costs.
  • Robustness & Fairness: Evaluate model performance on out-of-distribution and negatively biased datasets, and develop techniques to mitigate hallucinations and semantic drift in generated reasoning chains.

We hope MolReasoner not only serves as a strong baseline but also inspires the community to push the boundaries of molecular LLMs—fostering new ideas, benchmarks, and open-source collaborations aimed at truly autonomous chemical reasoning.

✒️Citation

@misc{zhao2025molreasonereffectiveinterpretablereasoning,
      title={MolReasoner: Toward Effective and Interpretable Reasoning for Molecular LLMs},
      author={Guojiang Zhao and Sihang Li and Zixiang Lu and Zheng Cheng and Haitao Lin and Lirong Wu and Hanchen Xia and Hengxing Cai and Wentao Guo and Hongshuai Wang and Mingjun Xu and Siyu Zhu and Guolin Ke and Linfeng Zhang and Zhifeng Gao},
      year={2025},
      eprint={2508.02066},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.02066},
}

Acknowledgement

We sincerely thank projects LLaMA-Factory, Verl, Mol-Instructions, and Visual-RFT for providing their open-source resources.


About

This is the official code for MolReasoner: Toward Effective and Interpretable Reasoning for Molecular LLMs
