This repository contains the official code release for our ICLR 2025 paper "LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation", by Subhojyoti Khastagir*, Kishalay Das*, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, Niloy Ganguly.
CrysLLMGen introduces a hybrid approach to generating the 3D structures of crystal materials. Its key contributions are:
- Hybrid LLM + Diffusion Framework: Integrates LLMs for discrete predictions with equivariant diffusion models for continuous structural refinement.
- Two-Stage Generation: LLM proposes atom types, coordinates, and lattice; diffusion model refines them for stability and physical validity.
- Constraint-Aware Design: Supports conditional generation based on user-defined composition, space group, and natural-language prompts.
- Balanced Validity & Novelty: Achieves superior stability, structural correctness, and compositional validity compared to standalone LLMs or diffusion models.
- Architecture-Agnostic: Framework can seamlessly incorporate future LLMs and denoising architectures.
The dependencies are listed in requirements.txt, which was generated with pipreqs. Install them with:

    pip install -r requirements.txt

However, some ad-hoc dependencies may not have been captured.
If you encounter a missing package, install it manually with pip install.
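Because requirements.txt may miss some packages, it can help to look for unresolved imports before starting a long run. The following is a minimal sketch, not part of the released code: the helper names `top_level_imports`, `unresolved_modules`, and `scan_repo` are hypothetical, and the check only tests whether a module can be found in the current environment.

```python
import ast
import importlib.util
from pathlib import Path


def top_level_imports(source):
    """Collect the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped.
            names.add(node.module.split(".")[0])
    return names


def unresolved_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return sorted(n for n in names if importlib.util.find_spec(n) is None)


def scan_repo(root="."):
    """Hypothetical helper: report unresolved imports across all .py files under root."""
    found = set()
    for path in Path(root).rglob("*.py"):
        found |= top_level_imports(path.read_text(encoding="utf-8"))
    return unresolved_modules(found)
```

Running `print(scan_repo("."))` from the repository root lists the module names that still need installing. Note that an import name can differ from its pip package name (for example, `yaml` is provided by `PyYAML`), so the output is a hint rather than a ready-made install list.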
For Perov-5:

    python -W ignore llm_finetune.py --run-name 7b-perov --model 7b --num-epochs 1 --data-path data/perov_5

For MP-20:

    python -W ignore llm_finetune.py --run-name 7b-mp --model 7b --num-epochs 1 --data-path data/mp_20

- The fine-tuned model will be saved at exp/7b-perov/ or exp/7b-mp/
For Perov-5:

    python -W ignore diff_train.py --dataset perov_5 --batch_size 512 --epochs 500 --timesteps 1000 --run-type 'train'

For MP-20:

    python -W ignore diff_train.py --dataset mp_20 --batch_size 512 --epochs 500 --timesteps 1000 --run-type 'train'

- The trained model will be saved at out/<Dataset>/<expt_date>/<expt_time>/, where <Dataset> is 'perov_5' or 'mp_20'
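Since every training run writes to its own out/<Dataset>/<expt_date>/<expt_time>/ folder, a small helper can look up the most recent one when you later need to pass it as --chkpt_name. This is a sketch under one stated assumption: the date and time folder names sort chronologically as plain strings. The function name `latest_run_dir` is hypothetical and not part of the repository.

```python
from pathlib import Path


def latest_run_dir(out_root, dataset):
    """Return the newest <out_root>/<dataset>/<expt_date>/<expt_time>/ directory, or None."""
    dataset_dir = Path(out_root) / dataset
    if not dataset_dir.is_dir():
        return None
    # Two levels below the dataset folder: <expt_date>/<expt_time>
    runs = [p for p in dataset_dir.glob("*/*") if p.is_dir()]
    if not runs:
        return None
    # Assumption: folder names sort chronologically when compared as strings.
    return max(runs, key=lambda p: (p.parent.name, p.name))
```

For example, `latest_run_dir("out", "mp_20")` returns a `Path` to the latest MP-20 run, or `None` if no run exists yet. If the folder names do not sort chronologically, sorting by modification time (`p.stat().st_mtime`) is an alternative.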
For Perov-5:

    python -W ignore llm_sample.py --model_name 7b --model_path=exp/7b-perov/checkpoint-11356 --num_samples 10000 --dataset perov --temperature 1.0 --top_p 0.7

For MP-20:

    python -W ignore llm_sample.py --model_name 7b --model_path=exp/7b-mp/checkpoint-27136 --num_samples 10000 --dataset mp --temperature 1.0 --top_p 0.7

- Generated samples will be saved as .pt files, such as llm_sample_mp_10000.pt or llm_sample_perov_10000.pt
- You can try different --temperature and --top_p values to change the quality of the generated samples.
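To build intuition for the two sampling flags, here is a generic sketch of temperature scaling followed by top-p (nucleus) filtering over a toy set of logits. It assumes the flags carry their standard meaning; it is not the code used in llm_sample.py, and the function names `top_p_filter` and `sample` are illustrative only.

```python
import math
import random


def top_p_filter(logits, temperature=1.0, top_p=0.7):
    """Return {token_index: probability} for the tokens kept by top-p filtering."""
    # Temperature rescales the logits; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most probable tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalise so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=1.0, top_p=0.7, rng=random):
    """Draw one token index from the filtered distribution."""
    nucleus = top_p_filter(logits, temperature, top_p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]
```

With logits `[2.0, 1.0, 0.0]`, the default settings keep the two most likely tokens, while lowering the temperature to 0.5 or top_p to 0.5 keeps only the first. Smaller values therefore give more conservative samples and larger values give more diverse ones.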
For Perov-5:

    python -W ignore diff_refinement.py --model_path 'gen/' --chkpt_name <Saved Model Path> --llm_file_name llm_sample_perov_10000.pt --tasks gen --dataset perov_5 --batch_size 1024 --timesteps 1000 --diff_steps 800 --run-type 'sample'

For MP-20:

    python -W ignore diff_refinement.py --model_path 'gen/' --chkpt_name <Saved Model Path> --llm_file_name llm_sample_mp_10000.pt --tasks gen --dataset mp_20 --batch_size 1024 --timesteps 1000 --diff_steps 700 --run-type 'sample'

- <Saved Model Path>: directory where the diffusion model is stored → out/<Dataset>/<expt_date>/<expt_time>/
- diff_steps: We use 700 for Perov and 800 for MP.
- Generated samples will be saved as .pt files in the gen/ directory, such as gen/perov_5/eval_gen.pt or gen/mp_20/eval_gen.pt
For Perov-5:

    python -W ignore compute_metrics.py --root_path gen/perov_5/eval_gen.pt --tasks gen --eval_model_name perovskite --gt_file data/perov_5/test.csv

For MP-20:

    python -W ignore compute_metrics.py --root_path gen/mp_20/eval_gen.pt --tasks gen --eval_model_name mp20 --gt_file data/mp_20/test.csv

For any further queries, feel free to contact Kishalay Das.
If you are using CrysLLMGen or our Textual Dataset, please cite our work as follows:
@article{khastagir2025llm,
title={LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation},
author={Khastagir, Subhojyoti and Das, Kishalay and Goyal, Pawan and Lee, Seung-Cheol and Bhattacharjee, Satadeep and Ganguly, Niloy},
journal={arXiv preprint arXiv:2510.23040},
year={2025}
}
