PEESEgroup/Radical-Fragment-Generation

Harnessing homolytic bond energetics to steer inverse radical design


Abstract and workflow

Homolytic bond dissociation energy (BDE) governs radical formation and bond-breaking thermodynamics, yet in molecular design it is typically evaluated only after candidate structures are proposed. Here we treat BDE as a continuous generative design coordinate rather than a post hoc descriptor, enabling distributional control over bond strength at the regime level rather than exact set-point attainment. A BDE-conditioned transformer generates radical fragment pairs steered toward prescribed single-bond strength regimes (50-130 kcal·mol⁻¹), achieving monotonic, rank-resolved energetic control with 81-94% validity and 84-92% novelty across targets. At the screening level, an MPNN screener trained on the same data source provides high-throughput energetic ranking consistent with regime-level steering. Same-functional M06-2X/def2-TZVP recalculation on randomly sampled novel generations yields target-attainment MAE of 7.7-12.7 kcal·mol⁻¹ consistent with regime-level steering, isolating generative accuracy from inter-functional offsets. Separate cross-functional evaluation at the ωB97X-D3BJ/def2-TZVP level shows preserved energetic ordering (DFT-calculated means spanning ~64 to ~119 kcal·mol⁻¹), indicating that the learned BDE axis reflects transferable thermodynamic structure rather than functional-specific artifacts. Spin-delocalization analysis further reveals a statistically significant correlation between radical localization and bond strength across the DFT-validated set. More broadly, reaction-relevant thermodynamic quantities may serve as primary generative coordinates, with PFAS-relevant C-F bond strengths motivating extension into higher-energy regimes.

Training Data (BDE labels + SMILES)
        │
        ├──► Train Generator (GPT-2 + FiLM conditioning)
        └──► Train Screener  (MPNN Graph Neural Network)
                    │
          Generate novel radical pairs
          at target BDE values (55–125 kcal/mol)
                    │
          Screen & filter with GNN predictions
                    │
          Sample novel candidates for DFT
                    │
          Write ORCA input scripts
          (Geometry Relax → SCF → Freq)
                    │
          Calculate DFT BDEs & compare to model

Repository Structure

Code/
├── data/                              # Datasets
│   ├── bde_rdf_with_multi_halo_cfc_model_3.csv   # Main training data
│   ├── test_set_exp_bdes.csv                      # Experimental BDE test set
│   └── test_set_pfas.csv                          # PFAS test set
│
├── datasets/                          # PyTorch dataset classes
│   ├── generator.py                   # RadicalPairDataset (tokenized SMILES for GPT)
│   └── screener.py                    # BDEFragmentDataset (molecular graphs for GNN)
│
├── models/                            # Neural network architectures
│   ├── generator.py                   # BDEConditionedGPT (GPT-2 + FiLM conditioning)
│   └── screener.py                    # BDEFragmentModel (MPNN encoder)
│
├── training/                          # Training entry points
│   ├── generator.py                   # Train the generator
│   └── screener.py                    # Train the screener
│
├── design_and_evaluation/             # Generation, filtering, and analysis
│   ├── generate.py                    # Sample novel pairs from trained generator
│   ├── combine_radicals.py            # Bond two radical fragments into a parent molecule
│   ├── screener_prediction.py         # Predict BDE for generated pairs using screener
│   ├── evaluate_generated_pairs.py    # Validity / uniqueness / novelty / Tanimoto metrics
│   ├── sample_examples_for_dft.py     # Select novel candidates for DFT validation
│   └── calculate_dft_bde.py           # Parse ORCA outputs and compute DFT BDEs
│
├── write_orca_scripts/                # Quantum chemistry input generation
│   ├── generate_relax_scripts.py      # SMILES → 3D coords → ORCA geometry relaxation inputs
│   └── generate_scf_freq_scripts.py   # Optimized geometry → ORCA SCF + frequency inputs
│
└── generated_structures/              # Radical pairs generated after model training
    ├── generated_candidates_full_list.csv   # Full list with only the screener-predicted BDEs
    └── dft_candidates_with_dft_predicted_bde.csv  # Randomly sampled list from the full list for DFT validation

Models

Generator — BDEConditionedGPT

A GPT-2 language model conditioned on a target BDE value using FiLM (Feature-wise Linear Modulation). Given a target BDE, it autoregressively generates a SMILES string encoding a fragment1 <SEP> fragment2 radical pair.

  • Tokenizer: ChemBERTa
  • Architecture: GPT-2 (n_embd=256, n_layer=8, n_head=8)
  • Training: 10 epochs, AdamW, batch size 128
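
FiLM conditioning scales and shifts each hidden feature using parameters computed from the conditioning value. The block below is a minimal pure-Python sketch of that idea, not the repository's `BDEConditionedGPT` code: the function name `film`, the linear maps that produce the scale and shift, and the BDE normalization are all assumptions made for illustration.

```python
def film(hidden, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM: out[i] = gamma[i] * hidden[i] + beta[i].

    gamma and beta are linear functions of the scalar condition `cond`.
    All weights here are hypothetical; a trained model would learn them.
    """
    gamma = [w * cond + b for w, b in zip(w_gamma, b_gamma)]
    beta = [w * cond + b for w, b in zip(w_beta, b_beta)]
    return [g * h + s for g, h, s in zip(gamma, hidden, beta)]


# Hypothetical normalization of a target BDE (kcal/mol) to roughly [0, 1].
target_bde = 85.0
cond = (target_bde - 50.0) / (130.0 - 50.0)

hidden = [1.0, -2.0, 0.5]
out = film(
    hidden, cond,
    w_gamma=[0.0, 0.0, 0.0], b_gamma=[1.0, 1.0, 1.0],  # gamma = 1: no scaling
    w_beta=[0.0, 0.0, 0.0], b_beta=[0.0, 0.0, 0.0],    # beta = 0: no shift
)
```

With these identity parameters the features pass through unchanged. Nonzero `w_gamma` or `w_beta` would make the modulation depend on the target BDE.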

Screener — BDEFragmentModel

A message-passing neural network (MPNN) that predicts BDE from molecular graphs. It encodes the parent molecule and both fragments separately, then computes:

BDE = E(fragment1) + E(fragment2) − E(parent)
  • Node features: atomic number, degree, formal charge, aromaticity, ring membership, H count
  • Edge features: bond type, conjugation, ring membership
  • Training: 50 epochs, Adam (lr=5e-4), L1 loss, ReduceLROnPlateau
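
The energy-difference readout and the L1 objective listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the `BDEFragmentModel` implementation: the scalar "energies" stand in for whatever the MPNN encoder outputs, and the function names and numbers are hypothetical.

```python
def predicted_bde(e_parent, e_frag1, e_frag2):
    """Energy-difference readout: BDE = E(fragment1) + E(fragment2) - E(parent)."""
    return e_frag1 + e_frag2 - e_parent


def l1_loss(predictions, targets):
    """Mean absolute error, the L1 objective named in the training settings."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical encoder outputs (arbitrary units) for two parent/fragment triples.
batch = [
    {"parent": -10.0, "frag1": -4.0, "frag2": -5.0},
    {"parent": -20.0, "frag1": -8.0, "frag2": -9.0},
]
preds = [predicted_bde(b["parent"], b["frag1"], b["frag2"]) for b in batch]
loss = l1_loss(preds, targets=[1.5, 2.0])
```

Because the prediction is built as a difference of three encoded energies, the same encoder is applied to the parent and to both fragments.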

Data Format

The main dataset (data/bde_rdf_with_multi_halo_cfc_model_3.csv) contains:

| Column    | Description                              |
|-----------|------------------------------------------|
| molecule  | Parent molecule SMILES                   |
| fragment1 | First radical fragment SMILES            |
| fragment2 | Second radical fragment SMILES           |
| bond_type | Type of cleaved bond (e.g., C-F, C-Cl)   |
| bde       | Bond dissociation energy (kcal/mol)      |
| set       | Split label: train, valid, or test       |
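
A short sketch of reading this layout with the standard library and grouping rows by the `set` column follows. The three rows in `SAMPLE_CSV` are hypothetical placeholders, not entries from the actual dataset, and `load_splits` is an illustrative helper rather than a function from this repository.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the documented column layout; values are illustrative only.
SAMPLE_CSV = """molecule,fragment1,fragment2,bond_type,bde,set
CC,[CH3],[CH3],C-C,90.0,train
CF,[CH3],[F],C-F,110.0,valid
CCl,[CH3],[Cl],C-Cl,84.0,test
"""


def load_splits(handle):
    """Group rows by the `set` column and parse `bde` as a float."""
    splits = defaultdict(list)
    for row in csv.DictReader(handle):
        row["bde"] = float(row["bde"])
        splits[row["set"]].append(row)
    return dict(splits)


splits = load_splits(io.StringIO(SAMPLE_CSV))
train_bdes = [row["bde"] for row in splits["train"]]
```

To read the real file, pass an open file handle for `data/bde_rdf_with_multi_halo_cfc_model_3.csv` instead of the in-memory string.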

Usage

1. Train the models

# Train the GNN screener
python -m training.screener

# Train the GPT generator
python -m training.generator

2. Generate novel radical pairs

python -m design_and_evaluation.generate

Generates 1000 fragment pairs at each of 8 target BDE values: 55, 65, 75, 85, 95, 105, 115, 125 kcal/mol.
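
The target grid and the pair format can be made concrete with a small sketch. It assumes the decoded output is a plain string of the form `fragment1 <SEP> fragment2`; the helper `split_pair` and the constant names are hypothetical and may not match what `generate.py` uses.

```python
TARGET_BDES = list(range(55, 126, 10))  # 55, 65, ..., 125 kcal/mol
PAIRS_PER_TARGET = 1000
SEP = "<SEP>"  # separator named in the generator description


def split_pair(generated):
    """Split 'fragment1 <SEP> fragment2' into two SMILES strings.

    Returns None when the string does not contain exactly one separator
    or when either side is empty.
    """
    parts = [p.strip() for p in generated.split(SEP)]
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


total_samples = len(TARGET_BDES) * PAIRS_PER_TARGET
pair = split_pair("[CH3] <SEP> [F]")
```

Rejecting malformed strings at this stage keeps them out of the later screening and evaluation steps.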

3. Screen and filter

python -m design_and_evaluation.screener_prediction

4. Evaluate generated pairs

python -m design_and_evaluation.evaluate_generated_pairs

Reports validity, uniqueness, novelty, and Tanimoto similarity to training data.
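
As a rough illustration of the first three metrics, the sketch below computes them from lists of strings. The denominators are assumed conventions (definitions vary between papers), the validity check is a toy stand-in for a real SMILES parser such as RDKit, and Tanimoto similarity is omitted because it requires molecular fingerprints.

```python
def generation_metrics(generated, training_set, is_valid):
    """Return validity, uniqueness, and novelty as fractions.

    Assumed conventions: validity = valid / generated,
    uniqueness = unique valid / valid,
    novelty = unique valid not in training / unique valid.
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


# Toy validity check standing in for a real SMILES parser.
def toy_is_valid(smiles):
    return bool(smiles) and smiles.count("[") == smiles.count("]")


metrics = generation_metrics(
    generated=["[CH3].[F]", "[CH3].[F]", "[CH3].[Cl]", "[CH3"],
    training_set={"[CH3].[F]"},
    is_valid=toy_is_valid,
)
```

In practice the strings should be canonicalized before comparison, so that two spellings of the same molecule are not counted as distinct or novel.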

5. Sample candidates for DFT

python -m design_and_evaluation.sample_examples_for_dft

Selects up to 20 novel pairs per BDE target and reconstructs parent molecules.
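
A capped random selection per target can be sketched as follows. It assumes the candidates have already been filtered for novelty and arrive as `(target_bde, pair)` tuples; the function name, the seed, and the placeholder pair labels are hypothetical, and the parent-reconstruction step is not shown.

```python
import random
from collections import defaultdict


def sample_per_target(candidates, max_per_target=20, seed=0):
    """Randomly keep at most `max_per_target` candidates for each target BDE.

    `candidates` is a list of (target_bde, pair) tuples, assumed to be
    already filtered for novelty.
    """
    by_target = defaultdict(list)
    for target, pair in candidates:
        by_target[target].append(pair)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected = {}
    for target in sorted(by_target):
        pool = by_target[target]
        selected[target] = rng.sample(pool, min(max_per_target, len(pool)))
    return selected


# Hypothetical candidate pool: 30 pairs at 55 kcal/mol, 5 pairs at 65 kcal/mol.
candidates = [(55, f"pair_{i}") for i in range(30)] + [(65, f"pair_{i}") for i in range(5)]
selected = sample_per_target(candidates)
```

Targets with fewer than 20 candidates simply keep everything they have, which matches the "up to 20" wording above.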

6. Write ORCA input scripts

# Step 1: Geometry relaxation
python -m write_orca_scripts.generate_relax_scripts

# Step 2: SCF + frequency calculations (after relaxation completes)
python -m write_orca_scripts.generate_scf_freq_scripts
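
For orientation, the sketch below assembles the text of a simple ORCA input file. It is not taken from the scripts in `write_orca_scripts/`: the helper `orca_input` is hypothetical, the method and basis follow the M06-2X/def2-TZVP level named in the abstract (written with ORCA's `M062X` keyword), and any further settings the real scripts use (resources, grids, convergence options) are unknown and omitted.

```python
def orca_input(xyz_lines, charge, multiplicity, job="Opt",
               method="M062X", basis="def2-TZVP"):
    """Build the text of a simple ORCA input file.

    `job` is an ORCA simple-input keyword such as "Opt", "SP", or "Freq".
    `xyz_lines` holds 'Element x y z' strings in Angstrom.
    """
    header = f"! {method} {basis} {job}"
    coords = "\n".join(xyz_lines)
    return f"{header}\n\n* xyz {charge} {multiplicity}\n{coords}\n*\n"


# Hypothetical fluorine-atom fragment: neutral doublet (one unpaired electron).
text = orca_input(
    ["F 0.000000 0.000000 0.000000"], charge=0, multiplicity=2, job="SP"
)
```

A closed-shell parent molecule would typically use multiplicity 1, while each monoradical fragment is a doublet (multiplicity 2).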

7. Calculate DFT BDEs

python -m design_and_evaluation.calculate_dft_bde

Parses ORCA output files and computes BDE from SCF energies and zero-point energies (ZPE), converting from Hartree to kcal/mol (×627.509).
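
The arithmetic behind this step can be sketched as below. The sketch assumes each species contributes its SCF energy plus its ZPE and that the BDE is the fragment sum minus the parent; the function name and the energies are hypothetical, and parsing of ORCA output files is not shown.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509  # conversion factor quoted above


def dft_bde(parent, frag1, frag2):
    """Zero-point-corrected BDE in kcal/mol.

    Each argument is a (scf_energy, zpe) tuple in Hartree.
    BDE = E(frag1) + E(frag2) - E(parent), with E = SCF + ZPE.
    """
    def total(species):
        scf, zpe = species
        return scf + zpe

    delta_hartree = total(frag1) + total(frag2) - total(parent)
    return delta_hartree * HARTREE_TO_KCAL_PER_MOL


# Hypothetical energies giving a difference of about 0.20 Hartree.
bde = dft_bde(parent=(-2.00, 0.05), frag1=(-1.00, 0.02), frag2=(-0.80, 0.03))
```

A positive result means energy is required to break the bond, which is the expected sign for a homolytic dissociation.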


Libraries and computational dependencies

  • PyTorch + PyTorch Geometric — model training and GNN
  • Transformers (HuggingFace) — GPT-2 and ChemBERTa tokenizer
  • RDKit — cheminformatics (SMILES parsing, 3D coordinate generation)
  • pandas, numpy — data handling
  • tqdm — progress bars
  • ORCA — external quantum chemistry package for DFT calculations

Citation

@article{SheshanarayanaRadGen2026,
  author    = {R. Sheshanarayana and Fengqi You},
  title     = {Harnessing homolytic bond energetics to steer inverse radical design},
  journal   = {insert after publication},
  year      = {insert after publication},
  volume    = {insert after publication},
  pages     = {insert after publication},
  doi       = {insert after publication}
}

About

This repository implements a conditional generative framework for designing radical fragment pairs with prescribed bond dissociation energies (BDE).
