Master's Thesis: Constrained Molecular Graph Generation with Diffusion Models

ranaislek/ConStruct-Thesis

Constrained Molecular Graph Generation with Diffusion Models: Extending the ConStruct Framework

🔬 Fork Attribution & Extensions

This repository is a fork and extension of the original ConStruct implementation by Madeira et al.

Original Work: ConStruct - Generative Modelling of Structurally Constrained Graphs
Original Authors: Manuel Madeira et al.
Original Paper: "Generative Modelling of Structurally Constrained Graphs"
License: MIT

Extensions in This Fork:

  • Ring-Based Constraints: Added ring count and ring length constraints to the model
  • Molecular Dataset Focus: Tested and validated on molecular datasets, starting with QM9
  • Edge-Deletion Constraints: Implemented "at most" constraints for ring count and ring length
  • Organized Experiment Structure: Added experiment configurations for debug and thesis-level testing
  • SLURM Integration: Added SLURM job scripts for cluster execution
  • Constraint Validation: Added monitoring and validation of constraint satisfaction

Key Differences from Original:

  • Constraint Types: Added ring_count_at_most and ring_length_at_most projectors
  • Molecular Focus: Optimized for molecular graph generation with QM9 dataset
  • Experiment Organization: Structured configs and scripts for systematic constraint testing
  • Cluster Support: Enhanced SLURM integration for high-performance computing environments
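
To make the ring-length idea concrete, here is a minimal sketch of one way to measure the ring a new edge would close. It is an illustration only, not the code of the ring_length_at_most projector: the function names and the adjacency-dict representation are assumptions, and the repository may define ring length differently (for example via RDKit ring information).

```python
from collections import deque


def shortest_ring_closed_by(adjacency, u, v):
    """Length of the shortest ring that adding edge (u, v) would close.

    `adjacency` maps each node to an iterable of its neighbours.
    Assumes (u, v) is not already an edge. Returns None if u and v are
    not connected, i.e. the new edge would not close any ring.
    """
    if u == v:
        raise ValueError("self-loops are not supported")
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            # Existing shortest path u..v plus the new edge itself.
            return dist[node] + 1
        for nb in adjacency.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return None


def edge_respects_ring_length(adjacency, u, v, max_length):
    """Hypothetical acceptance test: allow the edge if the shortest ring
    it closes has at most `max_length` atoms (or if it closes no ring)."""
    ring = shortest_ring_closed_by(adjacency, u, v)
    return ring is None or ring <= max_length
```

For a five-atom chain, joining the two ends closes a ring of length 5, so the edge passes with max_length=5 and fails with max_length=4.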

🚦 Bulletproof Environment Setup Instructions

These steps were worked out the hard way on real clusters, after repeated GPU, RDKit, PyTorch, and graph-tool breakage.

Follow the order and the warnings below: installing packages out of order can silently replace the pinned torch or rdkit versions.


1. Create and Activate Your Conda Environment

conda create -y -c conda-forge -n construct python=3.9 rdkit=2023.03.2
conda activate construct

2. Check RDKit Works

python -c "from rdkit import Chem"
# No error means it's fine.

3. Install graph-tool (optional)

conda install -c conda-forge graph-tool=2.45
python -c "import graph_tool as gt"

⚠️ NOTE:

  • graph-tool is only required for non-molecular datasets (e.g., tree, planar, lobster).
  • If you work only with molecular datasets (QM9, etc.), you can skip installing graph-tool to avoid compatibility headaches.

4. Install PyTorch (CUDA 11.8), then torch-geometric

pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
pip install torch-geometric==2.3.1
python -c "import torch; print(torch.cuda.is_available())"
# Should print True if GPU is visible.

5. Install fcd (Fréchet ChemNet Distance) as code only: always pass --no-deps, never run a plain pip install fcd

pip install --no-deps fcd
# Do NOT install dependencies here, or you WILL break torch/rdkit versions!
# This gives you fcd.load_ref_model, fcd.get_fcd, etc.

6. Install the Rest of Your Requirements

pip install -r requirements.txt
# (If requirements.txt has torch or rdkit, double check they don't get downgraded!)
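
One way to double check that nothing was downgraded is to compare the installed versions against the pins used in the steps above. The sketch below is a suggestion, not part of the repository: the helper name check_pins is hypothetical, and a conda-installed package such as rdkit may not expose pip metadata, in which case it is reported as missing.

```python
from importlib import metadata

# Pins copied from the install commands in this README.
PINS = {"torch": "2.0.1", "torch-geometric": "2.3.1", "rdkit": "2023.03.2"}


def _key(version):
    """Compare '2023.03.2' and '2023.3.2' as equal; ignore '+cu118' suffixes."""
    base = version.split("+", 1)[0]
    return tuple(int(p) if p.isdigit() else p for p in base.split("."))


def check_pins(pins, get_version=metadata.version):
    """Return {name: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (found, _key(found) == _key(wanted))
    return report
```

Calling check_pins(PINS) inside the activated environment returns one entry per package; any entry whose second field is False deserves a closer look before training.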

7. Install Your Own Package (Editable Dev Mode, If Needed)

pip install -e .

8. Compile ORCA if Needed

cd ./ConStruct/analysis/orca
g++ -O2 -std=c++11 -o orca orca.cpp
cd -

9. Test Everything

python -c "import fcd; print(hasattr(fcd, 'load_ref_model'))"
python -c "import torch; print(torch.cuda.is_available())"

Both commands should print True without raising an error.


⚠️ CRITICAL WARNINGS!

  • Never use pip install fcd (without --no-deps) after torch/rdkit, or you’ll nuke your versions.
  • Never install CUDA libraries via conda. The cu118 PyTorch wheel ships its own CUDA runtime, and the cluster already provides the GPU driver.
  • Never install both fcd and fcd_torch in the same env unless you know why.
  • Always check for libstdc++ or libgomp errors (see troubleshooting below).

🔗 Summary Table

| Step | Command |
| --- | --- |
| Create env | conda create -y -c conda-forge -n construct python=3.9 rdkit=2023.03.2 |
| Activate env | conda activate construct |
| Install graph-tool | conda install -c conda-forge graph-tool=2.45 |
| Install PyTorch | pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118 |
| Install torch-geometric | pip install torch-geometric==2.3.1 |
| Install fcd | pip install --no-deps fcd |
| Other packages | pip install -r requirements.txt |
| Your package | pip install -e . |
| Compile ORCA | g++ -O2 -std=c++11 -o orca orca.cpp |

🆘 Troubleshooting

fcd/rdkit libstdc++ error:

If you get something like ImportError: ... libstdc++.so.6: version 'GLIBCXX_3.4.29' not found ... run:

find $CONDA_PREFIX -name "libstdc++.so.6"
LD_PRELOAD=$CONDA_PREFIX/lib/libstdc++.so.6 python -c "from rdkit import Chem; import fcd; print(hasattr(fcd, 'load_ref_model'))"

If it fixes things, make it permanent:

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
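# Prepend instead of overwriting, so any other preload set by an earlier activate script survives.
echo 'export LD_PRELOAD="$CONDA_PREFIX/lib/libstdc++.so.6${LD_PRELOAD:+:$LD_PRELOAD}"' > $CONDA_PREFIX/etc/conda/activate.d/zz_preload_libstdcxx.sh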
chmod +x $CONDA_PREFIX/etc/conda/activate.d/zz_preload_libstdcxx.sh
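
Before preloading, it can help to confirm that the conda copy of libstdc++ actually provides the missing version tag. The sketch below does a plain byte search of the library file, roughly what strings piped into grep would do; the function names are hypothetical and the check is a heuristic, not a full ELF parse.

```python
import os
from pathlib import Path


def provides_version(lib_path, tag):
    """Return True if `tag` (e.g. 'GLIBCXX_3.4.29') occurs in the file
    as a complete NUL-terminated string."""
    data = Path(lib_path).read_bytes()
    return tag.encode() + b"\x00" in data


def conda_libstdcxx():
    """Path to libstdc++.so.6 inside the active conda env, or None."""
    prefix = os.environ.get("CONDA_PREFIX")
    if not prefix:
        return None
    path = Path(prefix) / "lib" / "libstdc++.so.6"
    return path if path.exists() else None
```

If provides_version(conda_libstdcxx(), "GLIBCXX_3.4.29") is True, the conda library carries the tag named in the error and the LD_PRELOAD fix is likely to help; if it is False, preloading that file will not resolve the import error.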

graph-tool/libgomp error:

If you see libgomp-a34b3233.so.1: version 'GOMP_5.0' not found (required by ...) run:

export LD_PRELOAD="$CONDA_PREFIX/lib/libgomp.so.1"
python test_env.py

If it works, make it permanent:

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
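# Prepend instead of overwriting, so this also works alongside another LD_PRELOAD entry.
echo 'export LD_PRELOAD="$CONDA_PREFIX/lib/libgomp.so.1${LD_PRELOAD:+:$LD_PRELOAD}"' > $CONDA_PREFIX/etc/conda/activate.d/zz_preload_libgomp.sh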
chmod +x $CONDA_PREFIX/etc/conda/activate.d/zz_preload_libgomp.sh

If still not working, add to every SLURM script after conda activate:

export LD_PRELOAD="$CONDA_PREFIX/lib/libgomp.so.1"

Quick Cluster Sanity Check Script

Paste and run these one by one in your (construct) environment:

  1. RDKit Basic Import

    python -c "from rdkit import Chem; print(Chem.MolFromSmiles('CCO') is not None)"
  2. PyTorch + CUDA Check

    python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
  3. torch-geometric Check

    python -c "import torch_geometric; print(torch_geometric.__version__)"
  4. fcd Import and Model Load

    python -c "import fcd; print(hasattr(fcd, 'load_ref_model')); m = fcd.load_ref_model(); print(m is not None)"
  5. Your Own Package Import

    python -c "import ConStruct; print('ConStruct imported')"
  6. (Optional) Try a minimal fcd score calculation

    python -c "import fcd; s = fcd.get_fcd(['CCO', 'CCC'], ['CCO', 'CCN']); print('FCD score:', s)"

If all these work: your env is cluster-proof.
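
The import checks above can also be collected into one small script, which is convenient at the top of a SLURM job. This is a sketch rather than a file shipped with the repository; the module list mirrors the checks above and the function name check_imports is hypothetical.

```python
import importlib

# Modules exercised by the manual checks above.
MODULES = ["rdkit.Chem", "torch", "torch_geometric", "fcd", "ConStruct"]


def check_imports(modules):
    """Try to import each module; return {name: 'ok' or an error summary}."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:
            # Catches ImportError and also loader errors such as the
            # libstdc++/libgomp failures described in Troubleshooting.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, status in check_imports(MODULES).items():
        print(f"{name:20s} {status}")
```

It only verifies that imports succeed. The CUDA and fcd model-load checks still need the commands listed above.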


Run the code

🚀 Organized Experiment Structure

Experiment configs and SLURM scripts are organized by scale (debug for quick tests, thesis for full-scale runs) and by constraint type.

Note: Edge-insertion constraints are documented but not yet implemented in the current codebase.

Directory Structure

configs/experiment/
├── debug/                          # Debug-level experiments (quick testing)
│   ├── no_constraint/             # No constraint experiments
│   └── edge_deletion/             # Edge-deletion constraints ("at most")
│       ├── planarity/             # Planarity constraints
│       ├── ring_count_at_most/   # Ring count "at most" constraints
│       └── ring_length_at_most/  # Ring length "at most" constraints
└── thesis/                         # Thesis-level experiments (full-scale)
    ├── no_constraint/             # No constraint experiments
    └── edge_deletion/             # Edge-deletion constraints ("at most")
        ├── planarity/             # Planarity constraints
        ├── ring_count_at_most/   # Ring count "at most" constraints
        └── ring_length_at_most/  # Ring length "at most" constraints

ConStruct/slurm_jobs/
├── debug/                          # Debug-level SLURM scripts
│   ├── no_constraint/             # No constraint scripts
│   └── edge_deletion/             # Edge-deletion scripts
└── thesis/                         # Thesis-level SLURM scripts
    ├── no_constraint/             # No constraint scripts
    └── edge_deletion/             # Edge-deletion scripts

Constraint Types

Edge-Deletion Constraints ("At Most"):

  • Purpose: Limit maximum ring count, ring length, or enforce planarity
  • Transition: absorbing_edges
  • Projectors: ring_count_at_most, ring_length_at_most, planar
  • Use Case: Generate molecules with limited ring complexity or planar structures
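
The idea behind an "at most" ring-count constraint can be sketched in a few lines. The example assumes the sampler proposes edge insertions one at a time and that the ring count of a graph is its number of independent cycles (edges minus nodes plus connected components). Both are assumptions made for illustration: the function and class names are hypothetical and this is not the repository's ring_count_at_most projector.

```python
class UnionFind:
    """Minimal union-find over nodes 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the components of a and b; return False if already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def project_ring_count_at_most(num_nodes, current_edges, candidate_edges, max_rings):
    """Accept candidate edges in order, skipping any that would push the
    number of independent cycles above `max_rings`.

    Assumes a simple graph: no self-loops and no duplicate edges.
    Returns (accepted_edges, final_ring_count).
    """
    uf = UnionFind(num_nodes)
    rings = 0
    for u, v in current_edges:
        if not uf.union(u, v):
            rings += 1  # edge inside one component closes a cycle
    accepted = []
    for u, v in candidate_edges:
        if uf.find(u) == uf.find(v):
            if rings >= max_rings:
                continue  # would create one ring too many: reject
            rings += 1
        else:
            uf.union(u, v)
        accepted.append((u, v))
    return accepted, rings
```

Because deleting an edge never increases this ring count, a graph that satisfies the bound keeps satisfying it under edge deletion, which is why such constraints are grouped under "edge-deletion" here.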

No Constraint:

  • Purpose: Baseline training without any constraints
  • Transition: absorbing_edges
  • Projector: null
  • Use Case: Generate molecules without structural constraints

Edge-Insertion Constraints ("At Least") - Not Yet Implemented:

  • Status: Documented but not implemented in current codebase
  • Note: The edge_insertion transition and ring_count_at_least/ring_length_at_least projectors are commented out in the model configuration

🧪 Running Experiments (some examples)

Direct Python Execution

# No constraint (debug)
python ConStruct/main.py \
  --config-name experiment/debug/no_constraint/qm9_debug_no_constraint.yaml \
  --config-path configs/

# Ring count at most 2 (debug)
python ConStruct/main.py \
  --config-name experiment/debug/edge_deletion/ring_count_at_most/qm9_debug_ring_count_at_most_2.yaml \
  --config-path configs/

# Planarity constraint (debug)
python ConStruct/main.py \
  --config-name experiment/debug/edge_deletion/planarity/qm9_debug_planar.yaml \
  --config-path configs/

# Ring count at most 3 (thesis)
python ConStruct/main.py \
  --config-name experiment/thesis/edge_deletion/ring_count_at_most/qm9_thesis_ring_count_at_most_3.yaml \
  --config-path configs/

# Ring length at most 5 (thesis)
python ConStruct/main.py \
  --config-name experiment/thesis/edge_deletion/ring_length_at_most/qm9_thesis_ring_length_at_most_5.yaml \
  --config-path configs/
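
The config names above follow a regular pattern, so a small helper can assemble the command for a given level and constraint. The pattern is inferred only from the five examples shown here, so treat it as an assumption: the helper build_command is hypothetical, and a combination that is not listed above may not have a config file in the repository.

```python
def build_command(level, constraint, value=None):
    """Build the argument list for one experiment.

    level:      'debug' or 'thesis'
    constraint: 'no_constraint', 'planarity',
                'ring_count_at_most' or 'ring_length_at_most'
    value:      integer bound, required for the two ring constraints
    """
    if level not in ("debug", "thesis"):
        raise ValueError(f"unknown level: {level}")
    if constraint == "no_constraint":
        folder, stem = "no_constraint", "no_constraint"
    elif constraint == "planarity":
        folder, stem = "edge_deletion/planarity", "planar"
    elif constraint in ("ring_count_at_most", "ring_length_at_most"):
        if value is None:
            raise ValueError(f"{constraint} needs a value")
        folder, stem = f"edge_deletion/{constraint}", f"{constraint}_{value}"
    else:
        raise ValueError(f"unknown constraint: {constraint}")
    config = f"experiment/{level}/{folder}/qm9_{level}_{stem}.yaml"
    return ["python", "ConStruct/main.py",
            "--config-name", config,
            "--config-path", "configs/"]
```

The returned list can be passed to subprocess.run, or joined with spaces and pasted into a SLURM script.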

SLURM Job Submission

# Debug experiments
sbatch ConStruct/slurm_jobs/debug/no_constraint/qm9_no_constraint_debug.slurm
sbatch ConStruct/slurm_jobs/debug/edge_deletion/ring_count_at_most/qm9_ring_count_at_most_2_debug.slurm
sbatch ConStruct/slurm_jobs/debug/edge_deletion/planarity/qm9_debug_planar.slurm

# Thesis experiments
sbatch ConStruct/slurm_jobs/thesis/no_constraint/qm9_no_constraint_thesis.slurm
sbatch ConStruct/slurm_jobs/thesis/edge_deletion/ring_count_at_most/qm9_ring_count_at_most_3_thesis.slurm
sbatch ConStruct/slurm_jobs/thesis/edge_deletion/ring_length_at_most/qm9_ring_length_at_most_5_thesis.slurm
