GitHub - ddomlab/GP_collab: This is a collaboration for GP and PLS-Dataset

This repository provides code and protocols for combining kernels to train on molecular fingerprints and continuous parameters together.
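As a rough illustration of what combining kernels can look like, the sketch below multiplies a Tanimoto kernel on binary fingerprints by an RBF kernel on continuous parameters. This is a minimal sketch under assumptions, not the repository's implementation: the function names, the choice of a product (rather than a sum) of kernels, the `length_scale` parameter, and the toy samples are all hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity between two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention assumed here: two all-zero fingerprints count as identical.
    return both / either if either else 1.0


def rbf(u, v, length_scale=1.0):
    """Squared-exponential (RBF) kernel on continuous parameter vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(u, v))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def combined_kernel(sample_a, sample_b, length_scale=1.0):
    """Product kernel: structural similarity times numerical similarity."""
    fp_a, x_a = sample_a
    fp_b, x_b = sample_b
    return tanimoto(fp_a, fp_b) * rbf(x_a, x_b, length_scale)


def gram_matrix(samples, length_scale=1.0):
    """Pairwise kernel matrix over (fingerprint, parameters) samples."""
    return [[combined_kernel(a, b, length_scale) for b in samples]
            for a in samples]


# Toy data (hypothetical): each sample is (fingerprint bits, continuous parameters).
samples = [
    ([1, 0, 1, 1], [0.0, 1.0]),
    ([1, 0, 1, 0], [0.0, 1.0]),
    ([0, 1, 0, 0], [2.0, 3.0]),
]
K = gram_matrix(samples)
```

A product kernel treats two samples as similar only when both the structure and the continuous parameters are similar; a sum kernel would be the looser alternative. The resulting matrix `K` is what a Gaussian process model would consume as its covariance.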

Overview

The repository is set up to make the results easy to reproduce. If you get stuck or would like to learn more, please feel free to open an issue.

Setup

The environment.yml file specifies the conda virtual environment:

 conda env create -f environment.yml

Repository Structure

code_/                       
├── cleaning/                
│   └── generate_clean_dataset.py             # Main script to clean and prepare dataset
├── notebooks/
│   ├── Image_2structure_Molscribe_Decimer_V2.ipynb       # Image-to-structure using OCSR
│   ├── PLS_Data_Analysis.ipynb                           # Polymer light scattering data analysis
│   ├── Rh Analysis and Validation.ipynb                  # Hydrodynamic radius analysis & validation
│   ├── Split Rh peaks for multioutput regression.ipynb   # Splitting peaks for regression tasks
│   ├── Trimer Clustering and Analysis.ipynb              # OOD clustering & analysis
│   ├── polymer_structure_wo_hsp.zip                      # Images of polymer structures without HSPs
│   ├── Structures image collected (read by OCSR).zip     # Collected images for OCSR
│   └── Rg data with aging imputed.pkl                    # Dataset with imputed aging data (used for OOD clustering)
├── preprocessing/           
│   ├── handle_pu.py                          # Handles oligomers and polymer repeat units
│   ├── map_structure_hsp_to_main_dataset.py  # Maps molecular representations and HSPs to dataset
│   ├── drop_unknown_hsps.py                  # Drops entries with missing/unknown HSP values
│   └── assign_hsp.py                         # Assigns Hansen Solubility Parameters to structures
├── training/                
│   ├── all_factories.py                      # All necessary functions and operators 
│   ├── get_ood_split.py                      # Define OOD train/test splits
│   ├── get_ood_split_learning_curve.py       # OOD learning curve experiment
│   ├── imputation_normalization.py           # Imputation and normalization function
│   ├── learning_curve_utils.py               # Shared utilities for learning curves
│   ├── make_ood_learning_curve.py            # Make OOD learning curve results
│   ├── make_ood_prediction.py                # Make OOD predictions
│   ├── scoring.py                            # Evaluation metrics and cross-validation
│   ├── train_structure_numerical_generalized.py  # Random seeds for reproducibility
│   ├── train_structure_numerical.py          # Train with structural and/or numerical features
│   ├── training_utils.py                     # Shared training helpers
│   └── unrolling_utils.py                    # Unrolling utilities for molecular representations
├── visualization/    
│   ├── utils_uncertainty_calibration.py      # Calibration plots for uncertainty  
│   ├── visualization_setting.py              # Plot style/setting configs
│   ├── visualize_heatmap.py                  # Heatmap plotting
│   ├── visualize_IID_learning_curve.py       # Visualize IID learning curves
│   ├── visualize_ood_full_data.py            # Visualize full OOD dataset results
│   ├── visualize_ood_learning_curve.py       # Visualize OOD learning curves
│   └── visualize_predictions_truth.py        # Prediction vs truth Hex plots

datasets/                    
├── fingerprint/
│   └── structural_features.csv               # Molecular representation for mapping to dataset   
├── json_resources/
│   ├── block_copolymers.json                 # List of block copolymers to remove
│   ├── canonicalized_name.json               # Canonicalized polymer naming references
│   ├── data_summary_monitor.json             # Dataset cleaning and summary tracking
│   └── name_to_canonicalization.json         # Name → canonical form lookup table 
├── raw/                                      # Raw curated datasets
│   ├── Polymer_Solution_Scattering_Dataset.xlsx   # Initial collected data
│   ├── polymer_without_hsp.csv                    # Dataset excluding Hansen solubility parameters
│   ├── pu_processed.csv                           # Processed polymer repeat units and oligomers
│   └── SMILES_to_BigSMILES_Conversion_wo_block_copolymer_with_HSPs.xlsx  
│       # SMILES and HSPs of polymers
│                      
training_dataset/
└── Rg data with clusters aging imputed.pkl   # Final cleaned dataset incl. imputed aging parameters & clusters

results/                                   
├── HPC history                            # Logs and history of HPC job submissions/runs
├── OOD_target_log Rg (nm)                 # Out-of-distribution prediction results for log Rg
└── target_log Rg (nm)                     # In-distribution prediction results for log Rg
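The structure above mentions OOD train/test splits and a final dataset that carries cluster labels. One common way to build such splits is to hold out one cluster at a time, sketched below. This is an assumption about the general technique, not a description of what get_ood_split.py does; the function name and the toy labels are hypothetical.

```python
def leave_one_cluster_out(cluster_labels):
    """Yield (held_out_cluster, train_idx, test_idx) for each distinct cluster.

    Every sample in the held-out cluster goes to the test set, so the model
    is evaluated on a cluster it has not seen during training.
    """
    for held_out in sorted(set(cluster_labels)):
        train_idx = [i for i, c in enumerate(cluster_labels) if c != held_out]
        test_idx = [i for i, c in enumerate(cluster_labels) if c == held_out]
        yield held_out, train_idx, test_idx


# Toy cluster assignments (hypothetical), one label per dataset row.
labels = ["A", "A", "B", "C", "B"]
splits = list(leave_one_cluster_out(labels))
```

Each split keeps the held-out cluster out of training entirely, which is what distinguishes an OOD evaluation from a random (IID) split of the same rows.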
