A production-ready transformer-based molecular generation and property prediction system for drug discovery, featuring reward-guided sampling, scaffold conditioning, and multi-task learning.
This project implements a complete AI-driven drug discovery pipeline that generates novel, drug-like molecules using transformer-based language models and multi-task property prediction. The system combines:
- Generative Models: GRU and Transformer language models trained on SMILES strings
- Property Predictors: Multi-task transformer for pIC50, logP, and QED prediction
- Reward-Guided Search: Active filtering mechanism for quality-focused generation
- Scaffold Conditioning: Targeted exploration of chemical space around seed structures
- Dataset: 15,037 validated drug-like molecules
- Vocabulary Size: 36 chemical tokens
- Model Parameters: ~4.3M (language model), ~3.2M (predictors)
- Generation Quality: 70-85% valid SMILES, 80%+ drug-likeness (QED ≥ 0.5)
- Novelty: 60-70% of generated molecules not in training set
- Multiple sampling strategies (temperature, beam, top-k, nucleus)
- Conditioning modes (scaffold, property-guided, unconditioned)
- RDKit validation, synthesizability scoring, diversity metrics
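Beam search aside (it is deterministic), the stochastic sampling strategies listed above differ only in how they truncate and reshape the next-token distribution. A minimal sketch, in plain Python rather than the notebook's actual implementation (`sample_token` and its parameters are illustrative names):

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Draw one token index from raw logits, illustrating temperature,
    top-k truncation, and nucleus (top-p) filtering."""
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Candidate tokens ranked by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k is not None:
        order = order[:top_k]      # keep only the k most likely tokens
    if top_p is not None:
        kept, cum = [], 0.0
        for i in order:            # smallest prefix whose mass reaches top_p
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept
    # Renormalise over the survivors and sample.
    mass = sum(probs[i] for i in order)
    r = random.random() * mass
    for i in order:
        r -= probs[i]
        if r <= 0:
            return i
    return order[-1]
```

With `top_k=1` this degenerates to greedy decoding; with no truncation and `temperature=1.0` it is plain ancestral sampling.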
INPUT → GENERATION MODELS (GRU / Transformer) → REWARD-GUIDED SEARCH → PROPERTY PREDICTION → RANKING & FILTERING → OUTPUT
This notebook is hosted on Kaggle, where the required packages are already available in the environment. To run it on Kaggle, open the notebook and click "Run"; no requirements.txt is needed.
If you need to run locally, create a virtual environment and install the minimal packages used in the notebook (example):
python -m venv venv
source venv/bin/activate
pip install torch gradio rdkit pandas numpy scikit-learn matplotlib tqdm

To sample SMILES from the trained language model:

from model import sample_smiles
smiles = sample_smiles(lm_model, max_len=150, temperature=0.8)

- Tokenization, encoding/decoding utilities
- Transformer-based language model and multi-task predictors
- Reward and filtering utilities using RDKit
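A reward used for guided search can be as simple as "zero for invalid SMILES, drug-likeness otherwise". A minimal sketch using RDKit's QED score (the function name and weighting are ours, not the notebook's):

```python
from rdkit import Chem
from rdkit.Chem import QED

def reward(smiles: str) -> float:
    """Score a SMILES string: 0.0 if it fails RDKit parsing,
    otherwise its QED drug-likeness in (0, 1]."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return 0.0          # invalid SMILES earns no reward
    return QED.qed(mol)     # quantitative estimate of drug-likeness
```

A real reward would typically blend several terms (QED, predicted pIC50, a synthesizability penalty) rather than QED alone.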
- Mode 1: Generate & Filter (fast)
- Mode 2: Reward-Guided Search (quality-focused)
- Mode 3: Scaffold-Conditioned (targeted)
Transformer LM (causal) and a shared-encoder multi-task predictor.
Hyperparameters and training loop examples are provided in the notebook.
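The causal LM part can be sketched in a few dozen lines of PyTorch. The dimensions below are placeholders chosen for brevity, not the project's ~4.3M-parameter configuration; only the vocabulary size (36) comes from the stats above:

```python
import torch
import torch.nn as nn

class SmilesLM(nn.Module):
    """Minimal causal Transformer language model over SMILES tokens."""
    def __init__(self, vocab_size=36, d_model=128, nhead=4,
                 num_layers=2, max_len=150):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        # x: (batch, seq) of token ids
        t = x.size(1)
        h = self.tok(x) + self.pos(torch.arange(t, device=x.device))
        # Causal mask: position i may only attend to positions <= i.
        mask = nn.Transformer.generate_square_subsequent_mask(t)
        h = self.encoder(h, mask=mask)
        return self.head(h)  # next-token logits, shape (batch, seq, vocab)
```

The multi-task predictor reuses the same encoder body and swaps the LM head for small regression heads (pIC50, logP, QED).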
- SMILES validity, QED, novelty, diversity, prediction MAE/RMSE
Top generated molecules, diversity analysis, and temperature sweep summaries are in the notebook.
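Novelty and prediction error, two of the metrics above, reduce to one-liners; a sketch with function names of our choosing:

```python
def novelty(generated, training_smiles):
    """Fraction of generated SMILES not present in the training set."""
    train = set(training_smiles)
    return sum(s not in train for s in generated) / len(generated)

def mae(predicted, actual):
    """Mean absolute error between predicted and measured properties."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

Validity and QED come from RDKit, and diversity is usually pairwise Tanimoto distance over fingerprints; those depend on chemistry tooling rather than arithmetic, so they are omitted here.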
Pipeline configuration dataclass and examples are available in the code.
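As an illustration of the shape such a dataclass might take (field names and defaults here are assumptions, not the project's actual configuration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PipelineConfig:
    """Hypothetical configuration for the generation pipeline."""
    mode: str = "generate_and_filter"  # or "reward_guided", "scaffold"
    n_samples: int = 1000              # molecules to draw before filtering
    temperature: float = 0.8           # sampling temperature
    top_k: Optional[int] = None        # optional top-k truncation
    top_p: Optional[float] = None      # optional nucleus threshold
    min_qed: float = 0.5               # drug-likeness cutoff for filtering
    scaffold: Optional[str] = None     # seed SMILES for scaffold mode
```

A dataclass keeps the three pipeline modes behind one typed object, so a run is fully described by a single `PipelineConfig` instance.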
- Logging, checkpointing, memory management, and export utilities.
- Aram Elheni
- Youssef Jaziri
- Chaima Ben Yedder
- Zied Knani
Made with ❤️ for drug discovery