PepPFDPO

PepPFDPO: Multi-Objective Antimicrobial Peptide Generation using Pareto-Frontier Enhanced Direct Preference Optimization

Abstract

Antimicrobial peptide (AMP) drug discovery is challenged by the need to simultaneously optimize conflicting objectives, such as activity and toxicity. While generative protein language models (ProtLMs) offer potential, they often struggle to make effective use of large-scale AMP data. In this study, we propose PepPFDPO, a novel framework for multi-objective AMP optimization. PepPFDPO fine-tunes a pretrained generative ProtLM (e.g., ProGen2) using parameter-efficient fine-tuning via Low-Rank Adaptation (LoRA) and a novel Pareto-Frontier enhanced Direct Preference Optimization (PFDPO) strategy. Crucially, PFDPO incorporates the concept of Pareto non-dominance into preference learning, enabling the model to generate sequences that balance multiple competing objectives. Experimental results demonstrate that PepPFDPO outperforms state-of-the-art methods across multiple functional properties and generative quality metrics. PepPFDPO therefore provides an effective paradigm for multi-objective optimization in generative models and advances therapeutic peptide discovery.
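To make the idea of Pareto non-dominance in preference learning concrete, here is a minimal sketch and not the implementation in this repository. It assumes every candidate sequence has a score vector in which higher is better for each objective (for example, predicted activity and negated predicted toxicity). It forms (chosen, rejected) pairs whenever one candidate Pareto-dominates another, and it evaluates the standard DPO loss for a single pair. The function names, the example scores, and the value of `beta` are all illustrative assumptions.

```python
import math
from itertools import combinations


def dominates(a, b):
    """True if score vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Indices of the score vectors that no other vector dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def preference_pairs(scores):
    """(chosen, rejected) index pairs in which chosen dominates rejected."""
    pairs = []
    for i, j in combinations(range(len(scores)), 2):
        if dominates(scores[i], scores[j]):
            pairs.append((i, j))
        elif dominates(scores[j], scores[i]):
            pairs.append((j, i))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(x)) equals log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


# Hypothetical (activity, -toxicity) scores for four candidate peptides.
scores = [(0.9, -0.2), (0.5, -0.1), (0.4, -0.6), (0.9, -0.5)]
print(pareto_front(scores))
print(preference_pairs(scores))
```

Candidates that trade off against each other (neither dominates) produce no pair in this sketch. How PFDPO treats such cases is described in the original paper, not here.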

Model Structure

Usage Guide

1. Installation

Python (Version 3.8 or higher is recommended)

pip install -r requirements.txt

2. How to Run

Step 1: Fine-tune ProGen2 with LoRA by running LoRA_ProGen2.py in the Finetuning folder.

Step 2: Apply PFDPO fine-tuning to the LoRA-fine-tuned ProGen2 using the code in the PFDPO_ProGen2 folder (inside the Finetuning folder): replace the paths in the code with your own, then run main.py.

Step 3: After training the model, run run.py in the generate folder to generate sequences.

Step 4: Use the code in the Metric folder to evaluate the generated sequences, and the code in the picture folder to visualize the results.
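Steps 1 to 3 above can be chained with a small driver script. The following is a sketch under several assumptions: the fine-tuning folder is named `Finetuning`, each script runs without command-line arguments once the paths inside it have been edited, and the scripts can be launched from the repository root. The helper names `build_pipeline` and `run_pipeline` are hypothetical and not part of this repository. Step 4 is omitted because the script names inside the Metric and picture folders are not listed in this README.

```python
import subprocess
import sys
from pathlib import Path


def build_pipeline(repo_root="."):
    """Return the commands for Steps 1-3 in order (paths assumed from this README)."""
    root = Path(repo_root)
    finetune = root / "Finetuning"  # assumed folder name; adjust to your checkout
    return [
        [sys.executable, str(finetune / "LoRA_ProGen2.py")],         # Step 1: LoRA fine-tuning
        [sys.executable, str(finetune / "PFDPO_ProGen2" / "main.py")],  # Step 2: PFDPO fine-tuning
        [sys.executable, str(root / "generate" / "run.py")],         # Step 3: sequence generation
    ]


def run_pipeline(repo_root=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_pipeline(repo_root):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline if a step exits with an error
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Running with `dry_run=True` only prints the commands, which is a cheap way to confirm the paths before starting a long fine-tuning job.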

3. File and Directory Description

Finetuning: Code for fine-tuning the protein language model.

generate: Generate sequences using the fine-tuned model.

Metric: Evaluate various metrics of the generated sequences.

picture: Visualize the results.

ProtPLM: Protein language models for fine-tuning.

results: Sequences generated in this experiment and the results.

srcdata: Data used to fine-tune the model.

requirements.txt: The Python dependencies required to run the experiments.

Citation

If you use this code or our model for your publication, please cite the original paper.
