ESM-2 Protein Secondary Structure Fine-Tuning

Fine-tuning Meta's ESM-2 (650M parameters) for protein secondary structure prediction.

Overview

This job performs distributed fine-tuning of ESM-2, Meta's Evolutionary Scale Modeling protein language model (650M parameters, 33 transformer layers), for protein secondary structure prediction. Each training iteration loads a batch of protein sequences from an LMDB dataset, feeds them through ESM-2, and predicts a structural class (helix, sheet, or coil) for each amino acid. The cross-entropy loss against the ground-truth labels is backpropagated. After each forward-backward pass, gradients are synchronized across both nodes with an NCCL all-reduce, and the Adam optimizer then updates the weights. This adapts the pre-trained ESM-2 model to the new per-residue prediction task.

Training runs for 10 epochs of 10,850 steps each (108,500 steps in total), with an expected training time of about 20 hours.
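The per-residue objective described above can be sketched in plain Python. This is a minimal illustration, not the job's actual code. The label mapping (H=0, E=1, C=2), the -100 padding value, and the helper names `encode_labels` and `per_residue_cross_entropy` are assumptions made for this example. Positions labelled -100 are excluded from the loss. These would be padding positions and the special start/end tokens that ESM-2 adds around each sequence.

```python
import math

# Hypothetical label scheme; the LMDB dataset's actual encoding may differ.
# 0 = helix (H), 1 = sheet (E), 2 = coil (C); -100 marks positions to ignore.
LABELS = {"H": 0, "E": 1, "C": 2}
IGNORE_INDEX = -100


def encode_labels(ss_string, pad_to):
    """Map a per-residue structure string such as 'HHEC' to class ids, padded with IGNORE_INDEX."""
    ids = [LABELS[ch] for ch in ss_string]
    return ids + [IGNORE_INDEX] * (pad_to - len(ids))


def per_residue_cross_entropy(logits, labels):
    """Mean cross-entropy over residues whose label is not IGNORE_INDEX.

    logits: one list of class scores per residue (3 scores each here)
    labels: one integer class id per residue, aligned with logits
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, labels):
        if target == IGNORE_INDEX:
            continue  # padding / special-token positions do not contribute
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-softmax probability of the true class.
        total += log_sum_exp - scores[target]
        count += 1
    return total / count if count else 0.0


labels = encode_labels("HEC", pad_to=5)
logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [9.0, 9.0, 9.0], [0.0, 0.0, 0.0]]
print(per_residue_cross_entropy(logits, labels))
```

In PyTorch, `torch.nn.CrossEntropyLoss` performs the same computation on flattened `(N, 3)` logits, and its default `ignore_index` is -100. The README does not state whether train.py uses exactly this setup.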

Requirements

  • Python 3.8+
  • See requirements.txt for dependencies
  • GPU with 16GB+ VRAM recommended

Usage

clearly launch job.yaml

Alternatively, run the training script directly:

python train.py
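The README does not say how `train.py` behaves when started on its own. A plain `python` launch starts a single process unless the script spawns workers itself, while the two-node NCCL setup described in the overview needs one process per GPU. Below is a minimal sketch of how a script could tell the two cases apart. It assumes the job launcher exports the same `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables that PyTorch's `torchrun` sets, which is not confirmed for job.yaml. `DistContext` and `detect_dist_context` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass
class DistContext:
    rank: int        # global rank across all nodes
    local_rank: int  # GPU index on this node
    world_size: int  # total number of training processes

    @property
    def is_distributed(self):
        return self.world_size > 1

    @property
    def is_main(self):
        # Conventionally only rank 0 logs and writes checkpoints.
        return self.rank == 0


def detect_dist_context(env=None):
    """Read torchrun-style variables (an assumption about the launcher); default to one process."""
    env = os.environ if env is None else env
    return DistContext(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


ctx = detect_dist_context()
print(ctx, "distributed" if ctx.is_distributed else "single process")
```

When `is_distributed` is true, a script would typically call `torch.distributed.init_process_group(backend="nccl")` and wrap the model in `torch.nn.parallel.DistributedDataParallel`. With a single process, it can skip both steps.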