diff --git a/docs/docs/assets/images/esm2/esm2_peft_memory_usage.png b/docs/docs/assets/images/esm2/esm2_peft_memory_usage.png
new file mode 100644
index 000000000..bf4b57cd9
Binary files /dev/null and b/docs/docs/assets/images/esm2/esm2_peft_memory_usage.png differ
diff --git a/docs/docs/assets/images/esm2/esm2_peft_time.png b/docs/docs/assets/images/esm2/esm2_peft_time.png
new file mode 100644
index 000000000..c21a1a8fa
Binary files /dev/null and b/docs/docs/assets/images/esm2/esm2_peft_time.png differ
diff --git a/docs/docs/models/ESM-2/index.md b/docs/docs/models/ESM-2/index.md
index 660b00111..e854f2429 100644
--- a/docs/docs/models/ESM-2/index.md
+++ b/docs/docs/models/ESM-2/index.md
@@ -141,3 +141,19 @@ nodes. **Note:* 15B model variants were trained on 64 GPUs with the B
 Training ESM-3B on 256 NVIDIA A100s on 32 nodes achieved 96.85% of the
 theoretical linear throughput expected from extrapolating single-node (8 GPU)
 performance, representing a model flops utilization of 60.6% at 256 devices.
+
+### LoRA Fine-tuning Performance
+
+Fine-tuning ESM-3B and ESM-650M with LoRA reduces GPU memory usage and training time compared with fine-tuning the full ESM2 model. In LoRA models, the encoder and embedding layers are replaced with LoRA-adapted modules.
+
+#### LoRA GPU Memory Usage
+
+GPU memory usage decreases by a factor of 2.5 to 4 in a model fine-tuned with LoRA.
+
+![ESM2 Memory Usage](../../assets/images/esm2/esm2_peft_memory_usage.png)
+
+#### LoRA Scaling
+
+Training throughput (tokens processed per second) increases by 25-80%.
+
+![ESM2 Training Time](../../assets/images/esm2/esm2_peft_time.png)
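
The mechanism behind the savings described in the added section (frozen base weights plus small low-rank trainable adapters) can be sketched in plain PyTorch. The `LoRALinear` class below is a hypothetical illustration of the general LoRA technique, not the BioNeMo implementation; the names `lora_a`, `lora_b`, `r`, and `alpha` are assumptions for the sketch:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen; no optimizer state for them
        # Low-rank update: W x + (alpha / r) * B (A x), with B zero-initialized
        # so the wrapped layer initially behaves exactly like the base layer.
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```

Only the two rank-`r` factors (`r * (in_features + out_features)` parameters per layer) receive gradients and optimizer state, which is where the reported memory and training-time reductions come from.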