Exploring the intersection of Ancestral Sequence Reconstruction, Protein Language Models, and Consensus Bias

Arcadia-Science/2025-asr-plms

Do protein language models understand evolution? Mixed evidence from ancestral sequences and ESM2.

This repository contains or points to all materials required to create and host the publication "Do protein language models understand evolution? Mixed evidence from ancestral sequences and ESM2."

The publication is hosted at this URL.

Data Description

Ancestral Sequence Generation

Ancestral sequences were reconstructed using the workflow in ASR/ASR_notebook.ipynb. All sequence inputs and outputs are in the ASR/ directory.

ESM2 Pseudo-perplexity Calculation

ESM2 pseudo-perplexity scores were calculated using ESM2_scoring/esm2_pppl_calculator.py on a GPU-enabled AWS EC2 instance. All inputs and outputs are in the ESM2_scoring/ directory.
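
For readers unfamiliar with the metric, the sketch below illustrates the standard definition of pseudo-perplexity: mask each position in turn, record the log-probability the model assigns to the true residue, and exponentiate the negative mean. It is not the code in ESM2_scoring/esm2_pppl_calculator.py, which may differ in its details. The names `pseudo_perplexity`, `uniform_log_prob`, and `MASK` are hypothetical, and the uniform scorer is a toy stand-in for ESM2 so that the example runs without a GPU or model weights.

```python
import math
from typing import Callable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues
MASK = "<mask>"  # placeholder mask token (assumption for this sketch)

# A scorer takes the masked token list, the masked position, and the true
# residue, and returns log P(true residue | all other residues).
LogProbFn = Callable[[List[str], int, str], float]


def pseudo_perplexity(sequence: str, log_prob_fn: LogProbFn) -> float:
    """Return exp(-(1/L) * sum_i log P(x_i | x_without_i)) for a sequence."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    tokens = list(sequence)
    total_log_prob = 0.0
    for i, residue in enumerate(tokens):
        # Mask one position at a time and score the residue that was hidden.
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        total_log_prob += log_prob_fn(masked, i, residue)
    return math.exp(-total_log_prob / len(tokens))


def uniform_log_prob(masked_tokens: List[str], position: int, residue: str) -> float:
    """Toy stand-in for a masked language model: every residue is equally likely."""
    return math.log(1.0 / len(AMINO_ACIDS))


if __name__ == "__main__":
    # Hypothetical example sequence, not taken from the ASR/ directory.
    print(pseudo_perplexity("MKTAYIAK", uniform_log_prob))
```

Under the uniform scorer, the pseudo-perplexity equals the alphabet size (20), the value expected from a model with no information about the sequence. Lower values mean the model finds the sequence more predictable. In practice, `log_prob_fn` would wrap a forward pass of a masked language model such as ESM2.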

Reproduce

Please see SETUP.qmd.

Contribute

Please see CONTRIBUTING.qmd.
