
scaleapi/BioRiskEval-os


Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Boyi Wei1,2*† ,  Zora Che1,3*† ,  Nathaniel Li1† ,  Jasper Götting 4 ,  Samira Nedungadi4 ,  Julian Michael1† ,  Summer Yue1† ,  Dan Hendrycks5 ,  Peter Henderson2 ,  Zifan Wang1† ,  Seth Donoughe4 ,  Mantas Mazeika5  
*Equal Contribution    †Work done while at Scale AI
1Scale AI    2Princeton University    3University of Maryland    4SecureBio    5Center for AI Safety

Paper | Blogpost | Twitter 

This repository is built on top of the BioNeMo Framework, with two additional modules:

  1. bioriskeval/: An evaluation framework for assessing the dual-use risk of bio-foundation models.
  2. attack/: Scripts for fine-tuning. The probing scripts are in bioriskeval/vir and bioriskeval/mut.

Installation

Build the Docker image and development environment

Following the same steps as in Getting Started with BioNeMo Framework, you can run the following scripts to clone the repository and build the Docker image:

Download the repository

git clone --recursive git@github.com:boyiwei/BioRiskEval.git
cd BioRiskEval

Download Dataset

Use the following script to download the BioRiskEval dataset:

cd bioriskeval
bash download_data.sh

You may first need to request access to the Hugging Face dataset before downloading. The script downloads BioRiskEval-Gen into bioriskeval/gen/data/, BioRiskEval-Mut into bioriskeval/mut/data/, and BioRiskEval-Vir into bioriskeval/vir/data/. BioRiskEval-Mut comes with two sets of data: DMS_ProteinGym_substitutions and DMS_Probe. DMS_ProteinGym_substitutions contains 16 DMS datasets collected from ProteinGym and is used for log-likelihood-based evaluation. DMS_Probe is used for probe-based evaluation; you can also generate it yourself by running dms/probe/create_dms_probe_dataset.py.
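As a quick sanity check after the download, the three target directories named above can be inspected from Python. This is only a sketch: the helper name `missing_datasets` is hypothetical, and it assumes that a successful download leaves at least one file or folder in each directory. The demo runs against a temporary directory rather than a real checkout.

```python
import tempfile
from pathlib import Path

# Target directories as described in this README.
EXPECTED_DATA_DIRS = {
    "BioRiskEval-Gen": "bioriskeval/gen/data",
    "BioRiskEval-Mut": "bioriskeval/mut/data",
    "BioRiskEval-Vir": "bioriskeval/vir/data",
}


def missing_datasets(repo_root="."):
    """Return the dataset names whose data directory is absent or empty."""
    root = Path(repo_root)
    missing = []
    for name, rel_path in EXPECTED_DATA_DIRS.items():
        data_dir = root / rel_path
        if not (data_dir.is_dir() and any(data_dir.iterdir())):
            missing.append(name)
    return missing


# Demo on a throwaway directory in which only the Gen data "exists".
with tempfile.TemporaryDirectory() as tmp:
    gen_dir = Path(tmp) / "bioriskeval/gen/data"
    gen_dir.mkdir(parents=True)
    (gen_dir / "placeholder.csv").write_text("placeholder\n")
    missing = missing_datasets(tmp)
```

In a real checkout, calling `missing_datasets(".")` from the repository root should return an empty list once `download_data.sh` has finished.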

Build the docker image

With a locally cloned repository and initialized submodules, build the container using:

docker buildx build . -t my-container-tag

VSCode Devcontainer for Interactive Debugging

We distribute a development container configuration for vscode (.devcontainer/devcontainer.json) that simplifies the process of local testing and development. Opening the BioRiskEval folder with VSCode should prompt you to re-open the folder inside the devcontainer environment.

Note

The first time you launch the devcontainer, it may take a long time to build the image. Building the image locally (using the command shown above) will ensure that most of the layers are present in the local docker cache.

We highly recommend running the experiments on H100 GPUs.

Convert Checkpoints

After installation, the Evo2-Vortex checkpoint must first be converted to a NeMo2 checkpoint for the BioNeMo Framework. This can be done by running one of the following commands:

7b-1M

evo2_convert_to_nemo2 \
  --model-path hf://arcinstitute/savanna_evo2_7b \
  --model-size 7b_arc_longcontext \
  --output-dir /your/checkpoint/dir/nemo2_evo2_7b_1m

40b-1M

evo2_convert_to_nemo2 \
  --model-path hf://arcinstitute/savanna_evo2_40b \
  --model-size 40b_arc_longcontext \
  --output-dir /your/checkpoint/dir/nemo2_evo2_40b_1m

BioRiskEval

The hierarchy of BioRiskEval is:

  • BioRiskEval-Gen (bioriskeval/gen): Sequence modeling evaluation. Metric: Perplexity
  • BioRiskEval-Mut (bioriskeval/mut): Mutational effect prediction evaluation. Metric: |Spearman correlation $\rho$|
  • BioRiskEval-Vir (bioriskeval/vir): Virulence prediction evaluation. Metric: Pearson correlation, $R^2$

BioRiskEval-Gen

The workflow of BioRiskEval-Gen is:

  1. Sample examples from human_host_df.csv by specifying a species, genus, or family name.
  2. The script gathers the sampled accession IDs and downloads the corresponding sequences from NCBI.
  3. eval_ppl.py computes the perplexity on the downloaded sequences and saves the results in results/.

We have provided an example script bioriskeval/gen/bioriskeval_gen.sh for quick start.
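For reference, the perplexity metric can be sketched as follows. This is a minimal illustration, not the code in eval_ppl.py: it assumes per-token natural-log probabilities are already available, and the function name and the example values are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean log p(token)), with natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: a model that assigns probability 0.25 to every
# nucleotide (uniform over A/C/G/T) for an 8-token sequence.
uniform_log_probs = [math.log(0.25)] * 8
ppl_uniform = perplexity(uniform_log_probs)  # approximately 4.0
```

A uniform model over four nucleotides has a perplexity of about 4, so values well below 4 indicate that the model has learned something about the sequence.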

BioRiskEval-Mut

The workflow of BioRiskEval-Mut under the zero-shot (log-likelihood) setting is:

  1. Convert the DMS (Deep Mutational Scanning) protein sequences into nucleotide sequences with nucleotide_data_pipeline.py.
  2. eval_fitness.py calculates log-likelihood-based scores for autoregressive genomic models on the mutated sequences and reports, for each DMS, the Spearman correlation with the ground-truth experimental fitness. eval_fitness_esm2.py calculates scores with masked marginals for ESM2 protein models.

We provide an example script bioriskeval/mut/bioriskeval_mut_logprob.sh for quick start.
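The |Spearman correlation| metric for one DMS can be sketched as below. This is an illustration under stated assumptions, not the implementation in eval_fitness.py: the model score is assumed to be one log-likelihood per mutated sequence, the helper names are hypothetical, and the numbers are made up.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank_with_ties(x), rank_with_ties(y))


# Hypothetical values for four variants of one DMS assay.
model_scores = [-12.1, -10.4, -15.0, -9.8]  # sequence log-likelihoods
fitness = [0.3, 0.7, 0.1, 0.9]              # experimental fitness
abs_rho = abs(spearman(model_scores, fitness))
```

Because Spearman correlation depends only on ranks, it is unchanged by any monotonic rescaling of the scores, for example subtracting a constant wild-type log-likelihood from every variant.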

The workflow of BioRiskEval-Mut under the probe setting is:

  1. Pick $k$ mutations from each DMS to fit linear probes. Of these $k$ mutations, 80% are used for fitting and 20% as the validation split; the rest of the data is used as the test split. create_dms_probe_dataset.py creates the splits and saves representations for the train and validation splits.
  2. Sweep over all layers with sweep_dms_probe.py to find the best layer for fitting the linear probe. The best probes, selected by train RMSE or by validation-split Spearman correlation, are saved.
  3. Save the test representations from the best layer with probe_layer_utils.py and create_dms_probe_dataset.py, then evaluate the saved probes with test_dms_probe.py.

We provide an example script bioriskeval/mut/bioriskeval_mut_probe.sh for quick start.
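The split described in step 1 can be sketched as follows. This is a minimal illustration, not the logic of create_dms_probe_dataset.py: the function name, the random seeding, and the mutation identifiers are all assumptions.

```python
import random


def split_dms_for_probe(mutation_ids, k, seed=0):
    """Pick k mutations; use 80% to fit, 20% to validate, the rest to test."""
    if k > len(mutation_ids):
        raise ValueError("k cannot exceed the number of mutations")
    shuffled = list(mutation_ids)
    random.Random(seed).shuffle(shuffled)
    selected, test = shuffled[:k], shuffled[k:]
    n_train = (k * 4) // 5  # 80% of k, in integer arithmetic
    return {
        "train": selected[:n_train],
        "val": selected[n_train:],
        "test": test,
    }


# Hypothetical DMS with 100 mutations and k = 50.
ids = [f"mut_{i}" for i in range(100)]
splits = split_dms_for_probe(ids, k=50)
```

With these inputs the split yields 40 train, 10 validation, and 50 test mutations, and the three sets are disjoint.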

BioRiskEval-Vir

The workflow of BioRiskEval-Vir is:

  1. Extract hidden-layer representations and create a train-test split (7:3) for probing.
  2. Train a linear probe on the train set and evaluate it on the test set. train_probe_continuous.py trains the linear probe and evaluates its performance on the test set. It also uploads the results to Weights & Biases and dumps them to a CSV file.

We have provided an example script bioriskeval/vir/bioriskeval_vir.sh for quick start.
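The 7:3 split and the two reported metrics can be sketched as below. This illustration does not reproduce train_probe_continuous.py: the probe itself is omitted, the helper names are hypothetical, and the predictions are made-up numbers chosen to show how Pearson correlation and $R^2$ can disagree.

```python
import math
import random


def train_test_split_7_3(n_items, seed=0):
    """Shuffle item indices and split them 70% train / 30% test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = (n_items * 7) // 10
    return indices[:n_train], indices[n_train:]


def pearson_r(y_true, y_pred):
    """Pearson correlation between targets and predictions."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


train_idx, test_idx = train_test_split_7_3(10)

# Hypothetical probe predictions: perfectly correlated, but offset by 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
r = pearson_r(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

Here the Pearson correlation is 1.0 while $R^2$ is 0.8, because $R^2$ penalizes the constant offset and Pearson correlation does not. Reporting both metrics therefore captures ranking quality and calibration separately.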

Fine-Tuning & Probing

Fine-tuning

Inside attack/, we have the scripts for fine-tuning.

The workflow of fine-tuning is:

  1. Prepare a CSV file with accession IDs in the #Accession column.
  2. Convert the CSV file to an FNA file using convert_csv_to_fna.py.
  3. Create the train-val split and tokenize the data.
  4. Create the dataset config for fine-tuning.
  5. Fine-tune the model.

For a quick start, we provide two example scripts: attack/data/preprocess_ft_data.sh (data preprocessing, steps 1-4) and attack/ft/launch_ft_7b_1m.sh (fine-tuning, step 5). You can modify the CSV file path and the preprocessing config in these scripts.
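Steps 1 and 3 can be sketched in a few lines of Python. This is only an illustration, not the logic of preprocess_ft_data.sh: the helper names, the 10% validation fraction, and the placeholder accession IDs are assumptions, and the download and tokenization steps are left out.

```python
import csv
import io
import random


def read_accessions(csv_text, column="#Accession"):
    """Collect the non-empty accession IDs from the given CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    ]


def train_val_split(ids, val_fraction=0.1, seed=0):
    """Shuffle IDs and hold out a fraction for validation (assumed 10%)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder CSV with ten made-up accession IDs.
rows = ["#Accession,Species"] + [f"ACC_{i:04d},virus_{i}" for i in range(10)]
example_csv = "\n".join(rows) + "\n"

accessions = read_accessions(example_csv)
train_ids, val_ids = train_val_split(accessions)
```

Splitting at the level of accession IDs, before any chunking or tokenization, keeps every sequence entirely within one split.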

Probing

The workflow of probing is:

  1. Extract the hidden-layer representations from the model
  2. Train a linear probe on the train set
  3. Evaluate the probe on the test set

Refer to the example script bioriskeval/vir/bioriskeval_vir.sh for a quick start.

Reproducibility

We documented the results in attack/analysis/, which contains the raw results and scripts for analysis and plotting.
