Update README MDs

niklases · niklases · commit 459e5f4f21bf · 2025-05-26T21:31:54.000+02:00
diff --git a/README.md b/README.md
@@ -1,6 +1,7 @@
 ## Table of Contents
 [PyPEF: Pythonic Protein Engineering Framework](#pypef-pythonic-protein-engineering-framework)
   - [Quick Installation](#quick-installation)
+    - [Setup and Run Docker Image](#setup-and-run-docker-image)
     - [GUI Installation](#gui-installation)
   - [Requirements](#requirements)
   - [Running Examples](#running-examples)
@@ -67,15 +68,17 @@ pypef --help
 The detailed routine for setting up a new virtual environment with Anaconda, installing the necessary Python packages for that environment, and running the Jupyter notebook tutorial can be found below in the Tutorial section.
 A quick file setup and run test can be performed running files in [scripts/Setup](scripts/Setup) containing a Batch script for Windows and a Bash script for Linux (the latter requires conda, i.e. Miniconda3 or Anaconda3, already being installed).
 
-### Setup and Run Docker Image 
+
+<a name="docker-installation"></a>
+### Setup and Run Docker Image
 
 Build the image using the stored [Dockerfile](./Dockerfile)
 ```bash
 docker build -t pypef . # --progress=plain --no-cache
 ```
 
 A chained container command using the built Docker image can be run with e.g.:
-```
+```bash
 docker run --gpus=all -v ./datasets/:/datasets --workdir /datasets/AVGFP pypef /bin/bash -c \
     "python /app/run.py mklsts --wt P42212_F64L.fasta --input avGFP.csv --ls_proportion 0.01 &&  \
      python /app/run.py hybrid --ls LS.fasl --ts TS.fasl --params GREMLIN --llm prosst --wt P42212_F64L.fasta --pdb GFP_AEQVI.pdb"
diff --git a/scripts/ProteinGym_runs/README.md b/scripts/ProteinGym_runs/README.md
@@ -1,8 +1,13 @@
 ## Benchmark runs on publicly available ProteinGym protein variant sequence-fitness datasets
 
-Data is taken (script-based download) from "DMS Assays"-->"Substitutions" and "Multiple Sequence Alignments"-->"DMS Assays" data from https://proteingym.org/download.
-Run the following to download and extract the ProteinGym data and subsequently to get the predictions/the performance on those datasets.
-Based on available GPU/VRAM, variable `MAX_WT_SEQUENCE_LENGTH` in script [run_performance_tests_proteingym_hybrid_dca_llm.py](run_performance_tests_proteingym_hybrid_dca_llm.py) has to adjusted according to available (V)RAM. E.g., results ([results/dca_esm_and_hybrid_opt_results.csv](results/dca_esm_and_hybrid_opt_results.csv), graphically presented on the main page README) were computed with an NVIDIA GeForce RTX 5090 with 32 GB VRAM and setting `MAX_WT_SEQUENCE_LENGTH` to 1000 (GPU power limit set to 520 W):
+Data is taken (script-based download) from 
+
+"DMS Assays"-->"Substitutions" and "Multiple Sequence Alignments"-->"DMS Assays" data 
+
+from https://proteingym.org/download.
+
+Perform the following steps to download and extract the ProteinGym data and then obtain the predictions/performance for these datasets.
+The variable `MAX_WT_SEQUENCE_LENGTH` in the script [run_performance_tests_proteingym_hybrid_dca_llm.py](run_performance_tests_proteingym_hybrid_dca_llm.py) must be adjusted to the available (V)RAM. For example, the results ([results/dca_esm_and_hybrid_opt_results.csv](results/dca_esm_and_hybrid_opt_results.csv), shown graphically on the main README page) were calculated with an NVIDIA GeForce RTX 5090 with 32 GB VRAM and the setting `MAX_WT_SEQUENCE_LENGTH = 1000` (GPU power limit set to 520 W):
 
 ```sh
 #python -m pip install -r ../../requirements.txt