Commit f7f1f9d

Clone niklases/PyPEF dev Commit f3812b5
1 parent 4258ed4 commit f7f1f9d

41 files changed, +75711 -254 lines changed

.github/workflows/build.yml

Lines changed: 32 additions & 4 deletions
@@ -9,13 +9,12 @@ permissions:
   contents: read

 jobs:
-  build:
-
+  ubuntu:
+    name: ubuntu
     runs-on: [ubuntu-latest]
     strategy:
       matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12"]
-
+        python-version: ["3.10", "3.11", "3.12"]
     steps:
     - uses: actions/checkout@v4
     - name: Set up Python ${{ matrix.python-version }}
@@ -37,3 +36,32 @@ jobs:
     - name: Export Pythonpath and run PyPEF API and CLI version test with pytest
       run: |
         export PYTHONPATH="${PYTHONPATH}:${PWD}" && python -m pytest tests/
+
+  windows:
+    name: windows
+    runs-on: [windows-latest]
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.11", "3.12"]
+    steps:
+    - uses: actions/checkout@v4
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v5
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Display Path and Python version
+      run: |
+        python -c "import sys, platform; print(sys.version, platform.system())"
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install flake8 pytest
+        pip install -r requirements.txt
+    - name: Lint with flake8
+      run: |
+        # stop the build if there are Python syntax errors or undefined names
+        flake8 .\pypef --count --select=E9,F63,F7,F82 --show-source --statistics
+    - name: Export Pythonpath and run PyPEF API and CLI version test with pytest
+      shell: pwsh
+      run: |
+        $env:PYTHONPATH = "${PWD};${env:PYTHONPATH}";python -m pytest .\tests\
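
The two jobs above join `PYTHONPATH` entries with different separators: `:` in the ubuntu job and `;` in the windows job. A minimal Python sketch of the same idea, assuming a hypothetical helper name `prepend_to_pythonpath` that is not part of PyPEF or of this commit, could rely on `os.pathsep` to pick the separator for the current platform:

```python
import os


def prepend_to_pythonpath(new_dir, existing="", sep=os.pathsep):
    """Return a PYTHONPATH value with new_dir placed in front of existing entries.

    Hypothetical helper for illustration only; not part of PyPEF.
    """
    # Drop empty entries so an unset PYTHONPATH leaves no dangling separator.
    entries = [new_dir] + [p for p in existing.split(sep) if p]
    return sep.join(entries)


if __name__ == "__main__":
    # os.pathsep is ":" on POSIX systems and ";" on Windows.
    print(prepend_to_pythonpath(os.getcwd(), os.environ.get("PYTHONPATH", "")))
```

Note that this sketch prepends the directory, as the windows job does; the ubuntu job appends it instead.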

.gitignore

Lines changed: 25 additions & 0 deletions
@@ -27,6 +27,7 @@ scripts/ProteinGym_runs/single_point_mut_performance.png
 scripts/ProteinGym_runs/multi_point_mut_performance.png

 # Created test/output files
+model_saves/*
 scripts/Setup/windows/Miniconda3-latest-Windows-x86_64.exe
 scripts/Setup/windows/Miniconda3/*
 scripts/Encoding_low_N/apc.png
@@ -402,3 +403,27 @@ scripts/Runtime_tests/runtimes.png
 datasets/AVGFP/Recomb_Double_Split/Predictions_Hybrid_TopRecomb_Double_Split.txt
 scripts/ProteinGym_runs/single_point_mut_performance_violin.png
 scripts/ProteinGym_runs/multi_point_mut_performance_violin.png
+scripts/ESM_finetuning/DMS_msa_files/
+scripts/ESM_finetuning/DMS_ProteinGym_substitutions/
+scripts/ESM_finetuning/ProteinGym_AF2_structures/
+
+scripts/ESM_finetuning/higher_point_dms_mut_data.json
+scripts/ESM_finetuning/single_point_dms_mut_data.json
+scripts/ESM_finetuning/results/dca_esm_and_hybrid_opt_results_clean.csv
+scripts/ESM_finetuning/results/dca_esm_and_hybrid_opt_results.csv
+scripts/ESM_finetuning/mut_performance.png
+scripts/ESM_finetuning/_Description_DMS_substitutions_data.csv
+scripts/ESM_finetuning/mut_performance_violin.png
+datasets/ANEH/SSM_landscape.png
+datasets/ANEH/SSM_landscape.csv
+datasets/AVGFP/model_saves/*
+datasets/AVGFP/Pickles/*
+datasets/AVGFP/DCA_Hybrid_Model_Performance_ESM1v_no_ML.png
+datasets/AVGFP/DCA_Hybrid_Model_Performance_ProSST_no_ML.png
+
+# Large files // LFS in niklases/PyPEF
+datasets/ANEH/ANEH_72.6.params
+datasets/AVGFP/uref100_avgfp_jhmmer_119_plmc_42.6.params
+datasets/AVGFP/uref100_avgfp_jhmmer_119.sto
+datasets/GRB2/GRB2_HUMAN_full_11-26-2021_b05.a2m
+datasets/ANEH/ANEH_jhmmer.sto

.vscode/launch.json

Lines changed: 94 additions & 2 deletions
@@ -123,6 +123,46 @@
         },

         {
+            "name": "Python: PyPEF hybrid LS-TS GREMLIN-DCA-ESM1v avGFP",
+            "type": "debugpy",
+            "request": "launch",
+            "env": {"PYTHONPATH": "${workspaceFolder}"},
+            "program": "${workspaceFolder}/pypef/main.py",
+            "console": "integratedTerminal",
+            "justMyCode": true,
+            "cwd": "${workspaceFolder}/datasets/AVGFP/",
+            "args": [
+                "hybrid",
+                //"-m", "GREMLIN", // optional, not required
+                "--ls", "LS.fasl",
+                "--ts", "TS.fasl",
+                "--params", "GREMLIN",
+                "--llm", "esm"
+            ]
+        },
+
+        {
+            "name": "Python: PyPEF hybrid LS-TS GREMLIN-DCA-ProSST avGFP",
+            "type": "debugpy",
+            "request": "launch",
+            "env": {"PYTHONPATH": "${workspaceFolder}"},
+            "program": "${workspaceFolder}/pypef/main.py",
+            "console": "integratedTerminal",
+            "justMyCode": true,
+            "cwd": "${workspaceFolder}/datasets/AVGFP/",
+            "args": [
+                "hybrid",
+                //"-m", "GREMLIN", // optional, not required
+                "--ls", "LS.fasl",
+                "--ts", "TS.fasl",
+                "--params", "GREMLIN",
+                "--llm", "prosst",
+                "--wt", "P42212_F64L.fasta",
+                "--pdb", "GFP_AEQVI.pdb"
+            ]
+        },
+
+        { // Test on test set
             "name": "Python: PyPEF hybrid/only-TS-zero-shot GREMLIN-DCA avGFP",
             "type": "debugpy",
             "request": "launch",
@@ -139,6 +179,24 @@
             ]
         },

+        { // Test on test set: Hybrid DCA-LLM ESM1v
+            "name": "Python: PyPEF hybrid/only-TS-zero-shot GREMLIN-DCA-ESM1v avGFP",
+            "type": "debugpy",
+            "request": "launch",
+            "env": {"PYTHONPATH": "${workspaceFolder}"},
+            "program": "${workspaceFolder}/pypef/main.py",
+            "console": "integratedTerminal",
+            "justMyCode": true,
+            "cwd": "${workspaceFolder}/datasets/AVGFP/",
+            "args": [
+                "hybrid",
+                //"-m", "GREMLIN", // optional, not required
+                "--ts", "TS.fasl",
+                "--params", "GREMLIN",
+                "--llm", "esm"
+            ]
+        },
+
         {
             "name": "Python: PyPEF hybrid/only-PS-zero-shot GREMLIN-DCA avGFP",
             "type": "debugpy",
@@ -156,6 +214,23 @@
             ]
         },

+        {
+            "name": "Python: PyPEF hybrid/only-PS-zero-shot GREMLIN-DCA avGFP drecomb",
+            "type": "debugpy",
+            "request": "launch",
+            "env": {"PYTHONPATH": "${workspaceFolder}"},
+            "program": "${workspaceFolder}/pypef/main.py",
+            "console": "integratedTerminal",
+            "justMyCode": true,
+            "cwd": "${workspaceFolder}/datasets/AVGFP/",
+            "args": [
+                "hybrid",
+                "-m", "GREMLIN",
+                "--pmult", "--drecomb",
+                "--params", "GREMLIN"
+            ]
+        },
+
         {
             "name": "Python: PyPEF hybrid/only-PS-zero-shot GREMLIN-DCA avGFP drecomb II",
             "type": "debugpy",
@@ -174,7 +249,7 @@
         },

         {
-            "name": "Python: PyPEF hybrid/only-PS-zero-shot GREMLIN-DCA avGFP drecomb",
+            "name": "Python: PyPEF hybrid/only-PS-zero-shot GREMLIN-DCA avGFP drecomb III: ESM",
             "type": "debugpy",
             "request": "launch",
             "env": {"PYTHONPATH": "${workspaceFolder}"},
@@ -184,7 +259,24 @@
             "cwd": "${workspaceFolder}/datasets/AVGFP/",
             "args": [
                 "hybrid",
-                "-m", "GREMLIN",
+                "-m", "HYBRIDgremlinesm",
+                "--pmult", "--drecomb",
+                "--params", "GREMLIN"
+            ]
+        },
+
+        {
+            "name": "Python: PyPEF hybrid/only-PS-zero-shot GREMLIN-DCA avGFP drecomb IV: ProSST",
+            "type": "debugpy",
+            "request": "launch",
+            "env": {"PYTHONPATH": "${workspaceFolder}"},
+            "program": "${workspaceFolder}/pypef/main.py",
+            "console": "integratedTerminal",
+            "justMyCode": true,
+            "cwd": "${workspaceFolder}/datasets/AVGFP/",
+            "args": [
+                "hybrid",
+                "-m", "HYBRIDgremlinprosst",
                 "--pmult", "--drecomb",
                 "--params", "GREMLIN"
             ]
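
The launch configurations above differ mainly in their `"args"` arrays. As a rough sketch, the LS/TS-style argument lists could be assembled programmatically as follows. The helper `build_hybrid_args` is hypothetical (it is not part of PyPEF); it only reuses flag names that appear in this diff and does not check which combinations PyPEF actually accepts.

```python
def build_hybrid_args(params, ls=None, ts=None, llm=None, wt=None, pdb=None):
    """Assemble an argument list mirroring the "args" arrays in launch.json.

    Hypothetical helper for illustration; it performs no validation.
    """
    args = ["hybrid"]
    # Emit each optional flag only when a value is given, in the order used in the diff.
    for flag, value in (("--ls", ls), ("--ts", ts), ("--params", params),
                        ("--llm", llm), ("--wt", wt), ("--pdb", pdb)):
        if value is not None:
            args += [flag, value]
    return args


if __name__ == "__main__":
    # Mirrors the "hybrid LS-TS GREMLIN-DCA-ESM1v avGFP" configuration.
    print(build_hybrid_args("GREMLIN", ls="LS.fasl", ts="TS.fasl", llm="esm"))
```

The resulting list could then be appended to a command such as `python pypef/main.py`, run from `datasets/AVGFP/` with `PYTHONPATH` pointing at the repository root, as the configurations do.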

README.md

Lines changed: 18 additions & 13 deletions
@@ -22,7 +22,7 @@
 # PyPEF: Pythonic Protein Engineering Framework
 [![PyPI version](https://img.shields.io/pypi/v/PyPEF?color=blue)](https://pypi.org/project/pypef/)
 [![Python version](https://img.shields.io/pypi/pyversions/PyPEF)](https://www.python.org/downloads/)
-[![Build](https://github.com/Protein-Engineering-Framework/PyPEF/actions/workflows/build.yml/badge.svg)](https://github.com/Protein-Engineering-Framework/PyPEF/actions?query=workflow:build)
+[![Build](https://github.com/niklases/PyPEF/actions/workflows/build.yml/badge.svg)](https://github.com/niklases/PyPEF/actions/?query=workflow:build)
 [![PyPI Downloads](https://static.pepy.tech/badge/pypef)](https://pepy.tech/projects/pypef)

 a framework written in Python 3 for performing sequence-based machine learning-assisted protein engineering to predict a protein's fitness from its sequence using different forms of sequence encoding:
@@ -69,15 +69,15 @@ A rudimentary graphical user interface (GUI) can be installed using the gui_setu

 Windows (PowerShell)
 ```powershell
-Invoke-WebRequest https://raw.githubusercontent.com/Protein-Engineering-Framework/PyPEF/refs/heads/master/gui_setup.bat -OutFile gui_setup.bat
-Invoke-WebRequest https://raw.githubusercontent.com/Protein-Engineering-Framework/PyPEF/refs/heads/master/gui/qt_window.py -OutFile ( New-Item -Path ".\gui\qt_window.py" -Force )
+Invoke-WebRequest https://raw.githubusercontent.com/niklases/PyPEF/refs/heads/main/gui_setup.bat -OutFile gui_setup.bat
+Invoke-WebRequest https://raw.githubusercontent.com/niklases/PyPEF/refs/heads/main/gui/qt_window.py -OutFile ( New-Item -Path ".\gui\qt_window.py" -Force )
 .\gui_setup.bat
 ```

 Linux
 ```bash
-wget https://raw.githubusercontent.com/Protein-Engineering-Framework/PyPEF/refs/heads/master/gui_setup.sh -O gui_setup.sh
-mkdir -p ./gui/ && wget https://raw.githubusercontent.com/Protein-Engineering-Framework/PyPEF/refs/heads/master/gui/qt_window.py -O ./gui/qt_window.py
+wget https://raw.githubusercontent.com/niklases/PyPEF/refs/heads/main/gui_setup.sh -O gui_setup.sh
+mkdir -p ./gui/ && wget https://raw.githubusercontent.com/niklases/PyPEF/refs/heads/main/gui/qt_window.py -O ./gui/qt_window.py
 chmod a+x ./gui_setup.sh && ./gui_setup.sh
 ```

@@ -218,8 +218,8 @@ bash Anaconda3-2023.03-1-Linux-x86_64.sh
 ```

 After accepting all steps, the conda setup should also be written to your `~/.bashrc`file, so that you can call anaconda typing `conda`.
-Next, to download this repository click Code > Download ZIP and unzip the zipped file, e.g. with `unzip PyPEF-master.zip`, or just clone this repository using your bash shell to your local machine `git clone https://github.com/Protein-Engineering-Framework/PyPEF`.
-To set up a new environment with conda you can either create the conda environment from the provided YAML file inside the PyPEF directory (`cd PyPEF` or `cd PyPEF-master` dependent on the downloaded file name and chose YAML file for your operating system):
+Next, to download this repository click Code > Download ZIP and unzip the zipped file, e.g. with `unzip PyPEF-main.zip`, or just clone this repository using your bash shell to your local machine `git clone https://github.com/niklases/PyPEF`.
+To set up a new environment with conda you can either create the conda environment from the provided YAML file inside the PyPEF directory (`cd PyPEF` or `cd PyPEF-main` dependent on the downloaded file name and chose YAML file for your operating system):

 ```
 conda env create --file linux_env.yml
@@ -237,7 +237,7 @@ To activate the environment you can define:
 conda activate pypef
 ```

-After activating the environment you can install required packages after changing the directory to the PyPEF directory (`cd PyPEF` or `cd PyPEF-master`) and install required packages with pip if you did not use the YAML file for creating the environment (if using conda, packages will be installed in anaconda3/envs/pypef/lib/python3.10/site-packages):
+After activating the environment you can install required packages after changing the directory to the PyPEF directory (`cd PyPEF` or `cd PyPEF-main`) and install required packages with pip if you did not use the YAML file for creating the environment (if using conda, packages will be installed in anaconda3/envs/pypef/lib/python3.10/site-packages):

 ```
 python3 -m pip install -r requirements.txt
@@ -327,23 +327,23 @@ The following model hyperparameter ranges are tested during (*k*-fold) cross-val
 PyPEF was developed to be run from a command-line interface while `python3 ./pypef/main.py` (when using the downloaded version of this repository and setting the `PYTHONPATH`) is equal to `pypef` when installed with pip.
 Downloading/cloning the repository files (manually or with `wget`/`git clone`):<br>
 ```
-wget https://github.com/Protein-Engineering-Framework/PyPEF/archive/refs/heads/master.zip
+wget https://github.com/niklases/PyPEF/archive/main.zip
 ```

 Unzipping the zipped file (manually or e.g. with `unzip`):
 ```
-unzip master.zip
+unzip main.zip
 ```

 Setting the `PYTHONPATH` (so that no import errors occur stating that the package `pypef` and thus dependent absolute imports are unknown):<br>
 &nbsp;&nbsp;Windows (example path, PowerShell)
 ```
-$env:PYTHONPATH="C:\Users\name\path\to\PyPEF-master"
+$env:PYTHONPATH="C:\Users\name\path\to\PyPEF-main"
 ```

 &nbsp;&nbsp;Linux (example path)
 ```
-export PYTHONPATH="${PYTHONPATH}:/home/name/path/to/PyPEF-master"
+export PYTHONPATH="${PYTHONPATH}:/home/name/path/to/PyPEF-main"
 ```
 Installing the requirements:<br>
 &nbsp;&nbsp;Windows (PowerShell)
@@ -356,7 +356,7 @@ python -m pip install -r requirements.txt
 python3 -m pip install -r requirements.txt
 ```

-Running the main script (from PyPEF-master directory):<br>
+Running the main script (from PyPEF-main directory):<br>
 &nbsp;&nbsp;Windows (PowerShell)
 ```
 python .\pypef\main.py
@@ -485,6 +485,11 @@ The performance of the GREMLIN model used is shown in the following for predicti

 for ProteinGym datasets computed using the scripts located at [scripts/ProteinGym_runs](scripts/ProteinGym_runs).

+A hybrid GREMLIN-ESM1v low-N-tuned model achieved even increased performances compared to the pure DCA-tuned model (script available at [scripts/ESM_finetuning](scripts/ESM_finetuning))
+<p align="center">
+<img src=".github/imgs/mut_performance_violin_DCA_ESM.png" alt="drawing" width="250"/>
+</p>
+
 <a name="api-usage"></a>
 ## API Usage for Sequence Encoding
 For script-based encoding of sequences using PyPEF and the available AAindex-, OneHot- or DCA-based techniques, the classes and corresponding functions can be imported, i.e. `OneHotEncoding`, `AAIndexEncoding`, `GREMLIN` (DCA), `PLMC` (DCA), and `DCAHybridModel`. In addition, implemented functions for CV-based tuning of regression models can be used to train and validate models, eventually deriving them to obtain performances on retained data for testing. An exemplary script and a Jupyter notebook for CV-based (low-*N*) tuning of models and using them for testing is provided at [scripts/Encoding_low_N/api_encoding_train_test.py](scripts/Encoding_low_N/api_encoding_train_test.py) and [scripts/Encoding_low_N/api_encoding_train_test.ipynb](scripts/Encoding_low_N/api_encoding_train_test.ipynb), respectively.
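
To illustrate what one-hot encoding of a sequence means in general, here is a minimal, dependency-free sketch. It does not use or reproduce PyPEF's `OneHotEncoding` class; the function name `one_hot_encode` and the alphabet ordering are assumptions made only for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def one_hot_encode(sequence, alphabet=AMINO_ACIDS):
    """Return a flat 0/1 list of length len(sequence) * len(alphabet).

    Illustrative only; not PyPEF's implementation.
    """
    index = {aa: i for i, aa in enumerate(alphabet)}
    encoded = []
    for residue in sequence:
        row = [0] * len(alphabet)
        row[index[residue]] = 1  # raises KeyError for characters outside the alphabet
        encoded.extend(row)
    return encoded


if __name__ == "__main__":
    vector = one_hot_encode("ACD")
    print(len(vector), sum(vector))
```

Each residue contributes one block of 20 values with a single 1, so the feature vector grows linearly with sequence length.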
