
Commit 54fc604

Setup.py now includes the necessary binaries
+ Added Dockerfile

1 parent e0d758f commit 54fc604

File tree

9 files changed: +84 −18 lines changed


.vscode/launch.json

Lines changed: 20 additions & 0 deletions
@@ -231,6 +231,26 @@
         ]
     },
+    { // Test on test set: Hybrid DCA-LLM ProSST
+      "name": "Python: PyPEF hybrid/only-TS-zero-shot GREMLIN-DCA-ProSST avGFP",
+      "type": "debugpy",
+      "request": "launch",
+      "env": {"PYTHONPATH": "${workspaceFolder}"},
+      "program": "${workspaceFolder}/pypef/main.py",
+      "console": "integratedTerminal",
+      "justMyCode": true,
+      "cwd": "${workspaceFolder}/datasets/AVGFP/",
+      "args": [
+        "hybrid",
+        //"-m", "GREMLIN", // optional, not required
+        "--ts", "TS.fasl",
+        "--params", "GREMLIN",
+        "--llm", "prosst",
+        "--wt", "P42212_F64L.fasta",
+        "--pdb", "GFP_AEQVI.pdb"
+      ]
+    },
     {
       "name": "Python: PyPEF hybrid/only-PS-zero-shot GREMLIN-DCA avGFP",
       "type": "debugpy",
Dockerfile

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+FROM python:3.12-slim
+
+WORKDIR /app
+RUN mkdir -p pypef
+
+COPY requirements.txt run.py /app/
+COPY pypef/ /app/pypef/
+
+RUN pip install --upgrade pip
+RUN pip install --no-cache-dir -r requirements.txt
+RUN ["python", "-c", "import torch; print(torch.__version__)"]
+
+EXPOSE 5000
+
+# No ENTRYPOINT defined here, to ease chaining of multiple commands
+# with /bin/bash -c "command1 && command2 ..."
+#ENTRYPOINT ["python", "/app/run.py"]

README.md

Lines changed: 19 additions & 0 deletions
@@ -1,6 +1,7 @@
 ## Table of Contents
 [PyPEF: Pythonic Protein Engineering Framework](#pypef-pythonic-protein-engineering-framework)
 - [Quick Installation](#quick-installation)
+- [Setup and Run Docker Image](#setup-and-run-docker-image)
 - [GUI Installation](#gui-installation)
 - [Requirements](#requirements)
 - [Running Examples](#running-examples)
@@ -67,6 +68,24 @@ pypef --help
 The detailed routine for setting up a new virtual environment with Anaconda, installing the necessary Python packages for that environment, and running the Jupyter notebook tutorial can be found below in the Tutorial section.
 A quick file setup and run test can be performed running files in [scripts/Setup](scripts/Setup) containing a Batch script for Windows and a Bash script for Linux (the latter requires conda, i.e. Miniconda3 or Anaconda3, already being installed).
 
+
+<a name="docker-installation"></a>
+### Setup and Run Docker Image
+
+Build the image using the stored [Dockerfile](./Dockerfile)
+```bash
+docker build -t pypef .  # --progress=plain --no-cache
+```
+
+A chained container command using the built Docker image can be run with e.g.:
+```bash
+docker run --gpus=all -v ./datasets/:/datasets --workdir /datasets/AVGFP pypef /bin/bash -c \
+    "python /app/run.py mklsts --wt P42212_F64L.fasta --input avGFP.csv --ls_proportion 0.01 && \
+    python /app/run.py hybrid --ls LS.fasl --ts TS.fasl --params GREMLIN --llm prosst --wt P42212_F64L.fasta --pdb GFP_AEQVI.pdb"
+```
+
 <a name="gui-installation"></a>
 ### GUI Installation

build_with_pyinstaller.bat

Lines changed: 10 additions & 7 deletions
@@ -1,21 +1,24 @@
 REM Up to now pastes DLLs from local Python environment bin's to _internal...
-REM alternative?: set PATH=%PATH%;%USERPROFILE%\miniconda3\envs\py312\Library\bin\;
+REM alternative?: set PATH=%PATH%;%USERPROFILE%\miniconda3\envs\pypef\Library\bin\;
 pip install -r requirements.txt
 pip install -U pyinstaller pyside6
 pip install -e .
+set PATH=%PATH%;%USERPROFILE%\miniconda3\Scripts
 pyinstaller^
  --console^
  --noconfirm^
+ --collect-data pypef^
+ --collect-all pypef^
  --collect-data torch^
  --collect-data biotite^
  --collect-all biotite^
  --collect-data torch_geometric^
  --collect-all torch_geometric^
  --hidden-import torch_geometric^
- --add-binary=%USERPROFILE%\miniconda3\envs\py312\Library\bin\onedal_thread.3.dll:.^
- --add-binary=%USERPROFILE%\miniconda3\envs\py312\Library\bin\tbbbind.dll:.^
- --add-binary=%USERPROFILE%\miniconda3\envs\py312\Library\bin\tbbbind_2_0.dll:.^
- --add-binary=%USERPROFILE%\miniconda3\envs\py312\Library\bin\tbbbind_2_5.dll:.^
- --add-binary=%USERPROFILE%\miniconda3\envs\py312\Library\bin\tbbmalloc.dll:.^
- --add-binary=%USERPROFILE%\miniconda3\envs\py312\Library\bin\tbbmalloc_proxy.dll:.^
+ --add-binary=%USERPROFILE%\miniconda3\envs\pypef\Library\bin\onedal_thread.3.dll:.^
+ --add-binary=%USERPROFILE%\miniconda3\envs\pypef\Library\bin\tbbbind.dll:.^
+ --add-binary=%USERPROFILE%\miniconda3\envs\pypef\Library\bin\tbbbind_2_0.dll:.^
+ --add-binary=%USERPROFILE%\miniconda3\envs\pypef\Library\bin\tbbbind_2_5.dll:.^
+ --add-binary=%USERPROFILE%\miniconda3\envs\pypef\Library\bin\tbbmalloc.dll:.^
+ --add-binary=%USERPROFILE%\miniconda3\envs\pypef\Library\bin\tbbmalloc_proxy.dll:.^
 gui\PyPEFGUIQtWindow.py

build_with_pyinstaller.sh

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,8 @@ pip install -e .
 pyinstaller \
  --console \
  --noconfirm \
+ --collect-data pypef \
+ --collect-all pypef \
  --collect-data torch \
  --collect-data biotite \
  --collect-all biotite \

pypef/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -12,4 +12,4 @@
 # Journal of Chemical Information and Modeling, 2021, 61, 3463-3476
 # https://doi.org/10.1021/acs.jcim.1c00099
 
-__version__ = '0.4.1'
+__version__ = '0.4.2'

pypef/hybrid/hybrid_model.py

Lines changed: 2 additions & 6 deletions
@@ -396,11 +396,9 @@ def train_llm(self):
         # LoRA training on y_llm_ttrain --> Testing on y_llm_ttest
         x_llm_ttrain_b, scores_ttrain_b = (
             get_batches(self.x_llm_ttrain, batch_size=self.batch_size, dtype=int),
-            #get_batches(self.attn_llm_ttrain, batch_size=self.batch_size, dtype=int),
             get_batches(self.y_ttrain, batch_size=self.batch_size, dtype=float)
         )
 
-        #x_llm_ttest_b = get_batches(self.x_llm_ttest, batch_size=self.batch_size, dtype=int)
         if self.llm_key == 'prosst':
             y_llm_ttest = self.llm_inference_function(
                 xs=self.x_llm_ttest,
@@ -457,8 +455,7 @@ def train_llm(self):
             self.llm_attention_mask,
             self.structure_input_ids,
             n_epochs=50,
-            device=self.device,
-            #seed=self.seed
+            device=self.device
         )
         y_llm_lora_ttrain = self.llm_inference_function(
             xs=self.x_llm_ttrain,
@@ -486,8 +483,7 @@ def train_llm(self):
             self.llm_model,
             self.llm_optimizer,
             n_epochs=5,
-            device=self.device,
-            #seed=self.seed
+            device=self.device
        )
         y_llm_lora_ttrain = self.llm_inference_function(
             xs=x_llm_ttrain_b,
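The hunks above feed token and score arrays through a `get_batches` helper before LoRA training. As a rough illustration only, a batching helper of that shape could look like the following minimal sketch (hypothetical, not PyPEF's actual implementation, whose signature and return type may differ):

```python
import numpy as np

def get_batches(data, batch_size, dtype=float):
    # Hypothetical sketch: split an array-like into consecutive
    # batch_size-sized chunks; the last batch keeps the remainder.
    arr = np.asarray(list(data), dtype=dtype)
    return [arr[i:i + batch_size] for i in range(0, len(arr), batch_size)]

# Ten items in batches of four -> batch sizes [4, 4, 2]
sizes = [len(b) for b in get_batches(range(10), batch_size=4, dtype=int)]
print(sizes)
```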

scripts/ProteinGym_runs/README.md

Lines changed: 8 additions & 3 deletions
@@ -1,8 +1,13 @@
 ## Benchmark runs on publicly available ProteinGym protein variant sequence-fitness datasets
 
-Data is taken (script-based download) from "DMS Assays"-->"Substitutions" and "Multiple Sequence Alignments"-->"DMS Assays" data from https://proteingym.org/download.
-Run the following to download and extract the ProteinGym data and subsequently to get the predictions/the performance on those datasets.
-Based on available GPU/VRAM, variable `MAX_WT_SEQUENCE_LENGTH` in script [run_performance_tests_proteingym_hybrid_dca_llm.py](run_performance_tests_proteingym_hybrid_dca_llm.py) has to adjusted according to available (V)RAM. E.g., results ([results/dca_esm_and_hybrid_opt_results.csv](results/dca_esm_and_hybrid_opt_results.csv), graphically presented on the main page README) were computed with an NVIDIA GeForce RTX 5090 with 32 GB VRAM and setting `MAX_WT_SEQUENCE_LENGTH` to 1000 (GPU power limit set to 520 W):
+Data is taken (script-based download) from
+
+"DMS Assays"-->"Substitutions" and "Multiple Sequence Alignments"-->"DMS Assays" data
+
+from https://proteingym.org/download.
+
+Perform the following steps to download and extract the ProteinGym data and then obtain the predictions/performance for these datasets.
+Depending on the available GPU/VRAM, the variable `MAX_WT_SEQUENCE_LENGTH` in the script [run_performance_tests_proteingym_hybrid_dca_llm.py](run_performance_tests_proteingym_hybrid_dca_llm.py) must be adjusted according to the available (V)RAM. For example, the results ([results/dca_esm_and_hybrid_opt_results.csv](results/dca_esm_and_hybrid_opt_results.csv), shown graphically on the main README page) were calculated with an NVIDIA GeForce RTX 5090 with 32 GB VRAM and the setting `MAX_WT_SEQUENCE_LENGTH = 1000` (GPU power limit set to 520 W):
 
 ```sh
 #python -m pip install -r ../../requirements.txt
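The length cap described in the hunk above can be pictured as a simple pre-filter on wild-type sequence length. A hypothetical sketch (only `MAX_WT_SEQUENCE_LENGTH` comes from the script; the helper name is illustrative):

```python
# Illustrative only: MAX_WT_SEQUENCE_LENGTH is the script's knob for
# limiting GPU memory use; fits_in_memory is a hypothetical helper.
MAX_WT_SEQUENCE_LENGTH = 1000  # lower this on GPUs with less VRAM

def fits_in_memory(wt_sequence: str) -> bool:
    """Skip datasets whose wild-type sequence exceeds the length cap."""
    return len(wt_sequence) <= MAX_WT_SEQUENCE_LENGTH

print(fits_in_memory("M" * 500), fits_in_memory("M" * 1500))
```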

setup.py

Lines changed: 5 additions & 1 deletion
@@ -39,7 +39,11 @@
     url='https://github.com/niklases/PyPEF',
     py_modules=['pypef'],
     packages=find_packages(include=['pypef', 'pypef.*']),
-    package_data={'pypef': ['ml/AAindex/*', 'ml/AAindex/Refined_cluster_indices_r0.93_r0.97/*']},
+    package_data={'pypef': [
+        'ml/AAindex/*',
+        'ml/AAindex/Refined_cluster_indices_r0.93_r0.97/*',
+        'llm/prosst_structure/static/*'
+    ]},
     include_package_data=True,
     install_requires=[cleaned_requirements],
     python_requires='>= 3.10, < 3.13',
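Files shipped via `package_data`, such as those matched by the newly added `llm/prosst_structure/static/*` glob, are typically located at runtime with the standard-library `importlib.resources` API. A minimal, self-contained sketch (demonstrated on the stdlib `json` package, since PyPEF may not be installed; the commented PyPEF call is an assumption based on the glob above):

```python
from importlib import resources

def list_package_files(package: str) -> list[str]:
    # Return the file names bundled with an importable package.
    return sorted(p.name for p in resources.files(package).iterdir() if p.is_file())

# For PyPEF, one would query e.g. (assumed subpackage path):
#   list_package_files("pypef.llm.prosst_structure.static")
print(list_package_files("json"))  # stdlib demo target
```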
