---
title: "deeplc"
project: "deeplc"
github_project: "https://github.com/compomics/DeepLC"
description: "DeepLC: Retention time prediction for (modified) peptides using Deep Learning."
layout: default
tags: project_home, deeplc
permalink: /projects/deeplc
---

<img src="https://github.com/compomics/DeepLC/raw/master/img/deeplc_logo.png"
width="150" height="150" /> <br/><br/>

[![GitHub release](https://flat.badgen.net/github/release/compomics/deeplc)](https://github.com/compomics/DeepLC/releases/latest/)
[![PyPI](https://flat.badgen.net/pypi/v/deeplc)](https://pypi.org/project/deeplc/)
[![Conda](https://img.shields.io/conda/vn/bioconda/deeplc?style=flat-square)](https://bioconda.github.io/recipes/deeplc/README.html)
[![GitHub Workflow Status](https://flat.badgen.net/github/checks/compomics/deeplc/)](https://github.com/compomics/DeepLC/actions)
[![License](https://flat.badgen.net/github/license/compomics/deeplc)](https://www.apache.org/licenses/LICENSE-2.0)
[![Twitter](https://flat.badgen.net/twitter/follow/compomics?icon=twitter)](https://twitter.com/compomics)

DeepLC: Retention time prediction for (modified) peptides using Deep Learning.

---

- [Introduction](#introduction)
- [Citation](#citation)
- [Usage](#usage)
  - [Web application](#web-application)
  - [Graphical user interface](#graphical-user-interface)
  - [Python package](#python-package)
    - [Installation](#installation)
    - [Command line interface](#command-line-interface)
    - [Python module](#python-module)
  - [Input files](#input-files)
  - [Prediction models](#prediction-models)
- [Q&A](#qa)

---

## Introduction

DeepLC is a retention time predictor for (modified) peptides that employs Deep
Learning. Its strength lies in the fact that it can accurately predict
retention times for modified peptides, even if it has not seen the modification
during training.

DeepLC can be used through the
[web application](https://iomics.ugent.be/deeplc/),
locally with a graphical user interface (GUI), or as a Python package. In the
latter case, DeepLC can be used from the command line or as a Python module.

## Citation

If you use DeepLC for your research, please use the following citation:

> **DeepLC can predict retention times for peptides that carry as-yet unseen modifications**
> Robbin Bouwmeester, Ralf Gabriels, Niels Hulstaert, Lennart Martens & Sven Degroeve
> Nature Methods 18, 1363–1369 (2021) [doi: 10.1038/s41592-021-01301-5](http://dx.doi.org/10.1038/s41592-021-01301-5)

## Usage

### Web application

[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://iomics.ugent.be/deeplc/)

Just go to [iomics.ugent.be/deeplc](https://iomics.ugent.be/deeplc/) and get started!

### Graphical user interface

#### In an existing Python environment (cross-platform)

1. In your terminal with Python (>=3.7) installed, run `pip install deeplc[gui]`
2. Start the GUI with the command `deeplc-gui` or `python -m deeplc.gui`

#### Standalone installer (Windows)

[![Download GUI](https://flat.badgen.net/badge/download/GUI/blue)](https://github.com/compomics/DeepLC/releases/latest/)

1. Download the DeepLC installer (`DeepLC-...-Windows-64bit.exe`) from the
   [latest release](https://github.com/compomics/DeepLC/releases/latest/)
2. Execute the installer
3. If Windows SmartScreen shows a popup window with "Windows protected your PC",
   click on "More info" and then on "Run anyway". You will have to trust us that
   DeepLC does not contain any viruses, or you can check the source code 😉
4. Go through the installation steps
5. Start DeepLC!

![GUI screenshot](https://github.com/compomics/DeepLC/raw/master/img/gui-screenshot.png)

### Python package

#### Installation

[![install with bioconda](https://flat.badgen.net/badge/install%20with/bioconda/green)](http://bioconda.github.io/recipes/deeplc/README.html)
[![install with pip](https://flat.badgen.net/badge/install%20with/pip/green)](https://pypi.org/project/deeplc/)
[![container](https://flat.badgen.net/badge/pull/biocontainer/green)](https://quay.io/repository/biocontainers/deeplc)

Install with conda, using the bioconda and conda-forge channels:

`conda install -c bioconda -c conda-forge deeplc`

Or install with pip:

`pip install deeplc`

#### Command line interface

To use the DeepLC CLI, run:

```sh
deeplc --file_pred <path/to/peptide_file.csv>
```

We highly recommend adding a peptide file with known retention times for
calibration:

```sh
deeplc --file_pred <path/to/peptide_file.csv> --file_cal <path/to/peptide_file_with_tr.csv>
```

For an overview of all CLI arguments, run `deeplc --help`.

#### Python module

Minimal example:

```python
import pandas as pd
from deeplc import DeepLC

peptide_file = "datasets/test_pred.csv"
calibration_file = "datasets/test_train.csv"

# Read the input CSVs; empty modification fields should be empty strings, not NaN
pep_df = pd.read_csv(peptide_file, sep=",")
pep_df['modifications'] = pep_df['modifications'].fillna("")

cal_df = pd.read_csv(calibration_file, sep=",")
cal_df['modifications'] = cal_df['modifications'].fillna("")

# Calibrate on the peptides with known retention times, then predict
dlc = DeepLC()
dlc.calibrate_preds(seq_df=cal_df)
preds = dlc.make_preds(seq_df=pep_df)
```

Minimal example with psm_utils:

```python
import pandas as pd

from psm_utils.psm import PSM
from psm_utils.psm_list import PSMList

from deeplc import DeepLC

# Peptides to predict: convert each row into a PSM with a ProForma peptidoform
infile = pd.read_csv("https://github.com/compomics/DeepLC/files/13298024/231108_DeepLC_input-peptides.csv")
psm_list = []

for idx, row in infile.iterrows():
    # Rewrite "(mod)" as "[mod]" to obtain ProForma-style modification tags
    seq = row["modifications"].replace("(", "[").replace(")", "]")

    # An N-terminal modification needs a "-" between the tag and the sequence
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm + 1] + "-" + seq[idx_nterm + 1:]

    psm_list.append(PSM(peptidoform=seq, spectrum_id=idx))

psm_list = PSMList(psm_list=psm_list)

# Calibration peptides: same conversion, but with known retention times
infile = pd.read_csv("https://github.com/compomics/DeepLC/files/13298022/231108_DeepLC_input-calibration-file.csv")
psm_list_calib = []

for idx, row in infile.iterrows():
    seq = row["seq"].replace("(", "[").replace(")", "]")

    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm + 1] + "-" + seq[idx_nterm + 1:]

    psm_list_calib.append(PSM(peptidoform=seq, retention_time=row["tr"], spectrum_id=idx))

psm_list_calib = PSMList(psm_list=psm_list_calib)

dlc = DeepLC()
dlc.calibrate_preds(psm_list_calib)
preds = dlc.make_preds(seq_df=psm_list)
```

For a more elaborate example, see
[examples/deeplc_example.py](https://github.com/compomics/DeepLC/blob/master/examples/deeplc_example.py).

### Input files

DeepLC expects comma-separated values (CSV) with the following columns:

- `seq`: unmodified peptide sequences
- `modifications`: MS2PIP-style formatted modifications: each modification is
  written as `location|name`, and multiple modifications on one peptide are
  joined with additional pipes (`|`), e.g. `12|Oxidation|18|Acetyl`. `location`
  is an integer counted starting at 1 for the first amino acid; 0 is reserved
  for N-terminal modifications and -1 for C-terminal modifications. `name` has
  to correspond to a Unimod (PSI-MS) name.
- `tr`: retention time (only required for calibration)

For example:

```csv
seq,modifications,tr
AAGPSLSHTSGGTQSK,,12.1645
AAINQKLIETGER,6|Acetyl,34.095
AANDAGYFNDEMAPIEVKTK,12|Oxidation|18|Acetyl,37.3765
```
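
Such an input file can, for instance, be written with pandas (a minimal sketch;
the peptides are taken from the example above and the output file name is
arbitrary):

```python
import pandas as pd

# Modifications follow the `location|name` convention; unmodified peptides get
# an empty string, and `tr` is only needed for calibration files.
peptides = pd.DataFrame(
    {
        "seq": ["AAGPSLSHTSGGTQSK", "AAINQKLIETGER"],
        "modifications": ["", "6|Acetyl"],
        "tr": [12.1645, 34.095],
    }
)
peptides.to_csv("my_peptides.csv", index=False)
```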

See [examples/datasets](https://github.com/compomics/DeepLC/tree/master/examples/datasets)
for more examples.

### Prediction models

DeepLC comes with multiple CNN models trained on data from various experimental
settings. By default, DeepLC selects the best model based on the calibration
dataset. If no calibration is performed, the first default model is selected.
Always take note of the models used and the DeepLC version. The current version
comes with:

| Model filename | Experimental settings | Publication |
| - | - | - |
| full_hc_PXD005573_mcp_8c22d89667368f2f02ad996469ba157e.hdf5 | Reverse phase | [Bruderer et al. 2017](https://pubmed.ncbi.nlm.nih.gov/29070702/) |
| full_hc_PXD005573_mcp_1fd8363d9af9dcad3be7553c39396960.hdf5 | Reverse phase | [Bruderer et al. 2017](https://pubmed.ncbi.nlm.nih.gov/29070702/) |
| full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5 | Reverse phase | [Bruderer et al. 2017](https://pubmed.ncbi.nlm.nih.gov/29070702/) |

For all the full models that can be used in DeepLC (including some TMT models!), please see:

[https://github.com/RobbinBouwmeester/DeepLCModels](https://github.com/RobbinBouwmeester/DeepLCModels)

The naming convention for the models is as follows:

[full_hc]\_[dataset]\_[fixed_mods]\_[hash].hdf5

The different parts refer to:

**full_hc** - flag to indicate a finished, trained, and fully optimized model

**dataset** - name of the dataset used to fit the model (see the original publication, supplementary table 2)

**fixed_mods** - flag to indicate that fixed modifications were added to peptides without explicit indication (e.g., carbamidomethylation of cysteine)

**hash** - indicates different architectures, where "1fd8363d9af9dcad3be7553c39396960" indicates CNN filter lengths of 8, "cb975cfdd4105f97efa0b3afffe075cc" indicates CNN filter lengths of 4, and "8c22d89667368f2f02ad996469ba157e" indicates filter lengths of 2
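
If you download models from that repository, you can point DeepLC at them
explicitly. A minimal sketch, assuming the `DeepLC` class exposes the model
path(s) through a `path_model` argument (check the class signature and
`deeplc --help` for the exact option names in your version); the file paths
below are placeholders:

```python
from deeplc import DeepLC

# Assumption: `path_model` accepts one or more paths to downloaded .hdf5 models
dlc = DeepLC(
    path_model=[
        "models/full_hc_PXD005573_mcp_8c22d89667368f2f02ad996469ba157e.hdf5",
        "models/full_hc_PXD005573_mcp_1fd8363d9af9dcad3be7553c39396960.hdf5",
    ]
)
```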

## Q&A

**__Q: Is it required to indicate fixed modifications in the input file?__**

Yes, even fixed modifications like carbamidomethyl should be listed in the input file.

**__Q: So DeepLC is able to predict the retention time for any modification?__**

Yes, DeepLC can predict the retention time of any modification. However, if the
modification is **very** different from the peptides the model has seen during
training, the accuracy might not be satisfactory. For example, if the model has
never seen a phosphorus atom before, the accuracy of the prediction is going to
be low.

**__Q: Installation fails. Why?__**

Please make sure to install DeepLC in a path that does not contain spaces. Run
the latest LTS version of Ubuntu or Windows 10. Make sure you have enough disk
space available; surprisingly, TensorFlow needs quite a bit of disk space. If
you are still not able to install DeepLC, please feel free to contact us:

Robbin.Bouwmeester@ugent.be and Ralf.Gabriels@ugent.be

**__Q: I have a special use case that is not supported. Can you help?__**

Of course, please feel free to contact us:

Robbin.Bouwmeester@ugent.be and Ralf.Gabriels@ugent.be

**__Q: DeepLC runs out of memory. What can I do?__**

You can try to reduce the batch size. DeepLC should be able to run if the batch
size is low enough, even on machines with only 4 GB of RAM.
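
On the command line this could look as follows (a sketch, assuming the batch
size is exposed as a `--batch_num` option in your DeepLC version; run
`deeplc --help` to confirm the exact name):

```sh
# Assumed flag: --batch_num (number of peptides per prediction batch);
# lower values reduce peak memory use at the cost of speed.
deeplc --file_pred <path/to/peptide_file.csv> --batch_num 50000
```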

**__Q: I have a graphics card, but DeepLC is not using the GPU. Why?__**

For now, DeepLC defaults to the CPU instead of the GPU. Clearly, because you want
to use the GPU, you are a power user :-). If you want to make the most of that
expensive GPU, you need to change or remove the following line (at the top) in
__deeplc.py__:

```python
# Set to force CPU calculations
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
```

Also change the same line in the function __reset_keras()__:

```python
# Set to force CPU calculations
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
```

Either remove the line or set it to the index (or indices) of the GPU(s) you want
to use, e.g. `'0'` for the first GPU:

```python
# Expose the first GPU to TensorFlow
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
```

**__Q: What modification name should I use?__**

The names from Unimod are used. The PSI-MS name is used by default, but the Interim name
is used as a fallback if the PSI-MS name is not available. It should be fine as long as it
is supported by [ProForma](https://pubs.acs.org/doi/10.1021/acs.jproteome.1c00771) and
[psm_utils](/projects/psm_utils).

**__Q: I have a modification that is not in Unimod. How can I add the modification?__**

Unfortunately, since v3.0 this is no longer possible via the GUI or the command line. You
will need to use [psm_utils](/projects/psm_utils); a minimal example is shown above where
we convert an identification file into a PSMList that DeepLC accepts. There, the sequence
can, for example, specify the modification by its atomic composition in ProForma format
(e.g., `SEQUEN[Formula:C12H20O2]CE`), as in the sketch below.
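
A minimal sketch of building such a PSM with psm_utils (hypothetical peptide and
composition; the resulting PSMList is then used exactly as in the psm_utils example
above):

```python
from psm_utils.psm import PSM
from psm_utils.psm_list import PSMList

# Hypothetical peptidoform: the custom modification is written as its atomic
# composition in ProForma notation instead of a Unimod name.
psm = PSM(peptidoform="SEQUEN[Formula:C12H20O2]CE", spectrum_id=0)
psm_list = PSMList(psm_list=[psm])

# Pass `psm_list` to DeepLC as in the psm_utils example above
# (dlc.calibrate_preds(...) / dlc.make_preds(...)).
```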

**__Q: Help, all my predictions are between [0,10]. Why?__**

It is likely you did not use calibration. No problem, but the retention times for training
purposes were normalized between [0,10]. This means that you probably need to adjust the
retention times yourself after analysis, or use a calibration set as the input.

**__Q: What does the option `dict_divider` do?__**

This parameter defines the precision to use for the fast lookup of retention times
for calibration. A value of 10 means a precision of 0.1 (and 100 a precision of
0.01) between the calibration anchor points. This parameter does not influence
the precision of the calibration itself, but setting it too high might result in a
poor selection of the models between anchor points. A safe value is usually
higher than 10.

**__Q: What does the option `split_cal` do?__**

The option `split_cal`, or split calibration, sets the number of divisions of the
chromatogram for piecewise linear calibration. If the value is set to 10, the
chromatogram is split up into 10 equidistant parts. For each part, the median
value of the calibration peptides is selected; these are the anchor points.
Between consecutive anchor points a linear fit is made. This option has no effect
when the pyGAM generalized additive models are used for calibration.
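
As a sketch of how both options could be set on the command line (assuming they
are exposed as the `--split_cal` and `--dict_divider` flags; run `deeplc --help`
to confirm the exact names in your version):

```sh
# Assumed flags: --split_cal (number of chromatogram segments) and
# --dict_divider (lookup precision; 100 corresponds to 0.01).
deeplc --file_pred <path/to/peptide_file.csv> --file_cal <path/to/peptide_file_with_tr.csv> --split_cal 25 --dict_divider 100
```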

**__Q: How does the ensemble part of DeepLC work?__**

Models within the same directory are grouped if they overlap in their name. The overlap
has to be in their full name, except for the last part of the name after a "_"-character.

The following models will be grouped:

```
full_hc_dia_fixed_mods_a.hdf5
full_hc_dia_fixed_mods_b.hdf5
```

None of the following models will be grouped:

```
full_hc_dia_fixed_mods2_a.hdf5
full_hc_dia_fixed_mods_b.hdf5
full_hc_dia_fixed_mods_2_b.hdf5
```

**__Q: I would like to take the ensemble average of multiple models, even if they are trained on different datasets. How can I do this?__**

Feel free to experiment! Models within the same directory are grouped if they overlap in
their name. The overlap has to be in their full name, except for the last part of the
name after a "_"-character.

The following models will be grouped:

```
model_dataset1.hdf5
model_dataset2.hdf5
```

So you just need to rename your models accordingly and place them in the same directory.
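
For example, as a shell sketch (hypothetical file names and paths):

```sh
# Copy models trained on different datasets into one directory and give them a
# shared prefix so that DeepLC groups them into a single ensemble.
mkdir ensemble_models
cp path/to/run1_model.hdf5 ensemble_models/model_dataset1.hdf5
cp path/to/run2_model.hdf5 ensemble_models/model_dataset2.hdf5
```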
