---
title: "deeplc"
project: "deeplc"
github_project: "https://github.com/compomics/DeepLC"
description: "DeepLC: Retention time prediction for (modified) peptides using Deep Learning."
layout: default
tags: project_home, deeplc
permalink: /projects/deeplc
---

<img src="https://github.com/compomics/DeepLC/raw/master/img/deeplc_logo.png"
width="150" height="150" /> <br/><br/>

[](https://github.com/compomics/DeepLC/releases/latest/)
[](https://pypi.org/project/deeplc/)
[](https://bioconda.github.io/recipes/deeplc/README.html)
[](/projects/deeplc/actions)
[](https://www.apache.org/licenses/LICENSE-2.0)
[](https://twitter.com/compomics)

DeepLC: Retention time prediction for (modified) peptides using Deep Learning.

---

- [Introduction](#introduction)
- [Citation](#citation)
- [Usage](#usage)
  - [Web application](#web-application)
  - [Graphical user interface](#graphical-user-interface)
  - [Python package](#python-package)
    - [Installation](#installation)
    - [Command line interface](#command-line-interface)
    - [Python module](#python-module)
  - [Input files](#input-files)
  - [Prediction models](#prediction-models)
- [Q&A](#qa)

---

## Introduction

DeepLC is a retention time predictor for (modified) peptides that employs Deep
Learning. Its strength lies in the fact that it can accurately predict
retention times for modified peptides, even if it has not seen the
modification during training.

DeepLC can be used through the
[web application](https://iomics.ugent.be/deeplc/),
locally with a graphical user interface (GUI), or as a Python package. In the
latter case, DeepLC can be used from the command line or as a Python module.

## Citation

If you use DeepLC for your research, please use the following citation:

>**DeepLC can predict retention times for peptides that carry as-yet unseen modifications**
>Robbin Bouwmeester, Ralf Gabriels, Niels Hulstaert, Lennart Martens & Sven Degroeve
>Nature Methods 18, 1363–1369 (2021) [doi: 10.1038/s41592-021-01301-5](http://dx.doi.org/10.1038/s41592-021-01301-5)

## Usage

### Web application

[](https://iomics.ugent.be/deeplc/)

Just go to [iomics.ugent.be/deeplc](https://iomics.ugent.be/deeplc/) and get started!

### Graphical user interface

#### In an existing Python environment (cross-platform)

1. In your terminal with Python (>=3.7) installed, run `pip install deeplc[gui]`
2. Start the GUI with the command `deeplc-gui` or `python -m deeplc.gui`

#### Standalone installer (Windows)

[](https://github.com/compomics/DeepLC/releases/latest/)

1. Download the DeepLC installer (`DeepLC-...-Windows-64bit.exe`) from the
   [latest release](https://github.com/compomics/DeepLC/releases/latest/)
2. Execute the installer
3. If Windows SmartScreen shows a popup window with "Windows protected your PC",
   click on "More info" and then on "Run anyway". You will have to trust us that
   DeepLC does not contain any viruses, or you can check the source code 😉
4. Go through the installation steps
5. Start DeepLC!

### Python package

#### Installation

[](http://bioconda.github.io/recipes/deeplc/README.html)
[](http://bioconda.github.io/recipes/deeplc/README.html)
[](https://quay.io/repository/biocontainers/deeplc)

Install with conda, using the bioconda and conda-forge channels:

`conda install -c bioconda -c conda-forge deeplc`

Or install with pip:

`pip install deeplc`

#### Command line interface

To use the DeepLC CLI, run:

```sh
deeplc --file_pred <path/to/peptide_file.csv>
```

We highly recommend adding a peptide file with known retention times for
calibration:

```sh
deeplc --file_pred <path/to/peptide_file.csv> --file_cal <path/to/peptide_file_with_tr.csv>
```

For an overview of all CLI arguments, run `deeplc --help`.

#### Python module

Minimal example:

```python
import pandas as pd

from deeplc import DeepLC

peptide_file = "datasets/test_pred.csv"
calibration_file = "datasets/test_train.csv"

# Empty entries in the modifications column must be empty strings, not NaN
pep_df = pd.read_csv(peptide_file, sep=",")
pep_df["modifications"] = pep_df["modifications"].fillna("")

cal_df = pd.read_csv(calibration_file, sep=",")
cal_df["modifications"] = cal_df["modifications"].fillna("")

dlc = DeepLC()
dlc.calibrate_preds(seq_df=cal_df)
preds = dlc.make_preds(seq_df=pep_df)
```

Minimal example with psm_utils:

```python
import pandas as pd

from psm_utils.psm import PSM
from psm_utils.psm_list import PSMList

from deeplc import DeepLC


def to_peptidoform(seq):
    """Convert MS2PIP-style "(mod)" notation to a ProForma peptidoform string."""
    seq = seq.replace("(", "[").replace(")", "]")
    # A "-" is required between an N-terminal modification and the sequence
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm + 1] + "-" + seq[idx_nterm + 1:]
    return seq


# Peptides to predict
infile = pd.read_csv("https://github.com/compomics/DeepLC/files/13298024/231108_DeepLC_input-peptides.csv")
psm_list = PSMList(psm_list=[
    PSM(peptidoform=to_peptidoform(row["modifications"]), spectrum_id=idx)
    for idx, row in infile.iterrows()
])

# Calibration peptides with known retention times
infile = pd.read_csv("https://github.com/compomics/DeepLC/files/13298022/231108_DeepLC_input-calibration-file.csv")
psm_list_calib = PSMList(psm_list=[
    PSM(peptidoform=to_peptidoform(row["seq"]), retention_time=row["tr"], spectrum_id=idx)
    for idx, row in infile.iterrows()
])

dlc = DeepLC()
dlc.calibrate_preds(psm_list_calib)
preds = dlc.make_preds(psm_list=psm_list)
```

For a more elaborate example, see
[examples/deeplc_example.py](https://github.com/compomics/DeepLC/blob/master/examples/deeplc_example.py).

### Input files

DeepLC expects comma-separated values (CSV) with the following columns:

- `seq`: unmodified peptide sequence
- `modifications`: MS2PIP-style formatted modifications: every modification is
  listed as `location|name`, with multiple modifications separated by
  additional pipes (`|`). `location` is an integer counted starting at 1 for
  the first amino acid; 0 is reserved for N-terminal modifications and -1 for
  C-terminal modifications. `name` has to correspond to a Unimod (PSI-MS) name.
- `tr`: retention time (only required for calibration)

For example:

```csv
seq,modifications,tr
AAGPSLSHTSGGTQSK,,12.1645
AAINQKLIETGER,6|Acetyl,34.095
AANDAGYFNDEMAPIEVKTK,12|Oxidation|18|Acetyl,37.3765
```
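As an illustration only (this helper is not part of the DeepLC API, and its name is made up), such a modifications string can be split into `(location, name)` pairs like so:

```python
def parse_modifications(mods):
    """Parse an MS2PIP-style modifications string, e.g. "12|Oxidation|18|Acetyl",
    into (location, name) pairs. An empty string means no modifications."""
    if not mods:
        return []
    fields = mods.split("|")
    # Even-indexed fields are locations, odd-indexed fields are names
    return [(int(loc), name) for loc, name in zip(fields[::2], fields[1::2])]


print(parse_modifications("12|Oxidation|18|Acetyl"))  # [(12, 'Oxidation'), (18, 'Acetyl')]
```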

See
[examples/datasets](https://github.com/compomics/DeepLC/tree/master/examples/datasets)
for more examples.

### Prediction models

DeepLC comes with multiple CNN models trained on data from various experimental
settings. By default, DeepLC selects the best model based on the calibration
dataset. If no calibration is performed, the first default model is selected.
Always keep note of the models and the DeepLC version that were used. The
current version comes with:

| Model filename | Experimental settings | Publication |
| - | - | - |
| full_hc_PXD005573_mcp_8c22d89667368f2f02ad996469ba157e.hdf5 | Reverse phase | [Bruderer et al. 2017](https://pubmed.ncbi.nlm.nih.gov/29070702/) |
| full_hc_PXD005573_mcp_1fd8363d9af9dcad3be7553c39396960.hdf5 | Reverse phase | [Bruderer et al. 2017](https://pubmed.ncbi.nlm.nih.gov/29070702/) |
| full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5 | Reverse phase | [Bruderer et al. 2017](https://pubmed.ncbi.nlm.nih.gov/29070702/) |

For all full models that can be used in DeepLC (including some TMT models!), see
[https://github.com/RobbinBouwmeester/DeepLCModels](https://github.com/RobbinBouwmeester/DeepLCModels).

The naming convention for the models is as follows:

[full_hc]\_[dataset]\_[fixed_mods]\_[hash].hdf5

The different parts refer to:

**full_hc** - flag indicating a finished, trained, and fully optimized model

**dataset** - name of the dataset used to fit the model (see the original publication, Supplementary Table 2)

**fixed_mods** - flag indicating that fixed modifications were applied to peptides without explicit indication (e.g., carbamidomethylation of cysteine)

**hash** - identifies the architecture: "1fd8363d9af9dcad3be7553c39396960" indicates a CNN filter length of 8, "cb975cfdd4105f97efa0b3afffe075cc" a filter length of 4, and "8c22d89667368f2f02ad996469ba157e" a filter length of 2

## Q&A

**__Q: Is it required to indicate fixed modifications in the input file?__**

Yes, even fixed modifications like carbamidomethyl should be listed in the
input file.

**__Q: So DeepLC is able to predict the retention time for any modification?__**

Yes, DeepLC can predict the retention time of any modification. However, if the
modification is **very** different from the peptides the model has seen during
training, the accuracy might not be satisfactory. For example, if the model has
never seen a phosphorus atom before, the accuracy of the prediction is going to
be low.

**__Q: Installation fails. Why?__**

Please make sure to install DeepLC in a path that does not contain spaces. Run
the latest LTS version of Ubuntu or Windows 10. Make sure you have enough disk
space available; surprisingly, TensorFlow needs quite a bit of disk space. If
you are still not able to install DeepLC, please feel free to contact us:

Robbin.Bouwmeester@ugent.be and Ralf.Gabriels@ugent.be

**__Q: I have a special use case that is not supported. Can you help?__**

Of course, please feel free to contact us:

Robbin.Bouwmeester@ugent.be and Ralf.Gabriels@ugent.be

**__Q: DeepLC runs out of memory. What can I do?__**

You can try to reduce the batch size. DeepLC should be able to run if the batch
size is low enough, even on machines with only 4 GB of RAM.

**__Q: I have a graphics card, but DeepLC is not using the GPU. Why?__**

For now, DeepLC defaults to the CPU instead of the GPU. Clearly, because you
want to use the GPU, you are a power user :-). If you want to make the most of
that expensive GPU, you need to change or remove the following line (at the
top) in __deeplc.py__:

```python
# Set to force CPU calculations
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
```

Also change the same line in the function __reset_keras()__:

```python
# Set to force CPU calculations
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
```

Either remove the line or set it to the index of the GPU to use (the first GPU
has index 0; multiple GPUs can be listed comma-separated):

```python
# Use the first GPU
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
```

**__Q: What modification name should I use?__**

The names from Unimod are used. The PSI-MS name is used by default, but the
Interim name is used as a fallback if the PSI-MS name is not available. It
should be fine as long as it is supported by
[ProForma](https://pubs.acs.org/doi/10.1021/acs.jproteome.1c00771) and
[psm_utils](/projects/psm_utils).

**__Q: I have a modification that is not in Unimod. How can I add the modification?__**

Unfortunately, since v3.0 this is no longer possible via the GUI or the command
line. You will need to use [psm_utils](/projects/psm_utils); a minimal example
above shows how to convert an identification file into a PSM list that is
accepted by DeepLC. There, the sequence can, for example, include just the
composition in ProForma format (e.g., SEQUEN[Formula:C12H20O2]CE).

**__Q: Help, all my predictions are between [0,10]. Why?__**

It is likely you did not use calibration. No problem, but the retention times
for training purposes were normalized between [0,10]. This means that you
probably need to adjust the retention times yourself after analysis, or use a
calibration set as the input.
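If you do have a few peptides with known retention times, the normalized predictions can be rescaled with a simple linear fit. A minimal sketch (the function name and the anchor values are hypothetical, not DeepLC output):

```python
def rescale(preds, known_pred, known_tr):
    """Linearly map normalized predictions onto the experimental
    retention time scale, using peptides with known retention times.

    preds: normalized predictions (roughly in [0, 10])
    known_pred: normalized predictions for the anchor peptides
    known_tr: experimental retention times for the same anchor peptides
    """
    # Least-squares line through the anchor points: tr = a * pred + b
    n = len(known_pred)
    mean_p = sum(known_pred) / n
    mean_t = sum(known_tr) / n
    a = sum((p - mean_p) * (t - mean_t) for p, t in zip(known_pred, known_tr)) \
        / sum((p - mean_p) ** 2 for p in known_pred)
    b = mean_t - a * mean_p
    return [a * p + b for p in preds]


# Two hypothetical anchors: normalized 2.0 -> 20 min, 8.0 -> 80 min
print(rescale([5.0], [2.0, 8.0], [20.0, 80.0]))  # [50.0]
```

A proper calibration set as input remains the better option; this only corrects the overall scale, not local deviations along the gradient.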

**__Q: What does the option `dict_divider` do?__**

This parameter defines the precision used for the fast lookup of retention
times for calibration. A value of 10 means a precision of 0.1 (and 100 a
precision of 0.01) between the calibration anchor points. This parameter does
not influence the precision of the calibration, but setting it too high might
result in a bad selection of the models between anchor points. A safe value is
usually higher than 10.

**__Q: What does the option `split_cal` do?__**

The option `split_cal`, or split calibration, sets the number of divisions of
the chromatogram for piecewise linear calibration. If the value is set to 10,
the chromatogram is split up into 10 equidistant parts. For each part, the
median value of the calibration peptides is selected. These are the anchor
points. Between each pair of anchor points, a linear fit is made. This option
has no effect when the pyGAM generalized additive models are used for
calibration.
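The piecewise linear idea can be sketched as follows (a simplified illustration under the assumptions above, not DeepLC's actual implementation):

```python
import statistics


def piecewise_calibrate(preds, cal_preds, cal_trs, splits=10):
    """Map predicted values onto measured retention times with a
    piecewise linear fit through per-segment median anchor points."""
    lo, hi = min(cal_preds), max(cal_preds)
    width = (hi - lo) / splits
    anchors = []
    # One anchor point per equidistant segment: (median pred, median tr)
    for i in range(splits):
        seg = [(p, t) for p, t in zip(cal_preds, cal_trs)
               if lo + i * width <= p < lo + (i + 1) * width
               or (i == splits - 1 and p == hi)]
        if seg:
            anchors.append((statistics.median(p for p, _ in seg),
                            statistics.median(t for _, t in seg)))
    # Linear interpolation between consecutive anchor points
    out = []
    for p in preds:
        for (x0, y0), (x1, y1) in zip(anchors, anchors[1:]):
            if x0 <= p <= x1:
                out.append(y0 + (p - x0) * (y1 - y0) / (x1 - x0))
                break
        else:
            # Outside the anchor range: clamp to the nearest anchor
            out.append(anchors[0][1] if p < anchors[0][0] else anchors[-1][1])
    return out
```

With more splits, the calibration can follow a non-linear gradient more closely, but each segment then contains fewer calibration peptides, making the anchor points noisier.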

**__Q: How does the ensemble part of DeepLC work?__**

Models within the same directory are grouped if their names overlap. The
overlap has to span the full name, except for the last part of the name after
the final "_" character.

The following models will be grouped:

```
full_hc_dia_fixed_mods_a.hdf5
full_hc_dia_fixed_mods_b.hdf5
```

None of the following models will be grouped:

```
full_hc_dia_fixed_mods2_a.hdf5
full_hc_dia_fixed_mods_b.hdf5
full_hc_dia_fixed_mods_2_b.hdf5
```
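A sketch of this grouping rule (an illustration, not DeepLC's actual code; the function name is made up):

```python
from pathlib import Path


def model_group_key(filename):
    """Group key for ensemble models: the file name without its
    extension and without the part after the last "_"."""
    return Path(filename).stem.rsplit("_", 1)[0]


# The first two share a key and would form an ensemble; the third does not
print(model_group_key("full_hc_dia_fixed_mods_a.hdf5"))   # full_hc_dia_fixed_mods
print(model_group_key("full_hc_dia_fixed_mods_b.hdf5"))   # full_hc_dia_fixed_mods
print(model_group_key("full_hc_dia_fixed_mods2_a.hdf5"))  # full_hc_dia_fixed_mods2
```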

**__Q: I would like to take the ensemble average of multiple models, even if they are trained on different datasets. How can I do this?__**

Feel free to experiment! Models within the same directory are grouped if their
names overlap. The overlap has to span the full name, except for the last part
of the name after the final "_" character.

The following models will be grouped:

```
model_dataset1.hdf5
model_dataset2.hdf5
```

So you just need to rename your models accordingly.