Commit 7b8c789 (parent 8ead07b): updated README for new architecture

File tree: 1 file changed, README.md (+52, -24 lines)

# MSDpostprocess: A machine learning based quality filter for lipid identifications from MS-DIAL
There are three ways to install and run MSDpostprocess: as a standalone executable, as a Docker container, or as a Python package.
## The executable version
This method requires no installation, but it is somewhat slower than the other options.
1. Download the executable for your operating system, the trained model, and the example options file from the releases page.
2. Run `MSDpostprocess.exe --print options.toml` to get a default options file.
3. Edit the options file for your experiment.
4. Run `MSDpostprocess.exe --options options.toml`
## The conda/virtualenv version
To set up the conda environment for the tool:
1. Download the MSDpostprocess repository and navigate to the `environments` directory.
2. Run `conda env create -p msdp_env --file MSDpostprocess.yml`
3. Run `conda activate ./msdp_env`
4. Navigate to the repository root.
5. Run `pip install -e .`
Alternatively, if you are using virtualenv:
1. Navigate to the repository root.
2. Run `virtualenv environments/msdp_env`
3. Activate the environment.
4. Run `pip install -e .`
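As a minimal sketch of the virtualenv route, assuming `python3` is on your PATH and using the stdlib `venv` module as a stand-in for the `virtualenv` package (the `/tmp/msdp_repo` path is a hypothetical stand-in for your repository checkout):

```shell
# Stand-in for the repository root; replace with your real checkout.
mkdir -p /tmp/msdp_repo && cd /tmp/msdp_repo

# Step 2: create the environment. The stdlib venv module is used here in
# place of the virtualenv package; --without-pip keeps this sketch light,
# omit it (and use virtualenv) for a real install.
python3 -m venv --without-pip environments/msdp_env

# Step 3: activate it (on Windows: environments\msdp_env\Scripts\activate).
. environments/msdp_env/bin/activate

# Step 4 would be `pip install -e .`, which needs the real repository checkout.
python -c 'import sys; print(sys.prefix)'   # shows the environment's path
```

After `pip install -e .` in the real checkout, `python -m MSDpostprocess` becomes available whenever the environment is active.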
To use the tool with either method:
1. Download the trained models from the releases page.
2. Run `python -m MSDpostprocess --print options.toml` to get a default options file.
3. Edit the options file for your experiment.
4. Run `python -m MSDpostprocess --options options.toml`
## The Docker version
The Docker container ships with the trained models under `/models/`. To use them, first get the default `options.toml` via step 1 below:
1. Run `docker run --rm -v /path/to/your/data/:/data/ stavisvols/msdpostprocess python -m MSDpostprocess --print /data/options.toml`
2. Edit the options file for your experiment.
3. Run `docker run --rm -v /path/to/your/data/:/data/ stavisvols/msdpostprocess python -m MSDpostprocess --options /data/options.toml`
Note that the working directory will be inside the Docker container's filesystem, so paths in the options file should refer to the mounted `/data/` directory.
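The Docker workflow above can be sketched as a small wrapper script (a sketch, not part of the tool: `DATA_DIR` is a hypothetical path you would replace, and the commands are echoed as a dry run rather than executed):

```shell
# Hypothetical data directory; replace with the folder holding your MS-DIAL exports.
DATA_DIR=/path/to/your/data
IMAGE=stavisvols/msdpostprocess

# Step 1: write a default options file into the mounted directory (dry run).
echo docker run --rm -v "${DATA_DIR}/:/data/" "${IMAGE}" \
    python -m MSDpostprocess --print /data/options.toml

# Step 2 is manual: edit ${DATA_DIR}/options.toml for your experiment.

# Step 3: run the tool against the edited options file (dry run).
echo docker run --rm -v "${DATA_DIR}/:/data/" "${IMAGE}" \
    python -m MSDpostprocess --options /data/options.toml
```

Drop the leading `echo`s to actually execute; because of the `-v` bind mount, outputs land in `DATA_DIR` on the host.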
## Example analysis
1. Install the tool using one of the above methods.
2. Download `QE_Pro_model.zip` and `example_analysis.zip` from the releases page.
3. Extract both archives. There should be no folders nested under `QE_Pro_model/` and `example_analysis/`.
4. Run `MSDpostprocess.exe --options example_analysis/example_analysis_options.toml`
On some systems the warning `No module named 'brainpy._c.composition'` will be displayed. This is not an error and does not impact the running of the tool.
## MS-DIAL export settings for inference
1. Click "Export" along the top bar
7. "Export format" should be "msp"
8. Click Export
A .txt file will now be generated in the chosen directory with the information required by MSDpostprocess. The file name will start with "Mz".
## Prepare training data
1. Start with MS-DIAL exports using the same settings as described above for inference.
2. Add a column named `label` which contains 0 for incorrect IDs, 1 for correct IDs, and is otherwise left blank. It is critical that this column be before (to the left of) the `MS/MS spectrum` column as all subsequent columns (those to the right) are assumed to be m/z data.
3. Save in a tab-delimited format.
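As a minimal sketch of the resulting layout (the column names other than `label` and `MS/MS spectrum` are hypothetical stand-ins for real MS-DIAL headers, and the rows are dummy values for illustration only):

```shell
# Build a tiny tab-delimited training table. The key point: `label` sits to
# the LEFT of `MS/MS spectrum`, because every column after `MS/MS spectrum`
# is treated as m/z data. Names and values are illustrative stand-ins.
{
  printf 'Metabolite name\tRT (min)\tlabel\tMS/MS spectrum\n'
  printf 'ExampleLipidA\t12.3\t1\t184.07:100 760.58:35\n'   # correct ID
  printf 'ExampleLipidB\t3.1\t0\t140.01:100 744.55:20\n'    # incorrect ID
} > /tmp/training_data.tsv
cat /tmp/training_data.tsv
```

Unlabeled rows simply leave the `label` field blank; the tool ignores them during training.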
The tool can be trained on multiple input files; retention time correction is run on a per-input-file basis. Multiple experiments can be used to generate training data, but it is suggested that they be input as separate files for chromatography alignment purposes.
## The datasets used for training
| Instrument | Source | N | Model | Organism |
| :--------- | :----- | :- | :---- | :------- |
| Q-Exactive | [MTBLS5583](https://www.ebi.ac.uk/metabolights/editor/MTBLS5583/descriptors) | 742 | QE_Pro_model | *Canis familiaris* |
| LTQ Velos Pro | in-house | 1076 | QE_Pro_model | *Aspergillus fumigatus* |
| LTQ Velos Pro | in-house | 545 | QE_Pro_model | *Laccaria bicolor* |
| TripleTOF 6600 | [MTBLS4108](https://www.ebi.ac.uk/metabolights/editor/MTBLS4108/descriptors) | 1125 | TOF_model | *Rattus rattus* |
Our tests have shown that a model will likely generalize within a family of instruments, but this has limits. We expect the QE_Pro_model to work for all Orbitrap systems. We do not have the data necessary to know how well the TOF_model will generalize to all TOF instruments, so if you are working with, for example, timsTOF data, it would be a good idea to do an initial validation of the output. The publicly available datasets used were reprocessed from raw files and annotated in-house.
