# MSDpostprocess: A machine learning based quality filter for lipid identifications from MS-DIAL
2
+
There are three ways to install and run MSDpostprocess: as an executable, as a Docker container, and as a Python package.
2
3
3
-
## Set up a local conda environment
4
-
MSDpostprocess consists of a pair of Python scripts, one for training and one for inference. The dependencies for these scripts are laid out in `environments/MSDpostprocess.yml`.
5
-
To set up the conda environment for these scripts:
6
-
1. Install conda, miniconda, or mamba. Instructions can be found here https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
7
-
2. Download the MSDpostprocess repository and navigate to the `environments` directory.
8
-
3. Run `conda env create -p msdp_env --file MSDpostprocess.yml`
9
-
4. Run `conda activate ./msdp_env`
4
+
## The executable version
5
+
This method requires no installation, but it is somewhat slower than the other options.
6
+
1. Download the executable for your operating system, the trained model, and the example options file from the releases page.
7
+
2. Run `MSDpostprocess.exe --print options.toml` to get a default options file.
8
+
3. Edit the options file for your experiment.
9
+
4. Run `MSDpostprocess.exe --options options.toml`
10
10
11
-
## Example analysis
12
-
1. Navigate to the `example_data` directory
13
-
2. Run `python ../scripts/MSDpostprocess-training.py --input training_data.tsv --min_rt 7 --out_dir ./`
1. Download the MSDpostprocess repository and navigate to the `environments` directory.
14
+
2. Run `conda env create -p msdp_env --file MSDpostprocess.yml`
15
+
3. Run `conda activate ./msdp_env`
16
+
4. Navigate to the repository root.
17
+
5. Run `pip install -e .`
18
+
19
+
Otherwise, if you are using virtualenv:
20
+
1. Navigate to the repository root.
21
+
2. Run `virtualenv environments/msdp_env`
22
+
3. Activate the environment
23
+
4. Run `pip install -e .`
24
+
25
+
To use the tool with either method:
26
+
1. Download the trained models from the releases page.
27
+
2. Run `python -m MSDpostprocess --print options.toml` to get a default options file.
28
+
3. Edit the options file for your experiment.
29
+
4. Run `python -m MSDpostprocess --options options.toml`
15
30
16
-
Running either script with the `--help` flag will show a list of available options.
31
+
## The Docker version
32
+
The Docker container has trained models provided under `/models/`. To use these, start from the default `options.toml` generated in step 1 below:
33
+
1. Run `docker run --rm -v /path/to/your/data/:/data/ stavisvols/msdpostprocess python -m MSDpostprocess --print /data/options.toml`
34
+
2. Edit the options file for your experiment.
35
+
3. Run `docker run --rm -v /path/to/your/data/:/data/ stavisvols/msdpostprocess python -m MSDpostprocess --options /data/options.toml`
36
+
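The working directory will be within the Docker container's filesystem, so paths in the options file should refer to locations inside the container, such as the mounted `/data/` directory.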
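Because the commands run inside the container, a host path only makes sense to the tool once it is expressed relative to the `/data/` mount. The following is a minimal sketch of that translation and of assembling the `docker run` invocations shown above. The helper names `to_container_path` and `docker_command` are hypothetical and are not part of MSDpostprocess; only the image name and the `--print` and `--options` flags come from this README.

```python
from pathlib import PurePosixPath

# Image name taken from the docker commands in this README.
IMAGE = "stavisvols/msdpostprocess"


def to_container_path(host_path, host_mount, container_mount="/data"):
    """Translate a host path under host_mount to its path inside the container.

    Raises ValueError if host_path is not located under host_mount.
    """
    rel = PurePosixPath(host_path).relative_to(PurePosixPath(host_mount))
    return str(PurePosixPath(container_mount) / rel)


def docker_command(host_mount, options_in_container, flag="--options"):
    """Build the argument list for one of the two docker invocations above."""
    if flag not in ("--print", "--options"):
        raise ValueError("flag must be '--print' or '--options'")
    return [
        "docker", "run", "--rm",
        "-v", f"{host_mount}:/data/",
        IMAGE,
        "python", "-m", "MSDpostprocess",
        flag, options_in_container,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. The sketch assumes POSIX-style host paths; Windows paths would need separate handling.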
37
+
38
+
## Example analysis
39
+
1. Install the tool using one of the above methods.
40
+
2. Download `QE_Pro_model.zip` and `example_analysis.zip` from the releases page.
41
+
3. Extract both archives. There should be no folders nested under `QE_Pro_model/` and `example_analysis/`.
42
+
4. Run `MSDpostprocess.exe --options example_analysis/example_analysis_options.toml`
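Some archive tools wrap extracted files in an extra folder, which would break the layout described in step 3. A small check like the one below can confirm the layout before running the analysis. This is only a sketch; the function name `nested_folders` is hypothetical and not part of MSDpostprocess.

```python
from pathlib import Path


def nested_folders(extracted_dir):
    """Return the names of any subdirectories directly under extracted_dir."""
    return sorted(p.name for p in Path(extracted_dir).iterdir() if p.is_dir())
```

An empty list for both `QE_Pro_model/` and `example_analysis/` means the archives were extracted as expected. A non-empty list names the folders whose contents likely need to be moved up one level.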
17
43
18
-
On some Windows environments the warning `No module named 'brainpy._c.composition'` will be displayed. This is not an error and does not impact the running of the script.
44
+
On some systems the warning `No module named 'brainpy._c.composition'` will be displayed. This is not an error and does not impact the running of the tool.
19
45
20
46
## MS-DIAL export settings for inference
21
47
1. Click "Export" along the top bar
@@ -27,19 +53,21 @@ On some Windows environments the warning `No module named 'brainpy._c.compositio
27
53
7. "Export format" should be "msp"
28
54
8. Click Export
29
55
30
-
A .txt will now be generated in the chosen directory with the information required for the MSDpostprocess script. The file name will start with "Mz"
56
+
A .txt file will now be generated in the chosen directory with the information required for MSDpostprocess. The file name will start with "Mz".
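When an export directory holds several files, the "Mz" prefix can be used to pick out the exports. A minimal sketch follows; the function name `find_msdial_exports` is hypothetical, and it assumes nothing about the file name beyond the "Mz" prefix and `.txt` extension stated above.

```python
from pathlib import Path


def find_msdial_exports(directory):
    """List .txt files in directory whose names start with "Mz", sorted by name."""
    return sorted(p for p in Path(directory).glob("Mz*.txt") if p.is_file())
```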
31
57
32
58
## Prepare training data
33
-
1. Start with MS-DIAL exports using the same settings as described above for running the inference script.
34
-
2. Delete the metadata rows so that the column headers are now the first row of the document.
35
-
3. Add a column named `label` which contains 0 for incorrect IDs and 1 for correct IDs. It is critical that this column be before (to the left of) the `MS/MS spectrum` column as all subsequent columns (those to the right) are assumed to be m/z data.
36
-
4. Remove all entries that you do not manually label.
37
-
5. Save in a tab-delimited format.
59
+
1. Start with MS-DIAL exports using the same settings as described above for inference.
60
+
2. Add a column named `label` which contains 0 for incorrect IDs, 1 for correct IDs, and is otherwise left blank. It is critical that this column be before (to the left of) the `MS/MS spectrum` column as all subsequent columns (those to the right) are assumed to be m/z data.
61
+
3. Save in a tab-delimited format.
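The column placement in step 2 is easy to get wrong by hand, so it may help to insert the `label` column programmatically and then fill it in. The sketch below places `label` immediately to the left of `MS/MS spectrum` and leaves unlabeled rows blank. It is not part of MSDpostprocess: the function names are hypothetical, the optional `key_column` (for example `Alignment ID`) is an assumed column name, and the sketch assumes that any rows above the header row can be left unchanged.

```python
def add_label_column(rows, labels=None, key_column=None, anchor="MS/MS spectrum"):
    """Insert a `label` column immediately left of the anchor column.

    rows: list of lists of str, as parsed from a tab-delimited export.
    labels: optional dict mapping a row's key_column value to 0 or 1.
    Rows without an entry in labels get a blank label.
    Rows above the header row are copied unchanged (assumption).
    """
    labels = labels or {}
    header_idx = None
    for i, row in enumerate(rows):
        if anchor in row:
            header_idx = i
            break
    if header_idx is None:
        raise ValueError(f"no header row containing {anchor!r}")
    header = rows[header_idx]
    col = header.index(anchor)
    key_idx = header.index(key_column) if key_column is not None else None
    out = [list(row) for row in rows[:header_idx]]
    out.append(header[:col] + ["label"] + header[col:])
    for row in rows[header_idx + 1:]:
        key = row[key_idx] if key_idx is not None and key_idx < len(row) else None
        out.append(row[:col] + [str(labels.get(key, ""))] + row[col:])
    return out


def label_file(in_path, out_path, labels=None, key_column=None):
    """Read a tab-delimited export, add the label column, and write it back out."""
    with open(in_path, encoding="utf-8") as fh:
        rows = [line.rstrip("\n").split("\t") for line in fh]
    labeled = add_label_column(rows, labels, key_column)
    with open(out_path, "w", encoding="utf-8") as fh:
        for row in labeled:
            fh.write("\t".join(row) + "\n")
```

Calling `label_file("export.txt", "training.tsv")` with no labels produces a file with an empty `label` column in the right position, ready for manual annotation.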
38
62
39
-
The training script is capable of being trained on multiple input files. The retention time correction is run on a per-input-file basis so all of the entries in each file should have been run with the same chromatography. Multiple experiments can be used to generate training data, but it is suggested that they are input as separate files for chromatography alignment purposes.
63
+
The tool can be trained on multiple input files. The retention time correction is run on a per-input-file basis. Multiple experiments can be used to generate training data, but they should be supplied as separate files so that chromatography alignment is performed within each experiment.
40
64
41
-
## Docker version
42
-
A docker image, along with the pre-trained model is available through dockerhub.
For inference run `docker run --rm -v /path/to/your/data/:/data/ stavisvols/msdpostprocess MSDpostprocess-inference.py --input /data/yourdata1.txt,/data/yourdata2.txt --out_dir /data/`
45
-
The inference script will use the provided `model.dill` by default. If you wish to use a different model file it can be specified using the `--model` flag.
73
+
Our tests have shown that a model will likely generalize to a family of instruments, but that this has limits. We expect that the QE_Pro_model will work for all Orbitrap systems. We do not have the data necessary to know how well the TOF model will generalize to all TOF instruments, so if you are working with, for example, timsTOF data, it would be a good idea to do an initial validation of the output. The publicly available datasets used were reprocessed from raw files and annotated in-house.