This repository was archived by the owner on Jan 9, 2025. It is now read-only.

Commit 89cc549

jackgisby and RJMW authored
Feat rule based fragmenter (computational-metabolomics#17)
* Compatibility with conda version of geng; remove geng tool from package
* Incorporate pkl files into connectivity database
* Add nauty as dependency
* Add pickle as test dependency
* Switch from strings to pickles for connectivity graphs
* Use blob instead of text to store pickled dictionary
* No longer write substructures to .smi
* Add option to build to select only frequent substructures
* Add connectivity filter to k_configs
* Incorporate connectivity filter into MSn build method
* Build substructures for each set of masses independently
* Call itertools.product on substructures within multiprocessing portion of build
* Configure run script for current create_isomorphism_database inputs
* Built subsets should be empty list, not None
* Update variable names, remove debug options, update docstrings
* Add annotate_msn and generate_structures user functions
* Move stage at which multiprocessing step is performed
* Allow for multiple output options in build
* Remove ppm option for retrieving elemental composition from substructure db
* Allow list of mc/exact_mass to be passed to generate_structures
* Use TemporaryDirectory to store unittest results
* Let generate_structures return/yield smiles
* Implement build_msn to incorporate considerations for building structures from MS/MS
* Implement annotate_msn to provide an interface to build_msn
* Add/update build docstrings
* Remove unnecessary build parameters
* Pass data dictionary to user-facing build functions rather than separate mc, exact_mass, MSn masses
* Update variable naming conventions
* Add newline between smiles in out file
* Update SubstructureDb for removal of .pkl files
* Add function create_substructure_database
* Bring tests up to date with variable renaming
* Bring scripts up to date with variable renaming
* Simplify loading of test data and remove teardown
* Remove unused class ConnectivityDb and update SubstructureDb parameters
* Implement additional non-msn build tests
* Improve temporary table cleaning logic
* Fix issues with new build functions
* Allow tests to load auxiliary test data
* Implement msn tests and update k_config test for new parameter
* Correctly specify ppm in generate_structures
* Minor docstring and code reformatting
* Add binder dir
* Add example notebook
* Remove scripts
* Implement basic notebook
* Add small substructures to database prior to msn annotation
* Complete notebook example
* Fix logic for when smi_out_dir is None
* Rename example_msms.ipynb to workflow.ipynb
* Add pip to install metaboblend
* Add data dir, remove databases dir, move test data to data dir
* Write notebook databases to notebook_data
* Unzip test data
* Simplify test paths
* Remove databases from gitignore
* Use test databases for notebook
* Implement simple hydrogenation rules
* Get bond types rather than number of available atoms for hydrogen rule calculations
* Don't count dummy atoms for bond type calculations
* Remove dummy atom mass
* Use max_degree of 6 and 2 available_atoms by default for create_substructure_database
* Account for the fact we use neutral peaks (i.e. have removed adduct ion)
* Modify hydrogen re-arrangement rules for double bonds
* Update databases tests
* Implement test for calculate_possible_hydrogenations using reference numbers
* Add test for calculate_hydrogen_rearrangements
* Update hydrogen re-arrangement calculation function documentation
* Update remaining unit tests
* Add hydrogen re-arrangement compound HMDB XMLs
* Record even substructures
* Record even substructures in results DB
* Add indexes to improve combine_ecs function performance
* Improve results DB hierarchy and implement aggregation of scoring metrics
* Define SQLite functions to calculate scores via queries alone
* Record max BDE in spectra results table
* Calculate frequency in the absence of scores (for non-MSn method)
* Retain substructures does not cause substructures not to be initially recorded
* Add additional scoring metrics
* Update results db test data
* Define ppm error and valence of fragment prior to re-ordering
* Configure checks on recording of putative structure information
* Calculate scores at substructure combination level
* Convert True to 1 and False to 0 for conversion to SQLite boolean type
* Index results DB
* Use a loop in place of pool.map
* Minor performance improvements
* Merge minor performance improvements
* Use the minimum absolute error for getting possible fragment ions
* Add separate absolute error options for MSn peak and full structure
* Use 0.005 for abs_error_precursor
* Drop indexes before inserting into results DB
* Add results table index on ms_id_num and structure_smiles
* Update results DB tests
* Add table for generating unique structure smiles IDs
* Calculate cosine spectrum similarity
* Allow for the specification of weights for the results database scoring calculations
* Aggregate structure scores but force floating point division
* Select fragment and substructure id when calculating results scores for the correlated query
* Update results DB tests with updated scores
* Don't create indexes until structure scoring
* Don't include valence=0 substructures in the substructure database
* Add max BDE parameter for building
* Remove redundant connectivity graphs
* Update data to test filter records function
* Update dictionary pickle with Python 3.7
* Update file header
* Update contact information
* Update setup.py
* Update tests for RDKit changes
* Update README
* Keep functioning buttons
* Update testing workflow
* Use python 3.7
* Remove unused dependencies
* Use only the channel conda-forge
* Add pillow and pyqt dependencies
* Remove list definition in function arguments
* Add algorithms test
* Merge database tests into single file
* Restructure modules
* Restructure tests
* Update outdated imports
* Omit notebooks from coverage

Co-authored-by: Ralf Weber <[email protected]>
1 parent 88fd297 commit 89cc549
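
Two items in the commit message, "Switch from strings to pickles for connectivity graphs" and "Use blob instead of text to store pickled dictionary", describe a storage change. The following is a minimal sketch of that idea using only the standard library. The table name `graphs`, its columns, and the helper functions `store_graph` and `load_graph` are hypothetical and are not taken from the MetaboBlend schema.

```python
import pickle
import sqlite3


def store_graph(conn, graph_id, adjacency):
    """Pickle a dictionary and store the resulting bytes in a BLOB column."""
    payload = sqlite3.Binary(pickle.dumps(adjacency))
    conn.execute(
        "INSERT INTO graphs (id, adjacency) VALUES (?, ?)",
        (graph_id, payload),
    )


def load_graph(conn, graph_id):
    """Fetch the BLOB for graph_id and unpickle it; return None if absent."""
    row = conn.execute(
        "SELECT adjacency FROM graphs WHERE id = ?", (graph_id,)
    ).fetchone()
    return None if row is None else pickle.loads(row[0])


# Hypothetical schema: one pickled dictionary per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graphs (id INTEGER PRIMARY KEY, adjacency BLOB)")

# Illustrative adjacency dictionary for a three-node chain.
example = {0: [1], 1: [0, 2], 2: [1]}
store_graph(conn, 1, example)
restored = load_graph(conn, 1)
```

A BLOB column keeps the pickled bytes intact, whereas a TEXT column would require an encoding step. Unpickling should only be applied to data from a trusted source.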


43 files changed: +6653 -3271 lines changed

.coveragerc

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 [run]
-omit = tests/*,setup.py,metaboblend/__main__.py
+omit = tests/*,setup.py,metaboblend/__main__.py,notebooks/

.github/workflows/build-test.yml

Lines changed: 17 additions & 18 deletions
@@ -8,8 +8,8 @@ jobs:
 
 strategy:
 matrix:
-os: [ubuntu-latest, windows-latest, macos-latest]
-python-version: [3.6, 3.7, 3.8]
+os: [ ubuntu-latest, windows-latest, macos-latest ]
+python-version: [ 3.7, 3.8, 3.9 ]
 
 env:
 OS: ${{ matrix.os }}
@@ -19,21 +19,29 @@ jobs:
 - uses: actions/checkout@v2
 
 - name: Setup conda - Python ${{ matrix.python-version }}
-uses: s-weigand/setup-conda@v1
+uses: conda-incubator/setup-miniconda@v2
 with:
-update-conda: true
+auto-update-conda: true
+activate-environment: metaboblend
 python-version: ${{ matrix.python-version }}
-conda-channels: anaconda, conda-forge
+environment-file: environment.yml
+channels: anaconda, conda-forge
 
-- name: Install dependencies
+- name: Build MetaboBlend
+shell: bash -l {0}
 run: |
+python setup.py install
+metaboblend --help
 
-python --version
-conda env update --file environment.yml --name base
+- name: Test with pytest-cov
+shell: bash -l {0}
+run: |
+conda install pytest codecov pytest-cov -c conda-forge
+pytest --cov ./ --cov-config=.coveragerc --cov-report=xml
 
 - name: Lint with flake8
+shell: bash -l {0}
 run: |
-
 conda install flake8
 
 # stop build if there are Python syntax errors or undefined names
@@ -42,15 +50,6 @@ jobs:
 # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
 flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
 
-- name: Test with pytest-cov
-run: |
-
-python setup.py install
-metaboblend --help
-
-conda install pytest codecov pytest-cov -c conda-forge
-pytest --cov ./ --cov-config=.coveragerc --cov-report=xml
-
 - name: Upload code coverage to codecov
 uses: codecov/codecov-action@v1
 with:

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -70,6 +70,8 @@ target/
 
 # Jupyter Notebook
 .ipynb_checkpoints
+notebooks/notebook_data
+notebooks/notebook_data/*
 
 # pyenv
 .python-version
@@ -105,4 +107,4 @@ ENV/
 # ignore test files
 */libgcc_s_dw2-1.dll
 */libstdc++-6.dll
-tests/test*
+tests/tmp*
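
The commit message item "Use TemporaryDirectory to store unittest results" can be illustrated with a short standard-library sketch. The test case, the file name `structures.smi`, and the `run_suite` helper below are hypothetical; the project's real tests may be organised differently, and by default `tempfile.TemporaryDirectory` creates its directory in the system temporary location rather than under `tests/`.

```python
import os
import tempfile
import unittest


class TemporaryResultsTest(unittest.TestCase):
    """Hypothetical test case that writes its outputs to a throwaway directory."""

    def setUp(self):
        # Create a fresh directory per test; the "tmp" prefix is illustrative.
        self.temp_dir = tempfile.TemporaryDirectory(prefix="tmp")
        # Remove the directory and its contents once the test has finished.
        self.addCleanup(self.temp_dir.cleanup)

    def test_write_results(self):
        out_path = os.path.join(self.temp_dir.name, "structures.smi")
        with open(out_path, "w") as handle:
            handle.write("CCO\n")
        with open(out_path) as handle:
            self.assertEqual(handle.read(), "CCO\n")


def run_suite():
    """Run the test case programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TemporaryResultsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Registering the cleanup in `setUp` means no explicit teardown method is needed, which is in line with the message item "Simplify loading of test data and remove teardown".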

README.rst

Lines changed: 6 additions & 4 deletions
@@ -1,6 +1,9 @@
 MetaboBlend
 ===========
-|Version| |Py versions| |Git| |Bioconda| |Build Status| |License| |RTD doc| |codecov| |binder|
+..
+|Version| |Py versions| |Bioconda| |RTD doc| |License| |binder|
+
+|Git| |Build Status| |codecov|
 
 Python package for *de novo* structural elucidation of small molecules in mass spectrometry-based Metabolomics
 
@@ -32,12 +35,11 @@ will help you to make the PR if you are new to `git`.
 Developers & Contributors
 -------------------------
 - Ralf J. M. Weber ([email protected]) - `University of Birmingham (UK) <https://www.birmingham.ac.uk/staff/profiles/biosciences/weber-ralf.aspx>`_
-- Jack Gisby ([email protected]) - `University of Birmingham (UK) <http://www.birmingham.ac.uk/index.aspx>`_
-
+- Jack Gisby ([email protected]) - `University of Birmingham (UK) <http://www.birmingham.ac.uk/index.aspx>`_, `Imperial College London (UK) <https://www.imperial.ac.uk/>`_
 
 Licenses
 --------
-MetaboBlend is licensed under the GNU General Public License v3.0 (see `LICENSE file <https://github.com/computational-metabolomics/metaboblend/blob/master/LICENSE>`_ for licensing information). Copyright © 2019 - 2020 Ralf Weber
+MetaboBlend is licensed under the GNU General Public License v3.0 (see `LICENSE file <https://github.com/computational-metabolomics/metaboblend/blob/master/LICENSE>`_ for licensing information). Copyright © 2019 - 2020 Jack Gisby, Ralf Weber
 
 
 .. |Build Status| image:: https://github.com/computational-metabolomics/metaboblend/workflows/metaboblend/badge.svg

binder/environment.yml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+name: metaboblend
+channels:
+- conda-forge
+- bioconda
+dependencies:
+- python=3.7
+- numpy
+- scipy
+- pandas
+- networkx
+- rdkit
+- biopython
+- matplotlib
+- nauty
+- pip
+- pip:
+- -e ../

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 # -- Project information -----------------------------------------------------
 
 project = 'MetaboBlend'
-copyright = '2020, Ralf Weber'
+copyright = '2020, Jack Gisby, Ralf Weber'
 author = 'Jack Gisby, Ralf Weber'
 
 # -- General configuration ---------------------------------------------------

docs/source/license.rst

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 License
 -------
 TODO: change package name
-*MetaboBlend* is licensed under the GNU General Public License v3.0 (see `LICENSE file <https://github.com/computational-metabolomics/metaboblend/blob/master/LICENSE>`_ for licensing information). Copyright © 2019 - 2020 Ralf Weber
+*MetaboBlend* is licensed under the GNU General Public License v3.0 (see `LICENSE file <https://github.com/computational-metabolomics/metaboblend/blob/master/LICENSE>`_ for licensing information). Copyright © 2019 - 2020 Jack Gisby, Ralf Weber

environment.yml

Lines changed: 4 additions & 6 deletions
@@ -1,14 +1,12 @@
 name: metaboblend
 channels:
 - conda-forge
-- bioconda
 dependencies:
-- python>=3.6
+- python>=3.7
+- pillow!=9.2.0
+- pyqt
+- matplotlib
 - numpy
-- scipy
-- pandas
 - networkx
 - rdkit
-- biopython
-- matplotlib
 - nauty

metaboblend/__init__.py

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 #
-# Copyright © 2019-2020 Ralf Weber
+# Copyright © 2019-2020 Jack Gisby, Ralf Weber
 #
 # This file is part of MetaboBlend.
 #
@@ -19,7 +19,7 @@
 # along with MetaboBlend. If not, see <https://www.gnu.org/licenses/>.
 #
 
-__author__ = 'Ralf Weber ([email protected])'
-__credits__ = 'Ralf Weber ([email protected])'
+__authors__ = ['Ralf Weber ([email protected])', 'Jack Gisby ([email protected])']
+__credits__ = ['Ralf Weber ([email protected])', 'Jack Gisby ([email protected])']
 __version__ = '0.1.0'
 __license__ = 'GPLv3'

metaboblend/__main__.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 #
-# Copyright © 2019-2020 Ralf Weber
+# Copyright © 2019-2020 Jack Gisby, Ralf Weber
 #
 # This file is part of MetaboBlend.
 #
