This repository was archived by the owner on Jan 9, 2025. It is now read-only.

Commit 1ea6ed4

Implement Alternative Inputs (computational-metabolomics#15)
* Return metabolite ID with msn annotation results
* Rewrite MSn method for limiting connectivity to fragment edges only
* Add option for use of smiles without non-structural isomeric information
* Update tests
* Correct return type hints
* Implement SQLITE3 annotate_msn results database
* Add msn option to ResultsDb
* Re-structure results db tables
* Update add_ms to add ms information to the queries table
* Add function to insert entries into the results and substructures tables
* Add function to get structure frequencies and/or SMILEs
* Update user-facing functions for compatibility with ResultsDb
* Remove text-based output and return substructure smiles from build functions in addition to final structures
* Update build unit tests for ResultsDb
* Add CSV output for build functions
* Check if ResultsDb output matches reference files
* Check ResultsDb CSV files line by line vs reference
* Implement simple bond dissociation energy calculations
* Add integer MS integer IDs and implement calculate_frequencies to more efficiently calculate structure frequencies
* Re-format get_bond_enthalpies
* Use integer IDs for results DB
* Add retain_substructures option
* Make filter_hmdbid_substructures a filtered version of the hmdbid_substructures table
* Implement the substructure network generation algorithm in SQLite instead of networkx
* Add get_substructure_network function to convert SQLite3 substructure network to a networkx graph
* Implement get_single_edge to get substructure edge weights without the generation of a substructure network
* Add integer substructure key
* Update unit tests
* Add parse_ms_data function to convert user provided raw data into a list required for building
* Implement msp parsing, update existing tests and spread functions across additional files
* Add unit testing and documentation for parse.py
* Amend connectivity database unit tests so that they fail in case of the generation of an empty DB
* Amend results docstrings
* Update docstrings of user-facing build functions
* Only test isomorphism database on non-windows systems
1 parent 62063ee commit 1ea6ed4

13 files changed: +1718 -528 lines changed
metaboblend/algorithms.py

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Copyright © 2019-2020 Ralf Weber
+#
+# This file is part of MetaboBlend.
+#
+# MetaboBlend is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# MetaboBlend is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with MetaboBlend. If not, see <https://www.gnu.org/licenses/>.
+#
+
+import numpy
+
+
+def find_path(mass_list, sum_matrix, n, mass, max_subset_length, path=[]):
+    """
+    Recursive solution for backtracking through the dynamic programming boolean matrix. All possible subsets are found
+
+    :param mass_list: A list of masses from which to identify subsets.
+
+    :param mass: The target mass of the sum of the substructures.
+
+    :param sum_matrix: The dynamic programming boolean matrix.
+
+    :param n: The size of mass_list.
+
+    :param max_subset_length: The maximum length of subsets to return. Allows the recursive backtracking algorithm to
+        terminate early in many cases, significantly improving runtime.
+
+    :param path: List for keeping track of the current subset.
+
+    :return: Generates of lists containing the masses of valid subsets.
+    """
+
+    # base case - the path has generated a correct solution
+    if mass == 0:
+        yield sorted(path)
+        return
+
+    # stop running when we overshoot the mass
+    elif mass < 0:
+        return
+
+    # can we sum up to the target value using the remaining masses? recursive call
+    elif sum_matrix[n][mass]:
+        yield from find_path(mass_list, sum_matrix, n - 1, mass, max_subset_length, path)
+
+        if len(path) < max_subset_length:
+            path.append(mass_list[n-1])
+
+            yield from find_path(mass_list, sum_matrix, n - 1, mass - mass_list[n - 1], max_subset_length, path)
+            path.pop()
+
+
+def subset_sum(mass_list, mass, max_subset_length=3):
+    """
+    Dynamic programming implementation of subset sum. Note that, whilst this algorithm is pseudo-polynomial, the
+    backtracking algorithm for obtaining all possible subsets has exponential complexity and so remains unsuitable
+    for large input values. This does, however, tend to perform a lot better than non-sum_matrix implementations, as
+    we're no longer doing sums multiple times and we've cut down the operations performed during the exponential portion
+    of the method.
+
+    :param mass_list: A list of masses from which to identify subsets.
+
+    :param mass: The target mass of the sum of the substructures.
+
+    :param max_subset_length: The maximum length of subsets to return. Allows the recursive backtracking algorithm to
+        terminate early in many cases, significantly improving runtime.
+
+    :return: Generates of lists containing the masses of valid subsets.
+    """
+
+    n = len(mass_list)
+
+    # initialise dynamic programming array
+    sum_matrix = numpy.ndarray([n + 1, mass + 1], bool)
+
+    # subsets can always equal 0
+    for i in range(n+1):
+        sum_matrix[i][0] = True
+
+    # empty subsets do not have non-zero sums
+    for i in range(mass):
+        sum_matrix[0][i + 1] = False
+
+    # fill in the remaining boolean matrix
+    for i in range(n):
+        for j in range(mass+1):
+            if j >= mass_list[i]:
+                sum_matrix[i + 1][j] = sum_matrix[i][j] or sum_matrix[i][j - mass_list[i]]
+            else:
+                sum_matrix[i + 1][j] = sum_matrix[i][j]
+
+    # backtrack through the matrix recursively to obtain all solutions
+    return find_path(mass_list, sum_matrix, n, mass, max_subset_length)
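
A minimal standalone sketch of the subset-sum approach added in this hunk, for trying the idea out without installing MetaboBlend. It is not the committed code. It departs from the diff in two labelled ways: the table is a plain list of lists rather than a numpy array, and `path` defaults to `None` instead of a shared mutable list. Masses are assumed to be non-negative integers (for example nominal masses), because the table is indexed by mass.

```python
def find_path(mass_list, sum_matrix, n, mass, max_subset_length, path=None):
    """Backtrack through the boolean table, yielding each subset that sums to mass."""
    if path is None:
        # Assumption of this sketch: a fresh list per top-level call,
        # rather than the shared default list used in the diff.
        path = []

    # Base case: the current path sums exactly to the target.
    if mass == 0:
        yield sorted(path)
        return

    # Overshot the target, or the first n masses cannot reach it.
    if mass < 0 or not sum_matrix[n][mass]:
        return

    # Branch 1: skip mass_list[n - 1].
    yield from find_path(mass_list, sum_matrix, n - 1, mass, max_subset_length, path)

    # Branch 2: take mass_list[n - 1], if the subset may still grow.
    if len(path) < max_subset_length:
        path.append(mass_list[n - 1])
        yield from find_path(mass_list, sum_matrix, n - 1,
                             mass - mass_list[n - 1], max_subset_length, path)
        path.pop()


def subset_sum(mass_list, mass, max_subset_length=3):
    """Yield all subsets of mass_list (at most max_subset_length items) summing to mass."""
    n = len(mass_list)

    # sum_matrix[i][j] is True when some subset of the first i masses sums to j.
    # Plain lists are an assumption of this sketch; the diff uses a numpy array.
    sum_matrix = [[False] * (mass + 1) for _ in range(n + 1)]

    # The empty subset always sums to 0.
    for i in range(n + 1):
        sum_matrix[i][0] = True

    # Either skip mass i, or take it if it fits into j.
    for i in range(n):
        for j in range(mass + 1):
            sum_matrix[i + 1][j] = sum_matrix[i][j] or (
                j >= mass_list[i] and sum_matrix[i][j - mass_list[i]]
            )

    return find_path(mass_list, sum_matrix, n, mass, max_subset_length)


if __name__ == "__main__":
    print(sorted(subset_sum([1, 2, 3, 4, 5], 6)))
    # [[1, 2, 3], [1, 5], [2, 4]]
```

Using `path=None` means that a caller who stops iterating early does not leave items behind in a list shared between calls. Filling the table costs O(n × mass) time and memory, so large target masses or non-integer masses would need rescaling or a different approach.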

metaboblend/auxiliary.py

Lines changed: 1 addition & 1 deletion
@@ -20,8 +20,8 @@
 #
 
 import itertools
-import networkx as nx
 import pylab as plt
+import networkx as nx
 
 
 def calculate_complete_multipartite_graphs(max_atoms_available, max_n_substructures):
