This repository was archived by the owner on Jan 9, 2025. It is now read-only.

Commit 44bf05c

Implement MSn Annotation (computational-metabolomics#12)
* Compatibility with conda version of geng; remove geng tool from package
* Incorporate pkl files into connectivity database
* Add nauty as dependency
* Add pickle as test dependency
* Switch from strings to pickles for connectivity graphs
* Use blob instead of text to store pickled dictionary
* No longer write substructures to .smi
* Add option to build to select only frequent substructures
* Add connectivity filter to k_configs
* Incorporate connectivity filter into MSn build method
* Build substructures for each set of masses independently
* Call itertools.product on substructures within multiprocessing portion of build
* Configure run script for current create_isomorphism_database inputs
* Built subsets should be empty list, not None
* Update variable names, remove debug options, update docstrings
* Add annotate_msn and generate_structures user functions
* Move stage at which multiprocessing step is performed
* Allow for multiple output options in build
* Remove ppm option for retrieving elemental composition from substructure db
* Allow list of mc/exact_mass to be passed to generate_structures
* Use TemporaryDirectory to store unittest results
* Let generate_structures return/yield smiles
* Implement build_msn to incorporate considerations for building structures from MS/MS
* Implement annotate_msn to provide an interface to build_msn
* Add/update build docstrings
* Remove unnecessary build parameters
* Pass data dictionary to user-facing build functions rather than separate mc, exact_mass, MSn masses
* Update variable naming conventions
* Add newline between smiles in out file
* Update SubstructureDb for removal of .pkl files
* Add function create_substructure_database
* Bring tests up to date with variable renaming
* Bring scripts up to date with variable renaming
* Simplify loading of test data and remove teardown
* Remove unused class ConnectivityDb and update SubstructureDb parameters
* Implement additional non-msn build tests
* Improve temporary table cleaning logic
* Fix issues with new build functions
* Allow tests to load auxiliary test data
* Implement msn tests and update k_config test for new parameter
* Correctly specify ppm in generate_structures
* Minor docstring and code reformatting
* Add small substructures to database prior to msn annotation
* Add type hinting to user-facing functions
* Improve and re-order docstring params
* Fix gen_subs_table SQLite syntax
* Update user-facing docs
* Update test and package data
* Update variable names and docs
* Allow specification of heavy atoms using min and max instead of a sequence
* Update gen_subs_table SQL statement to use max/min not sequence
* Add automated small substructure generation
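Two of the items above ("Switch from strings to pickles for connectivity graphs" and "Use blob instead of text to store pickled dictionary") describe a storage pattern that can be sketched with the standard library alone. The table name `subgraphs`, its columns, the key format, and both helper functions below are hypothetical and are not taken from the metaboblend schema; this is only one plausible reading of the change.

```python
import pickle
import sqlite3


def store_graphs(conn, key, graphs):
    """Pickle `graphs` and store the resulting bytes in a BLOB column.

    The table and column names are illustrative assumptions only.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS subgraphs (id TEXT PRIMARY KEY, data BLOB)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO subgraphs (id, data) VALUES (?, ?)",
        (key, pickle.dumps(graphs)),  # bytes are stored by sqlite3 as a BLOB
    )
    conn.commit()


def load_graphs(conn, key):
    """Return the unpickled object for `key`, or None if the key is absent."""
    row = conn.execute(
        "SELECT data FROM subgraphs WHERE id = ?", (key,)
    ).fetchone()
    return None if row is None else pickle.loads(row[0])


conn = sqlite3.connect(":memory:")
example = {((1,), (1,)): [(0, 1)]}  # made-up dictionary standing in for real data
store_graphs(conn, "2_1", example)
restored = load_graphs(conn, "2_1")
```

A BLOB column keeps the pickled bytes intact, whereas a TEXT column would require encoding them first. Unpickling is only safe for data the application wrote itself.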
1 parent 6a1856d commit 44bf05c

543 files changed, +15243 -266562 lines changed

.gitignore

Lines changed: 0 additions & 8 deletions
@@ -102,14 +102,6 @@ ENV/
 
 *~
 
-# DB Files
-*.pkl
-*.sqlite
-
-# ignore changes to scripts for testing
-scripts/*
-scripts/results/*
-
 # ignore lib files for testing
 */libgcc_s_dw2-1.dll
 */libstdc++-6.dll

environment.yml

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ dependencies:
 - rdkit
 - biopython
 - matplotlib
+- nauty
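Since the commit removes the bundled geng tool and adds nauty as an environment dependency, code that shells out to geng presumably has to find it on the system. A minimal check, assuming the installed nauty package places an executable named `geng` on `PATH` (the helper `find_geng` is hypothetical and not part of metaboblend):

```python
import shutil


def find_geng(executable="geng"):
    """Return the full path of the geng executable, or None if it is not on PATH."""
    return shutil.which(executable)


geng_path = find_geng()
if geng_path is None:
    print("geng not found on PATH; the nauty dependency may not be installed")
else:
    print(f"using geng at {geng_path}")
```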

metaboblend/auxiliary.py

Lines changed: 16 additions & 16 deletions
@@ -27,13 +27,13 @@
 def calculate_complete_multipartite_graphs(max_atoms_available, max_n_substructures):
     """
     Calculates all possible configurations of multipartite graphs up to (inclusive) a given number of atoms available
-    ("sizes") and number of substructures ("boxes"). The possible bonding configurations for these graphs are
+    and number of substructures. The possible bonding configurations for these graphs are
     calculated by geng and RI, before they are used to combine substructures and generate novel metabolites.
 
     :param max_n_substructures: The maximal number of substructures (vertices). At least two substructures must be
         available for bonding for a graph to be created.
 
-    :param max_atoms_available: The maximal number of atoms available (maximal number of edges per vertice) in each
+    :param max_atoms_available: The maximal number of atoms available (maximal number of edges per vertex) in each
         substructure for bonding. At least one atom must be available for bonding for a graph to be created.
 
     :return: A string detailing each possible combination of atoms_available and n_substructures (p) and the
@@ -94,31 +94,31 @@ def draw_subgraph(edges, vn):
 
     plt.title(str(vn))
 
-    sG = nx.Graph()
-    sG.add_edges_from([(e[0], e[1]) for e in edges])
+    s_g = nx.Graph()
+    s_g.add_edges_from([(e[0], e[1]) for e in edges])
 
-    pos = nx.circular_layout(sG)
-    nx.draw(sG, pos)
+    pos = nx.circular_layout(s_g)
+    nx.draw(s_g, pos)
 
     cols = ["b", "r", "g", "y"]
-    cD = {}
+    c_d = {}
 
     i = 0
     for j, substructure in enumerate(vn):
         if len(substructure) == 1:
-            cD[(i,)] = cols[j]
+            c_d[(i,)] = cols[j]
             i += 1
 
         elif len(substructure) == 2:
-            cD[(i, i + 1)] = cols[j]
+            c_d[(i, i + 1)] = cols[j]
             i += 2
 
-    for k in cD.keys():
-        nx.draw_networkx_nodes(sG, pos=pos, nodelist=k, node_color=cD[k], node_size=800, alpha=1.0)
+    for k in c_d.keys():
+        nx.draw_networkx_nodes(s_g, pos=pos, nodelist=k, node_color=c_d[k], node_size=800, alpha=1.0)
 
-    nx.draw_networkx_labels(sG, pos=pos)
+    nx.draw_networkx_labels(s_g, pos=pos)
 
-    return plt, sG
+    return plt, s_g
 
 
 def graph_to_ri(graph: nx.Graph, name):
@@ -149,14 +149,14 @@ def graph_to_ri(graph: nx.Graph, name):
     return out
 
 
-def graph_info(p, sG, mappings):
+def graph_info(p, s_g, mappings):
     """
     Generates and sorts valence and edge information.
 
     :param p: String containing a connectivity subgraph configuration generated by
         :py:meth:`metaboblend.auxiliary.calculate_complete_multipartite_graphs`.
 
-    :param sG: A :py:meth:`networkx.Graph` connectivity subgraph generated by geng based on the output of
+    :param s_g: A :py:meth:`networkx.Graph` connectivity subgraph generated by geng based on the output of
         :py:meth:`metaboblend.auxiliary.calculate_complete_multipartite_graphs`.
 
     :param mappings: Mappings calculated by RI for the relabelling of a subgraph generated by geng. Used by get_valences
@@ -169,7 +169,7 @@ def graph_info(p, sG, mappings):
     frags = {}
 
     for m in mappings:
-        ug = nx.relabel_nodes(sG, m, copy=True)
+        ug = nx.relabel_nodes(s_g, m, copy=True)
         vn = get_degrees(p, ug)
 
         e = list(ug.edges())
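The renamed `c_d` dictionary in the `draw_subgraph` hunk maps groups of consecutive node indices to a colour, one group per substructure in `vn`. The loop can be lifted out and run on its own, without networkx or matplotlib. The function name `colour_node_groups` and the sample `vn` value are hypothetical; the loop body mirrors the diff.

```python
def colour_node_groups(vn, cols=("b", "r", "g", "y")):
    """Map tuples of consecutive node indices to a colour, one tuple per substructure.

    Mirrors the loop shown in the draw_subgraph hunk: a substructure with one
    entry takes one node index, a substructure with two entries takes two.
    """
    c_d = {}
    i = 0  # index of the next unassigned node
    for j, substructure in enumerate(vn):
        if len(substructure) == 1:
            c_d[(i,)] = cols[j]
            i += 1
        elif len(substructure) == 2:
            c_d[(i, i + 1)] = cols[j]
            i += 2
    return c_d


groups = colour_node_groups([(1,), (2, 1)])
print(groups)  # {(0,): 'b', (1, 2): 'r'}
```

As written, substructures with more than two entries receive no colour, and a fifth substructure of length one or two would raise an `IndexError` because only four colours are listed.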
