
Commit 83b6038

authored
Merge can now take n inputs #147 (#195)
* Merge can take n inputs #147
* tox compliant
* typo
* test for merging msdfs
* refactoring
* added CLI test for merge
* minor change
* typo
* Sphinx docs updated
* related to #198
* linted
* updated defaults
* added a TODO
* fixed tests
* new implementation of handling LinkML objects
* cleaned up imports
* minor edit to shut mypy up
* removed print
* test edit
* passes locally
* corrected key columns for the merge join
* flake8-ed
* substituted hard-coded values with variables
* default confidence assignment bug fix
* fixed a bug in handling negation
* linted
* replaced hard-coded values with variables
* clean-up
* variables instead of hardcoded column names
* if inputs have no confidence, neither does the output
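The headline change lets merge accept any number of mapping sets instead of exactly two. One way to picture that is a left fold of a pairwise merge over all inputs. The sketch below assumes a mapping set is just a list of row dictionaries and uses hypothetical helpers `merge_pair` and `merge_many`; it is not the sssom-py implementation, which works on MappingSetDataFrame objects and is not rendered in this diff.

```python
from functools import reduce
from typing import Dict, List

# Hypothetical stand-in for a MappingSetDataFrame: a list of mapping rows.
Row = Dict[str, str]


def merge_pair(left: List[Row], right: List[Row]) -> List[Row]:
    """Merge two mapping sets, keeping the first copy of any identical row."""
    seen = set()
    merged: List[Row] = []
    for row in left + right:
        key = tuple(sorted(row.items()))  # hashable identity of the whole row
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged


def merge_many(*mapping_sets: List[Row]) -> List[Row]:
    """Fold any number (>= 1) of mapping sets into one, pairwise from the left."""
    if not mapping_sets:
        raise ValueError("at least one mapping set is required")
    # Starting from an empty set means even a single input gets deduplicated.
    return reduce(merge_pair, mapping_sets, [])
```

With this shape, merging three sets is `merge_many(a, b, c)` rather than nested two-argument calls.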
1 parent 4a7ec04 commit 83b6038

18 files changed (250 additions, 260 deletions)

docs/_modules/sssom/util.html

Lines changed: 49 additions & 57 deletions
Large diffs are not rendered by default.

docs/_modules/tests/test_cli.html

Lines changed: 18 additions & 0 deletions
@@ -83,6 +83,7 @@ <h1>Source code for tests.test_cli</h1><div class="highlight"><pre>
8383
<span class="n">crosstab</span><span class="p">,</span>
8484
<span class="n">dedupe</span><span class="p">,</span>
8585
<span class="n">diff</span><span class="p">,</span>
86+
<span class="n">merge</span><span class="p">,</span>
8687
<span class="n">parse</span><span class="p">,</span>
8788
<span class="n">partition</span><span class="p">,</span>
8889
<span class="n">ptable</span><span class="p">,</span>
@@ -130,6 +131,7 @@ <h1>Source code for tests.test_cli</h1><div class="highlight"><pre>
130131
<span class="n">test_cases</span> <span class="o">=</span> <span class="n">get_multiple_input_test_cases</span><span class="p">()</span>
131132
<span class="bp">self</span><span class="o">.</span><span class="n">run_diff</span><span class="p">(</span><span class="n">runner</span><span class="p">,</span> <span class="n">test_cases</span><span class="p">)</span>
132133
<span class="bp">self</span><span class="o">.</span><span class="n">run_partition</span><span class="p">(</span><span class="n">runner</span><span class="p">,</span> <span class="n">test_cases</span><span class="p">)</span>
134+
<span class="bp">self</span><span class="o">.</span><span class="n">run_merge</span><span class="p">(</span><span class="n">runner</span><span class="p">,</span> <span class="n">test_cases</span><span class="p">)</span>
133135

134136
<span class="bp">self</span><span class="o">.</span><span class="n">assertTrue</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">test_cases</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="mi">2</span><span class="p">)</span></div>
135137

@@ -279,6 +281,22 @@ <h1>Source code for tests.test_cli</h1><div class="highlight"><pre>
279281
<span class="p">]</span>
280282
<span class="n">result</span> <span class="o">=</span> <span class="n">runner</span><span class="o">.</span><span class="n">invoke</span><span class="p">(</span><span class="n">correlations</span><span class="p">,</span> <span class="n">params</span><span class="p">)</span>
281283
<span class="bp">self</span><span class="o">.</span><span class="n">run_successful</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">test_case</span><span class="p">)</span>
284+
<span class="k">return</span> <span class="n">result</span></div>
285+
286+
<div class="viewcode-block" id="SSSOMCLITestSuite.run_merge"><a class="viewcode-back" href="../../tests.html#tests.test_cli.SSSOMCLITestSuite.run_merge">[docs]</a> <span class="k">def</span> <span class="nf">run_merge</span><span class="p">(</span>
287+
<span class="bp">self</span><span class="p">,</span> <span class="n">runner</span><span class="p">:</span> <span class="n">CliRunner</span><span class="p">,</span> <span class="n">test_cases</span><span class="p">:</span> <span class="n">Mapping</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">SSSOMTestCase</span><span class="p">]</span>
288+
<span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Result</span><span class="p">:</span>
289+
<span class="sd">&quot;&quot;&quot;Run the merge test.&quot;&quot;&quot;</span>
290+
<span class="n">params</span> <span class="o">=</span> <span class="p">[]</span>
291+
<span class="n">out_file</span> <span class="o">=</span> <span class="kc">None</span>
292+
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">test_cases</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
293+
<span class="n">params</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">filepath</span><span class="p">)</span>
294+
<span class="n">out_file</span> <span class="o">=</span> <span class="n">t</span>
295+
<span class="k">if</span> <span class="n">out_file</span><span class="p">:</span>
296+
<span class="n">params</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="s2">&quot;--output&quot;</span><span class="p">,</span> <span class="n">out_file</span><span class="o">.</span><span class="n">get_out_file</span><span class="p">(</span><span class="s2">&quot;tsv&quot;</span><span class="p">)])</span>
297+
298+
<span class="n">result</span> <span class="o">=</span> <span class="n">runner</span><span class="o">.</span><span class="n">invoke</span><span class="p">(</span><span class="n">merge</span><span class="p">,</span> <span class="n">params</span><span class="p">)</span>
299+
<span class="bp">self</span><span class="o">.</span><span class="n">run_successful</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">test_cases</span><span class="p">)</span>
282300
<span class="k">return</span> <span class="n">result</span></div></div>
283301
</pre></div>
284302
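`run_merge` appends every test case's `filepath` as a positional input and then passes the last test case's TSV output path via `--output`, so the CLI is exercised as `merge INPUT1 INPUT2 ... --output OUT`. A minimal sketch of building that argument list, using a hypothetical helper `build_merge_args` that is not part of the test suite:

```python
from typing import List, Optional, Sequence


def build_merge_args(input_paths: Sequence[str], output_path: Optional[str] = None) -> List[str]:
    """Hypothetical helper: positional input files first, then an optional --output."""
    args = list(input_paths)
    if output_path is not None:
        args.extend(["--output", output_path])
    return args
```

A list like this is what `runner.invoke(merge, params)` receives in the test above.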

docs/_modules/tests/test_reconcile.html

Lines changed: 2 additions & 2 deletions
@@ -101,7 +101,7 @@ <h1>Source code for tests.test_reconcile</h1><div class="highlight"><pre>
101101
<span class="n">msdf1</span> <span class="o">=</span> <span class="n">read_sssom_table</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">data_dir</span><span class="si">}</span><span class="s2">/basic.tsv&quot;</span><span class="p">)</span>
102102
<span class="n">msdf2</span> <span class="o">=</span> <span class="n">read_sssom_table</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">data_dir</span><span class="si">}</span><span class="s2">/basic2.tsv&quot;</span><span class="p">)</span>
103103

104-
<span class="n">merged_msdf</span> <span class="o">=</span> <span class="n">merge_msdf</span><span class="p">(</span><span class="n">msdf1</span><span class="o">=</span><span class="n">msdf1</span><span class="p">,</span> <span class="n">msdf2</span><span class="o">=</span><span class="n">msdf2</span><span class="p">)</span>
104+
<span class="n">merged_msdf</span> <span class="o">=</span> <span class="n">merge_msdf</span><span class="p">(</span><span class="n">msdf1</span><span class="p">,</span> <span class="n">msdf2</span><span class="p">)</span>
105105

106106
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="mi">123</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">merged_msdf</span><span class="o">.</span><span class="n">df</span><span class="p">))</span></div>
107107

@@ -110,7 +110,7 @@ <h1>Source code for tests.test_reconcile</h1><div class="highlight"><pre>
110110
<span class="n">msdf1</span> <span class="o">=</span> <span class="n">read_sssom_table</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">data_dir</span><span class="si">}</span><span class="s2">/basic4.tsv&quot;</span><span class="p">)</span>
111111
<span class="n">msdf2</span> <span class="o">=</span> <span class="n">read_sssom_table</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">data_dir</span><span class="si">}</span><span class="s2">/basic5.tsv&quot;</span><span class="p">)</span>
112112

113-
<span class="n">merged_msdf</span> <span class="o">=</span> <span class="n">merge_msdf</span><span class="p">(</span><span class="n">msdf1</span><span class="o">=</span><span class="n">msdf1</span><span class="p">,</span> <span class="n">msdf2</span><span class="o">=</span><span class="n">msdf2</span><span class="p">,</span> <span class="n">reconcile</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
113+
<span class="n">merged_msdf</span> <span class="o">=</span> <span class="n">merge_msdf</span><span class="p">(</span><span class="n">msdf1</span><span class="p">,</span> <span class="n">msdf2</span><span class="p">,</span> <span class="n">reconcile</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
114114

115115
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="mi">53</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">msdf1</span><span class="o">.</span><span class="n">df</span><span class="p">))</span>
116116
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="mi">53</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">msdf2</span><span class="o">.</span><span class="n">df</span><span class="p">))</span>
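The test updates switch from `merge_msdf(msdf1=msdf1, msdf2=msdf2)` to positional arguments, which is consistent with a variadic signature. The sketch below assumes the shape `(*msdfs, reconcile=...)`; the name `merge_msdf_sketch`, the default `reconcile=True`, and the list-concatenation body are illustrative assumptions, not the real sssom-py code. It shows why the old keyword call stops working: a `*args` parameter cannot be filled by name.

```python
from typing import Any, List


def merge_msdf_sketch(*msdfs: Any, reconcile: bool = True) -> List[Any]:
    """Hypothetical variadic merge: any number of positional inputs, keyword-only options."""
    if not msdfs:
        raise ValueError("need at least one mapping set")
    merged: List[Any] = []
    for msdf in msdfs:
        merged.extend(msdf)
    # This sketch ignores `reconcile`; a real merge would dedupe/reconcile here when it is True.
    return merged


# Positional calls work for any number of inputs:
merge_msdf_sketch([1], [2], [3])
# The old keyword style now fails at call time with a TypeError:
# merge_msdf_sketch(msdf1=[1], msdf2=[2])
```

Because `reconcile` follows `*msdfs`, it is keyword-only, which matches how the updated test passes `reconcile=False`.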

docs/_static/pygments.css

Lines changed: 17 additions & 17 deletions
@@ -5,22 +5,22 @@ td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5
55
span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
66
.highlight .hll { background-color: #ffffcc }
77
.highlight { background: #f8f8f8; }
8-
.highlight .c { color: #408080; font-style: italic } /* Comment */
8+
.highlight .c { color: #3D7B7B; font-style: italic } /* Comment */
99
.highlight .err { border: 1px solid #FF0000 } /* Error */
1010
.highlight .k { color: #008000; font-weight: bold } /* Keyword */
1111
.highlight .o { color: #666666 } /* Operator */
12-
.highlight .ch { color: #408080; font-style: italic } /* Comment.Hashbang */
13-
.highlight .cm { color: #408080; font-style: italic } /* Comment.Multiline */
14-
.highlight .cp { color: #BC7A00 } /* Comment.Preproc */
15-
.highlight .cpf { color: #408080; font-style: italic } /* Comment.PreprocFile */
16-
.highlight .c1 { color: #408080; font-style: italic } /* Comment.Single */
17-
.highlight .cs { color: #408080; font-style: italic } /* Comment.Special */
12+
.highlight .ch { color: #3D7B7B; font-style: italic } /* Comment.Hashbang */
13+
.highlight .cm { color: #3D7B7B; font-style: italic } /* Comment.Multiline */
14+
.highlight .cp { color: #9C6500 } /* Comment.Preproc */
15+
.highlight .cpf { color: #3D7B7B; font-style: italic } /* Comment.PreprocFile */
16+
.highlight .c1 { color: #3D7B7B; font-style: italic } /* Comment.Single */
17+
.highlight .cs { color: #3D7B7B; font-style: italic } /* Comment.Special */
1818
.highlight .gd { color: #A00000 } /* Generic.Deleted */
1919
.highlight .ge { font-style: italic } /* Generic.Emph */
20-
.highlight .gr { color: #FF0000 } /* Generic.Error */
20+
.highlight .gr { color: #E40000 } /* Generic.Error */
2121
.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */
22-
.highlight .gi { color: #00A000 } /* Generic.Inserted */
23-
.highlight .go { color: #888888 } /* Generic.Output */
22+
.highlight .gi { color: #008400 } /* Generic.Inserted */
23+
.highlight .go { color: #717171 } /* Generic.Output */
2424
.highlight .gp { color: #000080; font-weight: bold } /* Generic.Prompt */
2525
.highlight .gs { font-weight: bold } /* Generic.Strong */
2626
.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
@@ -33,15 +33,15 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
3333
.highlight .kt { color: #B00040 } /* Keyword.Type */
3434
.highlight .m { color: #666666 } /* Literal.Number */
3535
.highlight .s { color: #BA2121 } /* Literal.String */
36-
.highlight .na { color: #7D9029 } /* Name.Attribute */
36+
.highlight .na { color: #687822 } /* Name.Attribute */
3737
.highlight .nb { color: #008000 } /* Name.Builtin */
3838
.highlight .nc { color: #0000FF; font-weight: bold } /* Name.Class */
3939
.highlight .no { color: #880000 } /* Name.Constant */
4040
.highlight .nd { color: #AA22FF } /* Name.Decorator */
41-
.highlight .ni { color: #999999; font-weight: bold } /* Name.Entity */
42-
.highlight .ne { color: #D2413A; font-weight: bold } /* Name.Exception */
41+
.highlight .ni { color: #717171; font-weight: bold } /* Name.Entity */
42+
.highlight .ne { color: #CB3F38; font-weight: bold } /* Name.Exception */
4343
.highlight .nf { color: #0000FF } /* Name.Function */
44-
.highlight .nl { color: #A0A000 } /* Name.Label */
44+
.highlight .nl { color: #767600 } /* Name.Label */
4545
.highlight .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
4646
.highlight .nt { color: #008000; font-weight: bold } /* Name.Tag */
4747
.highlight .nv { color: #19177C } /* Name.Variable */
@@ -58,11 +58,11 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
5858
.highlight .dl { color: #BA2121 } /* Literal.String.Delimiter */
5959
.highlight .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */
6060
.highlight .s2 { color: #BA2121 } /* Literal.String.Double */
61-
.highlight .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
61+
.highlight .se { color: #AA5D1F; font-weight: bold } /* Literal.String.Escape */
6262
.highlight .sh { color: #BA2121 } /* Literal.String.Heredoc */
63-
.highlight .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */
63+
.highlight .si { color: #A45A77; font-weight: bold } /* Literal.String.Interpol */
6464
.highlight .sx { color: #008000 } /* Literal.String.Other */
65-
.highlight .sr { color: #BB6688 } /* Literal.String.Regex */
65+
.highlight .sr { color: #A45A77 } /* Literal.String.Regex */
6666
.highlight .s1 { color: #BA2121 } /* Literal.String.Single */
6767
.highlight .ss { color: #19177C } /* Literal.String.Symbol */
6868
.highlight .bp { color: #008000 } /* Name.Builtin.Pseudo */

docs/cli_usage.html

Lines changed: 4 additions & 6 deletions
@@ -155,7 +155,7 @@ <h3>convert<a class="headerlink" href="#sssom-py-convert" title="Permalink to th
155155
<p>currently only supports conversion to RDF)</p>
156156
</div>
157157
<dl class="simple">
158-
<dt>Example:</dt><dd><p>sssom covert –input my.sssom.tsv –output-format rdfxml –output my.sssom.owl</p>
158+
<dt>Example:</dt><dd><p>sssom convert my.sssom.tsv --output-format rdfxml --output my.sssom.owl</p>
159159
</dd>
160160
</dl>
161161
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>sssom-py convert <span class="o">[</span>OPTIONS<span class="o">]</span> INPUT
@@ -328,13 +328,11 @@ <h3>dosql<a class="headerlink" href="#sssom-py-dosql" title="Permalink to this h
328328
</section>
329329
<section id="sssom-py-merge">
330330
<h3>merge<a class="headerlink" href="#sssom-py-merge" title="Permalink to this headline"></a></h3>
331-
<p>Merge msdf2 into msdf1.</p>
332-
<dl class="simple">
333-
<dt>if reconcile=True, then dedupe(remove redundant lower confidence mappings) and</dt><dd><p>reconcile (if msdf contains a higher confidence _negative_ mapping,
331+
<p>Merge multiple MappingSetDataFrames into one.</p>
332+
<p>if reconcile=True, then dedupe (remove redundant lower confidence mappings) and
333+
reconcile (if msdf contains a higher confidence _negative_ mapping,
334334
then remove lower confidence positive one. If confidence is the same,
335335
prefer HumanCurated. If both HumanCurated, prefer negative mapping).</p>
336-
</dd>
337-
</dl>
338336
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>sssom-py merge <span class="o">[</span>OPTIONS<span class="o">]</span> <span class="o">[</span>INPUTS<span class="o">]</span>...
339337
</pre></div>
340338
</div>
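The `merge` documentation describes two steps: dedupe (keep only the highest-confidence copy of a redundant mapping) and reconcile (let a higher-confidence negative mapping remove the positive one, breaking confidence ties in favor of HumanCurated and then of the negative mapping). The sketch below encodes those rules on a hypothetical, simplified `MappingRow` record; the `negated` flag and the `"HumanCurated"` string stand in for however sssom-py actually represents negation and match type. Cases the documentation does not cover, such as a positive mapping that outranks its negative, are left untouched here, which is a choice of this sketch rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class MappingRow:
    """Hypothetical, simplified mapping record (not the sssom-py data model)."""

    subject_id: str
    predicate_id: str
    object_id: str
    confidence: float
    match_type: str = "Lexical"  # "HumanCurated" marks curated mappings in this sketch
    negated: bool = False  # hypothetical flag standing in for a "negative" mapping


def dedupe(rows: List[MappingRow]) -> List[MappingRow]:
    """Keep the highest-confidence row per (subject, predicate, object, polarity)."""
    best: Dict[Tuple[str, str, str, bool], MappingRow] = {}
    for r in rows:
        key = (r.subject_id, r.predicate_id, r.object_id, r.negated)
        if key not in best or r.confidence > best[key].confidence:
            best[key] = r
    return list(best.values())


def pick_loser(pos: MappingRow, neg: MappingRow) -> Optional[MappingRow]:
    """Return the row to drop when a positive and a negative mapping conflict."""
    if neg.confidence > pos.confidence:
        return pos  # a higher-confidence negative removes the positive
    if neg.confidence == pos.confidence:
        pos_curated = pos.match_type == "HumanCurated"
        neg_curated = neg.match_type == "HumanCurated"
        if pos_curated and not neg_curated:
            return neg  # tie: prefer the HumanCurated positive
        if neg_curated:
            return pos  # tie: curated negative, or both curated, so prefer the negative
    return None  # not covered by the documented rules: keep both


def reconcile(rows: List[MappingRow]) -> List[MappingRow]:
    """Dedupe, then resolve positive/negative conflicts on the same triple."""
    deduped = dedupe(rows)
    by_triple: Dict[Tuple[str, str, str], Dict[bool, MappingRow]] = {}
    for r in deduped:
        by_triple.setdefault((r.subject_id, r.predicate_id, r.object_id), {})[r.negated] = r
    dropped: Set[MappingRow] = set()
    for polarities in by_triple.values():
        if True in polarities and False in polarities:
            loser = pick_loser(pos=polarities[False], neg=polarities[True])
            if loser is not None:
                dropped.add(loser)
    return [r for r in deduped if r not in dropped]
```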
