Commit ec8b03d

committed
v3.3.2
1 parent 063378c commit ec8b03d

16 files changed: +263 -206 lines changed

doc/Release.html

Lines changed: 17 additions & 1 deletion
@@ -6,6 +6,22 @@
 <a id=top>
 <!--#include virtual="./ssi/start1.html" -->
 
+<h4>v3.3.2 15-Sept-2021</h4>
+Fixes for bugs recently introduced bugs and one display improvement.
+
+<p><ttp>runSingleTCW</ttp>
+<ul>
+<li>Various features use "Rank=1", which was not being updated when pruning was applied.
+<li>"Remove Annotation" was not clearing the GOseq values totally.
+</ul>
+
+<ttp>viewSingleTCW</ttp>
+<ul>
+<li>AnnoDB Hits bug fix from v3.3.0: Filtering on "%HitCov" stopped working.
+<li>Pair Alignment: if multiple pairs are shown, the alignment will start in the same place across all alignments.
+<li>Verified all "Reproduce" information from "Overview" and improved the description.
+</ul>
+
 <h4>v3.3.1 1-Sept-2021</h4>
 
 The singleTCW database has a small schema update that will be applied the first time an existing
@@ -15,7 +31,7 @@ <h4>v3.3.1 1-Sept-2021</h4>
 <ul>
 <li>sTCWdb version sdb6.0: The percent similarity (identity) is stored as a real number instead of an integer.
 
-<li><font color=green>NEW</font> Prune Hits - there are many hits with the exact same coordinates and/or descriptions.
+<li><font color=blue>NEW</font> Prune Hits - there are many hits with the exact same coordinates and/or descriptions.
 A new function removes all but the best based on same alignment values or same description.
 This function can be set to run in the AnnoDBs <ttl>Options</ttl> or from the command line.
 <li>Tiny changes;

doc/tour/viewSingle/reproduce.html

Lines changed: 40 additions & 37 deletions
@@ -4,11 +4,10 @@
 <title>singleTCW Overview</title>
 
 <style>
-.nw {vertical-align: top; white-space: nowrap;}
-.top {vertical-align: top; }
-table.y, .y td, th { border: 1px solid black; border-collapse: collapse; border-spacing: 5px;}
-th, td {padding: 5px;}
+.tabley {border: 1px solid black; border-spacing: 0px; border-collapse: collapse;}
+.tabley td {border: 1px solid black; padding: 3px; }
 </style>
+
 </head>
 
 <body style="font-family: Arial, sans-serif; font-size: 14px;">
@@ -17,45 +16,47 @@
 <tr><td>&nbsp;
 
 <tr><td>
-<h2>Reproduce singleTCW overview</h2>
+<h2>Reproduce sTCW overview</h2>
+
 This describes how to obtain the table of results corresponding to statistics
 in the overview. The following short-hand is used:
 <ul>
 <li>The "Column:x" indicates that x should be selected for viewing in the table.
 <li>#Seqs is the number of sequences, which is listed at the top of the overview.
 <li>"Stats" is the "Show Column Stats" on the "Table..." drop-down.
 </ul>
-
+<big><i>Always clear filters before setting new ones!</i></big>
 <h3>INPUT</h3>
 Most of the input section is data supplied by the user with runSingleTCW. The following
-two are computed:
-<table class=y>
-<tr><td>Counts: SIZE<td>Show all<td>Column:Counts for all conditions: Stats, column:Sum
+are computed:
+<table class=tabley>
+<tr><td colspan=3>Counts:
+<tr><td>SIZE<td>Show All<td>Column:Counts for all conditions: Stats, column:Sum
+<tr><td colspan=3>Sequences:
 <tr><td>AVG-len<td>Show All<td>Column:Length; Stats, column:Average
 <tr><td>MED-len<td>Show All<td>Column:Length; Stats, column:Median
-<br>The median in the two cases will be slightly different
+<br>The median in the two cases may be slightly different
 because they are computed differently.
 </table>
 
 <h3>ANNOTATION</h3>
 <h3>Hit Statistics:</h3>
 
-<table class=y>
+<table class=tabley>
 <tr><th>Column<th>Search<th>Obtain number
 <tr><td>Sequences with hits<td>Filters: Annotated<td>Number of rows
-<tr><td>Unique hits<td>AnnoDB Hits: Seq:None(slow)<sup>*</sup><td>Hits # above table
-<tr><td>Total sequence hits<td>AnnoDB Hits: Seq:None(slow)<sup>*</sup><td>Pairs # above table
+<tr><td>Unique hits<td>AnnoDB Hits: Seq:None(slow)<td>Hits # above table
+<tr><td>Total sequence hits<td>AnnoDB Hits: Seq:None(slow)<td>Pairs # above table
 <tr><td>Bases covered by hit<td>AnnoDB Hits: Seq:Best Bits<td>Unselect "Group by Hit ID""; column:Align;
 <br>Stats, column:Sum; for NT, multiply by 3
 <tr><td>Total bases
 <br>(residues for AA seqs)
 <td>Show All<td>Column:Length; Stats, column:Sum
 </table>
-<sup>*</sup> Use Clear All
 
 <h3>AnnoDBs:</h3>
 
-<table class=y>
+<table class=tabley>
 <tr><th>Column<th>Search<th>Obtain number
 <tr><td>ONLY
 <td>Filters: Annotated, Best Bits,
@@ -71,31 +72,33 @@ <h3>AnnoDBs:</h3>
 <tr><td>UNIQUE<td>Seq:None(slow)<td>Hits # above table
 <tr><td>TOTAL<td>Seq:None(slow) <td>Pairs # above table
 
-<tr><td>AVG-SIM<td>Seq:None(slow)<td>Unselect "Group by Hit ID""; column:%Sim; Stats, column:Average
+<tr><td>AVG %SIM<td>Seq:None(slow)<td>Unselect "Group by Hit ID"; column:%Sim; Stats, column:Average
 
 <tr><td colspan=3>Rank=1 is the best hit for a sequence for a given annoDB.
-<tr><td>HAS SEQ<td>Seq:Rank=1<td>Seqs # above table; percentage of total #Seqs
-<tr><td>AVG-SIM<td>Seq:Rank=1 <td>Uncheck "Group by Hit""; Column:%Sim; Stats, column:Average
-<tr><td>Cover &gt;=N<td>Seq:Rank=1,%Sim&gt;=N,%HitCov&gt;=N
-<td>Seqs # above table; percentage of HIT-SEQ
+<tr><td>HAS HIT<td>Seq:Rank=1<td>Seqs # above table; percentage of total #Seqs
+<tr><td>AVG %SIM<td>Seq:Rank=1 <td>Uncheck "Group by Hit"; Column:%Sim; Stats, column:Average
+<tr><td>Cover &gt;=N<td>Seq:Rank=1,%Sim&gt;=N,%HitCov<sup>*</sup>&gt;=N
+<td>Seqs # above table; percentage of HAS HIT
 </table>
-HitCov is the difference between the hit stop and start coordinates divided by the length of the protein.
+<sup>*</sup>HitCov is the difference between the hit stop and start coordinates divided by the length of the protein.
 
 <h3>Top 15 species from total: N</h3>
 The N is the number of unique species based on the first two words of the
-species name:
+species name. From "AnnoDB Hits":
 <ul>
-<li>AnnoDB Hits, select Species, select Two words
-<li>The number listed beside "Species" is the same as N, and
-the species are listed in the table.
+<li>Select "Species"", select "Two words"", enter first two words of species name next to "Find", select "Find", select the entry on the
+left and add to the right.
+<li>Select "Best Bits", "Best Anno" or "None" for the three numbers shown.
+<li>BUILD TABLE
+<li>Use the number listed beside "Pairs".
 </ul>
 
 <h3>Gene Ontology Statistics:</h3>
 
-<table class=y>
+<table class=tabley>
 <tr><th>Column<th>Search<th>Obtain number
 <tr><td>Unique GOs
-<td>GO Annotation: no filters<sup>*</sup>
+<td>GO Annotation: no filters
 <td>Results number
 <tr><td>Unique hits with GOs
 <td>AnnoDB Hits: Seqs:None; GO,etc:Has GO
@@ -117,19 +120,18 @@ <h3>Gene Ontology Statistics:</h3>
 <tr><td>cellular_component
 <td>GO Annotation: Level: cellular_component
 <td>Number GOs at top of table
-<tr><td>is_a, part_of, replaced_by
-<td>GO Annotation: no filters<sup>*</sup>
-<td>Table..., Each GO's parents with relations, Export to file, grep <sup>**</sup>
+<tr><td>is_a, part_of
+<td>GO Annotation: no filters
+<td>Export..., Each GO's parents with relations, grep (see footnote<sup>*</sup>)
 </table>
-<sup>*</sup> Use Clear All
-<br><sup>**</sup> From terminal, <tt>grep Is_a AllGoParents.tsv | wc</tt>. Then replace
-<tt>is_a</tt> with <tt>part_of</tt>, then with <tt>replaced_by</tt>.
+<sup>*</sup> From terminal, '<tt>grep is_a GOeachParents.tsv | wc</tt>'. Repeat with
+<tt>is_a</tt> replaced with <tt>part_of</tt>.
 
 
 <h2>EXPRESSION</h2>
 The following sections may not exist if the input had no count files or the DE methods
 were not executed.
-<table class=y>
+<table class=tabley>
 <tr><td>TPM*
 <td>Filter: select Condition under Exclude; set "At Most" to 1.99.
 <td>This will continue using 4.99, 9.99, etc, where the previous results are
@@ -163,8 +165,8 @@ <h2>SEQUENCES</h2>
 to the row number that start having #n&gt;10.
 </table>
 
-<h3>ORFs stats:</h3>
-<table class=y>
+<h3>ORF Stats:</h3>
+<table class=tabley>
 <tr><th>Column<th>Search<th>Obtain number
 <tr><td colspan=3><i>The following use the Basic Sequence. Select "TCW" and enter
 the Substring indicated.</i>
@@ -194,7 +196,7 @@ <h3>ORFs stats:</h3>
 <h3>GC Content:</h3>
 The only number reproducible is the GC Content, which is the %GC over the entire sequence..
 
-<table class=y>
+<table class=tabley>
 <tr><td>GC Content<td>Show All<td>Column:%GC; Stats, column:Average
 <br>Note, there will be some slight difference in the number due to round-off error.
 </table>
@@ -205,5 +207,6 @@ <h3>GC Content:</h3>
 <li>The CpG-O/E is ratio observed/expected [(#CpG/(#G*#C))*length].
 <li>The UTRs can be viewed in the Sequence Detail alignments, but there is no column for it.
 </ul>
+
 </body>
 </html>
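
Note on the two ratios this page defines only in words: HitCov (hit stop minus hit start, divided by the protein length) and CpG-O/E ((#CpG/(#G*#C))*length). The following is a minimal Java sketch written from those two descriptions alone. It is not TCW's code: the class and method names are hypothetical, and two details are assumptions, namely that HitCov is reported as a percentage (the filter is labelled "%HitCov") and that the coordinate difference is taken literally, with no +1 for inclusive coordinates.

```java
public class ReproduceRatios {
    // Percent hit coverage: (stop - start) / protein length, scaled to 0-100.
    // Assumption: the difference is used as written, without +1 for inclusive coordinates.
    public static double hitCovPercent(int hitStart, int hitStop, int proteinLength) {
        if (proteinLength <= 0) {
            throw new IllegalArgumentException("proteinLength must be positive");
        }
        return 100.0 * (hitStop - hitStart) / proteinLength;
    }

    // CpG observed/expected: (#CpG / (#G * #C)) * length, as stated on the page.
    public static double cpgObsExp(String seq) {
        String s = seq.toUpperCase();
        long c = 0, g = 0, cpg = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == 'C') {
                c++;
                // A CpG is a C immediately followed by a G.
                if (i + 1 < s.length() && s.charAt(i + 1) == 'G') cpg++;
            } else if (ch == 'G') {
                g++;
            }
        }
        // The ratio is undefined without both bases; returning 0 is a choice made for this sketch.
        if (c == 0 || g == 0) return 0.0;
        return ((double) cpg / (g * c)) * s.length();
    }

    public static void main(String[] args) {
        System.out.println(hitCovPercent(10, 110, 200));
        System.out.println(cpgObsExp("ACGTCGGA"));
    }
}
```

Values computed this way may differ slightly from the overview because of rounding, as the page already notes for %GC.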

java/src/html/viewSingleTCW/reproduce.html

Lines changed: 27 additions & 25 deletions
@@ -19,15 +19,17 @@ <h2>Reproduce sTCW overview</h2>
 <li>#Seqs is the number of sequences, which is listed at the top of the overview.
 <li>"Stats" is the "Show Column Stats" on the "Table..." drop-down.
 </ul>
-
+<big><i>Always clear filters before setting new ones!</i></big>
 <h3>INPUT</h3>
 Most of the input section is data supplied by the user with runSingleTCW. The following
-two are computed:
+are computed:
 <table class=tabley>
-<tr><td>Counts: SIZE<td>Show all<td>Column:Counts for all conditions: Stats, column:Sum
+<tr><td colspan=3>Counts:
+<tr><td>SIZE<td>Show All<td>Column:Counts for all conditions: Stats, column:Sum
+<tr><td colspan=3>Sequences:
 <tr><td>AVG-len<td>Show All<td>Column:Length; Stats, column:Average
 <tr><td>MED-len<td>Show All<td>Column:Length; Stats, column:Median
-<br>The median in the two cases will be slightly different
+<br>The median in the two cases may be slightly different
 because they are computed differently.
 </table>
 
@@ -37,15 +39,14 @@ <h3>Hit Statistics:</h3>
 <table class=tabley>
 <tr><th>Column<th>Search<th>Obtain number
 <tr><td>Sequences with hits<td>Filters: Annotated<td>Number of rows
-<tr><td>Unique hits<td>AnnoDB Hits: Seq:None(slow)<sup>*</sup><td>Hits # above table
-<tr><td>Total sequence hits<td>AnnoDB Hits: Seq:None(slow)<sup>*</sup><td>Pairs # above table
+<tr><td>Unique hits<td>AnnoDB Hits: Seq:None(slow)<td>Hits # above table
+<tr><td>Total sequence hits<td>AnnoDB Hits: Seq:None(slow)<td>Pairs # above table
 <tr><td>Bases covered by hit<td>AnnoDB Hits: Seq:Best Bits<td>Unselect "Group by Hit ID""; column:Align;
 <br>Stats, column:Sum; for NT, multiply by 3
 <tr><td>Total bases
 <br>(residues for AA seqs)
 <td>Show All<td>Column:Length; Stats, column:Sum
 </table>
-<sup>*</sup> Use Clear All
 
 <h3>AnnoDBs:</h3>
 
@@ -65,31 +66,33 @@ <h3>AnnoDBs:</h3>
 <tr><td>UNIQUE<td>Seq:None(slow)<td>Hits # above table
 <tr><td>TOTAL<td>Seq:None(slow) <td>Pairs # above table
 
-<tr><td>AVG-SIM<td>Seq:None(slow)<td>Unselect "Group by Hit ID""; column:%Sim; Stats, column:Average
+<tr><td>AVG %SIM<td>Seq:None(slow)<td>Unselect "Group by Hit ID"; column:%Sim; Stats, column:Average
 
 <tr><td colspan=3>Rank=1 is the best hit for a sequence for a given annoDB.
-<tr><td>HAS SEQ<td>Seq:Rank=1<td>Seqs # above table; percentage of total #Seqs
-<tr><td>AVG-SIM<td>Seq:Rank=1 <td>Uncheck "Group by Hit""; Column:%Sim; Stats, column:Average
-<tr><td>Cover &gt;=N<td>Seq:Rank=1,%Sim&gt;=N,%HitCov&gt;=N
-<td>Seqs # above table; percentage of HIT-SEQ
+<tr><td>HAS HIT<td>Seq:Rank=1<td>Seqs # above table; percentage of total #Seqs
+<tr><td>AVG %SIM<td>Seq:Rank=1 <td>Uncheck "Group by Hit"; Column:%Sim; Stats, column:Average
+<tr><td>Cover &gt;=N<td>Seq:Rank=1,%Sim&gt;=N,%HitCov<sup>*</sup>&gt;=N
+<td>Seqs # above table; percentage of HAS HIT
 </table>
-HitCov is the difference between the hit stop and start coordinates divided by the length of the protein.
+<sup>*</sup>HitCov is the difference between the hit stop and start coordinates divided by the length of the protein.
 
 <h3>Top 15 species from total: N</h3>
 The N is the number of unique species based on the first two words of the
-species name:
+species name. From "AnnoDB Hits":
 <ul>
-<li>AnnoDB Hits, select Species, select Two words
-<li>The number listed beside "Species" is the same as N, and
-the species are listed in the table.
+<li>Select "Species"", select "Two words"", enter first two words of species name next to "Find", select "Find", select the entry on the
+left and add to the right.
+<li>Select "Best Bits", "Best Anno" or "None" for the three numbers shown.
+<li>BUILD TABLE
+<li>Use the number listed beside "Pairs".
 </ul>
 
 <h3>Gene Ontology Statistics:</h3>
 
 <table class=tabley>
 <tr><th>Column<th>Search<th>Obtain number
 <tr><td>Unique GOs
-<td>GO Annotation: no filters<sup>*</sup>
+<td>GO Annotation: no filters
 <td>Results number
 <tr><td>Unique hits with GOs
 <td>AnnoDB Hits: Seqs:None; GO,etc:Has GO
@@ -111,13 +114,12 @@ <h3>Gene Ontology Statistics:</h3>
 <tr><td>cellular_component
 <td>GO Annotation: Level: cellular_component
 <td>Number GOs at top of table
-<tr><td>is_a, part_of, replaced_by
-<td>GO Annotation: no filters<sup>*</sup>
-<td>Table..., Each GO's parents with relations, Export to file, grep <sup>**</sup>
+<tr><td>is_a, part_of
+<td>GO Annotation: no filters
+<td>Export..., Each GO's parents with relations, grep (see footnote<sup>*</sup>)
 </table>
-<sup>*</sup> Use Clear All
-<br><sup>**</sup> From terminal, <tt>grep Is_a AllGoParents.tsv | wc</tt>. Then replace
-<tt>is_a</tt> with <tt>part_of</tt>, then with <tt>replaced_by</tt>.
+<sup>*</sup> From terminal, '<tt>grep is_a GOeachParents.tsv | wc</tt>'. Repeat with
+<tt>is_a</tt> replaced with <tt>part_of</tt>.
 
 
 <h2>EXPRESSION</h2>
@@ -157,7 +159,7 @@ <h2>SEQUENCES</h2>
 to the row number that start having #n&gt;10.
 </table>
 
-<h3>ORFs stats:</h3>
+<h3>ORF Stats:</h3>
 <table class=tabley>
 <tr><th>Column<th>Search<th>Obtain number
 <tr><td colspan=3><i>The following use the Basic Sequence. Select "TCW" and enter

java/src/sng/annotator/CoreDB.java

Lines changed: 1 addition & 7 deletions
@@ -194,13 +194,7 @@ public boolean deleteAnnotation (boolean prt)
 mDB.tableDelete("tuple_orfs"); // CAS305
 mDB.tableDelete("tuple_usage");// CAS305
 
-Out.PrtSpMsg(1, "Remove GO tables...");
-mDB.tableDrop("go_info");
-mDB.tableDrop("pja_gotree");
-mDB.tableDrop("pja_unitrans_go");
-mDB.tableDrop("pja_uniprot_go");
-mDB.tableDrop("go_term2term");
-mDB.tableDrop("go_graph_path");
+Schema.dropGOtables(mDB); // CAS332 was dropping from here, and not complete
 
 // CAS331 Created during prune from command-line
 mDB.tableDrop(DoUniPrune.tmp_hit);
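
Note on this hunk: six inline `tableDrop` calls are replaced by one call to `Schema.dropGOtables(mDB)`, and the commit comment says the inline list was "not complete". The body of `Schema.dropGOtables` is not part of this diff, so the following is only a sketch of the general idea, keeping the table list in one helper so callers cannot drift out of sync. The `Db` interface and all names in it are hypothetical stand-ins, and the list holds only the six table names visible in the removed lines; the real helper presumably drops more.

```java
import java.util.ArrayList;
import java.util.List;

public class DropGoTablesSketch {
    // Hypothetical stand-in for the project's database wrapper; only a drop call is modeled.
    public interface Db {
        void tableDrop(String table);
    }

    // The six tables named in the removed lines. The commit says that list was incomplete,
    // so the real helper presumably covers additional tables not shown in this diff.
    public static final List<String> GO_TABLES = List.of(
        "go_info", "pja_gotree", "pja_unitrans_go",
        "pja_uniprot_go", "go_term2term", "go_graph_path");

    // One place that knows which GO tables exist.
    public static void dropGoTables(Db db) {
        for (String table : GO_TABLES) {
            db.tableDrop(table);
        }
    }

    public static void main(String[] args) {
        // Record the drops in a list instead of touching a real database.
        List<String> dropped = new ArrayList<>();
        dropGoTables(dropped::add);
        System.out.println(dropped);
    }
}
```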

java/src/sng/annotator/DoUniProt.java

Lines changed: 4 additions & 0 deletions
@@ -103,6 +103,10 @@ private void init() { // CAS326
 try {
 if (mDB.tableColumnExists("assem_msg", "anno_msg"))
 mDB.executeUpdate("update assem_msg set anno_msg=''");
+
+if (!mDB.tableColumnExists("assem_msg", "prune")) // CAS332 - also run in DoUniPrune
+mDB.tableCheckAddColumn("assem_msg", "prune", "tinyint default -1", null);
+mDB.executeUpdate("update assem_msg set prune = " + pruneType);
 }
 catch (Exception e) {
 pRC = false;
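
Note on this hunk: the added lines create the `prune` column only when it is missing (with a default of -1) and then always write the current prune type. A minimal in-memory Java sketch of that "add if absent, then set" pattern follows. The `Table` class is a toy model invented for illustration and is not TCW's database API; it only shows that the step can run repeatedly without creating the column twice.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class EnsureColumnSketch {
    // Toy single-row table: a set of column names plus their current values (hypothetical).
    public static class Table {
        public final Set<String> columns = new LinkedHashSet<>();
        public final Map<String, Integer> values = new HashMap<>();
    }

    // Add the column with its default only when absent, then always write the new value.
    public static void ensureAndSet(Table t, String column, int defaultValue, int value) {
        if (!t.columns.contains(column)) {
            t.columns.add(column);
            t.values.put(column, defaultValue);
        }
        t.values.put(column, value);
    }

    public static void main(String[] args) {
        Table assemMsg = new Table();
        ensureAndSet(assemMsg, "prune", -1, 1); // first run adds the column, then sets it
        ensureAndSet(assemMsg, "prune", -1, 2); // second run only updates the value
        System.out.println(assemMsg.columns + " " + assemMsg.values);
    }
}
```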
