Commit 5dfef18

committed
v3.3.4
1 parent e92a4b6 commit 5dfef18

81 files changed

+2303
-1836
lines changed

doc/Release.html

Lines changed: 67 additions & 0 deletions
@@ -6,6 +6,73 @@
66
<a id=top>
77
<!--#include virtual="./ssi/start1.html" -->
88

9+
<h4>v3.3.4 18-Oct-2021</h4>
10+
This release contains improvements to the <ttl>ORF finder</ttl>.
11+
<p><ttp>runSingleTCW</ttp>
12+
<ul>
13+
<li>Changes to ORF finder:
14+
<ul>
15+
<li>Algorithm: (the changes fine-tune the results; there are no major changes)
16+
<ul>
17+
<li><u>For Markov training</u>: Previously, all sequences were used for training. Now it uses the longest N sequences
18+
(default 2000), after removing similar sequences. N can be changed by executing <ttc>execAnno</ttc> with the
19+
-t option.
20+
<li><u>Sort</u>: (1) Instead of comparing the Markov scores directly, it tests Abs(log(score1)-log(score2))&gt;0.3, where negative
21+
Markov scores are -(log(-score1)); (2) If the length and Markov scores are similar, a 4th rule checks for ends (Start & Stop codons).
22+
<li><u>With Hit</u>: If the hit ends at start/stop codons, always use those coordinates. Otherwise, it finds all possible ORFs (including Stop to Stop),
23+
and sorts for the best ORF.
24+
<li><u>Stop codons</u>: If there were stop codons in the hit, it did not always find the best coordinates; now it does.
25+
<li><u>No hit</u>: ORFs that are Stop to Stop may now be considered if the Stop is far enough from the last Start.
26+
<li><u>N's in sequence</u>: It used to try to avoid N's; now it does not. However, it does remove them from the length before taking
27+
the log for comparing lengths.
28+
<li><u>Minimal sequences for Markov training</u>: The default was 50, which is way too low. It is now 500.
29+
</ul>
30+
31+
<li>Selected ORFs that lack a Start and/or a Stop will have a remark.
32+
<li>The output files are now sorted by SeqID.
33+
<li>Previously, allGoodORFs.pep.fa + bestORFs.pep.fa provided all candidate ORFs; now all
34+
candidate ORFs are in allGoodORFs.pep.fa.
35+
36+
</ul>
37+
38+
</ul>
39+
40+
<ttp>viewSingleTCW </ttp>
41+
<ul>
42+
<li><ttc>Load File</ttc> for all 3 Basic filters:
43+
only the first word per line will be read as a SeqID, OrigID, HitID or GOID.
44+
This allows files to be used that have other information on each line.
45+
<ul>
46+
<li><ttc>Basic GO annotations</ttc>: Add "#" before column headings so an exported file can later be read in
47+
(The other Exports already do this).
48+
</ul>
49+
<li><font color=green>New</font> <ttc>Sequence Table</ttc> has a new Export option to output the columns
50+
of the selected row.
51+
<li><ttc>Basic Sequence</ttc>:
52+
<ul>
53+
<li><font color=green>New</font> Select one sequence from the table followed by <ttc>Seq Detail</ttc> to see the <ttc>Sequence Detail</ttc>
54+
panel. This is in contrast to <ttc>Seq Table</ttc>, which results in the sequences being shown in the
55+
<ttc>Sequence Table</ttc>.
56+
<li><font color=green>New</font> The result of a search will <ttc>SELECT ROWS</ttc> from the existing table.
57+
</ul>
58+
59+
60+
<li><ttc>Results</ttc>: This panel now shows all <ttc>Sequence Detail</ttc> labels from the left panel
61+
so that all results can easily be removed.
62+
<li>Changed a few labels, e.g. the "View Seqs" label to "Seq Table", and "View Selected Sequence" to "Seq Detail"
63+
</ul>
64+
65+
Bug fixes:
66+
<ul>
67+
<li>ORF finder: Sequence of length 0 crashed the ORF Finder.
68+
<li><ttc>Basic GO annotations</ttc>: the <ttl>#Seqs</ttl> stopped showing the number of DE seqs correctly (bug from v3.3.3).
69+
<li><ttc>Filter ORF Frame</ttc>: Only worked for positive frames.
70+
</ul>
71+
Other
72+
<ul>
73+
<li>demoTra: add N's to a few of the sequences.
74+
</ul>
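
The revised Sort test described in the v3.3.4 notes can be sketched roughly as follows. This is a minimal Python illustration and not TCW's actual code: the names `signed_log` and `markov_scores_differ` are hypothetical, the natural log is assumed because the notes do not state a base, and mapping a score of exactly 0 to 0 is an assumption.

```python
import math


def signed_log(score: float) -> float:
    """Log of a Markov score; negative scores use -(log(-score)) as in the notes."""
    if score > 0:
        return math.log(score)       # natural log is an assumption
    if score < 0:
        return -math.log(-score)     # rule for negative scores from the notes
    return 0.0                       # assumption: the notes do not cover a zero score


def markov_scores_differ(score1: float, score2: float, threshold: float = 0.3) -> bool:
    """True when Abs(log(score1) - log(score2)) exceeds the threshold (0.3 in the notes)."""
    return abs(signed_log(score1) - signed_log(score2)) > threshold


if __name__ == "__main__":
    print(markov_scores_differ(10.0, 12.0))   # close scores
    print(markov_scores_differ(10.0, 20.0))   # clearly different scores
```

When the test returns False the two scores are treated as similar, which is the case where, per the notes, the 4th rule on Start and Stop ends would apply.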
75+
976
<h4>v3.3.3 25-Sept-2021</h4>
1077

1178
<ttp>viewSingleTCW</ttp> - tiny fixes
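
The Load File behaviour described in the v3.3.4 notes (only the first word of each line is read as an ID) can be sketched as below. This is a minimal Python illustration under stated assumptions: `read_ids` is a hypothetical name, words are assumed to be whitespace-delimited, and lines starting with "#" are assumed to be skipped as column headings, which is what the "#" prefix on exported headings suggests.

```python
from typing import Iterable, List


def read_ids(lines: Iterable[str]) -> List[str]:
    """Return the first word of each non-empty, non-heading line."""
    ids = []
    for line in lines:
        stripped = line.strip()
        # Assumption: blank lines and "#" heading lines carry no ID.
        if not stripped or stripped.startswith("#"):
            continue
        ids.append(stripped.split()[0])  # the rest of the line is ignored
    return ids


if __name__ == "__main__":
    sample = ["#SeqID\tLength", "tra_001\t1200", "", "tra_002 extra notes here"]
    print(read_ids(sample))
```

The sample IDs (`tra_001`, `tra_002`) are made up for the illustration. A file exported with extra columns could be passed through such a reader unchanged, since everything after the first word is discarded.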

doc/ov/sDemo.html

Lines changed: 54 additions & 48 deletions
@@ -5,10 +5,10 @@
55
<h2>TCW overview for tra </h2>
66
<table width=700 border=1><tr><td>
77
<pre>
8-
Project: tra #Seqs: 211 #Hits: 12,267 #GOs: 6,651 TPM Seq-DE GO-Enrich Pairs
8+
Project: tra #Seqs: 211 #Hits: 12,259 #GOs: 6,641 TPM Seq-DE GO-Enrich Pairs
99

10-
26-Aug-21 Build Database sequences loaded from external source
11-
26-Aug-21 Last Annotation with sTCW v3.3.1
10+
01-Oct-21 Build Database sequences loaded from external source
11+
15-Oct-21 Last Annotation with sTCW v3.3.4
1212

1313
INPUT
1414
Counts:
@@ -24,18 +24,18 @@ <h2>TCW overview for tra </h2>
2424
ANNOTATIONS
2525
Hit Statistics:
2626
Sequences with hits 210 (99.5%) Bases covered by hit 180,582 (79.3%)
27-
Unique hits 12,267 Total bases 227,683
28-
Total sequence hits 18,013
27+
Unique hits 12,259 Total bases 227,683
28+
Total sequence hits 17,968
2929

3030
annoDBs (Annotation databases): 7 (see Legend below)
3131
ANNODB ONLY BITS ANNO UNIQUE TOTAL AVG Rank HAS (%Seqs) AVG COVER COVER
3232
%SIM =1 HIT %SIM >=50 >=90
33-
SP-plants 0 0 0 1,320 2,956 53.2 | 193 (91.5%) 69.6 61.1% 3.1%
34-
SP-invertebrates 0 0 0 610 1,252 42.0 | 126 (59.7%) 47.0 29.4% 0.8%
35-
SP-fungi 0 0 0 850 1,459 42.9 | 118 (55.9%) 46.4 24.6% 0.8%
36-
SP-bacteria 0 0 0 786 1,121 43.7 | 63 (29.9%) 45.4 25.4% 0%
37-
SP-fullSubset 0 0 0 1,261 2,291 44.2 | 137 (64.9%) 48.1 32.1% 0.7%
38-
TR-plants 9 207 207 4,535 5,217 75.5 | 210 (99.5%) 81.4 68.6% 8.6%
33+
SP-plants 0 0 0 1,322 2,942 53.3 | 193 (91.5%) 69.6 61.7% 3.1%
34+
SP-invertebrates 0 0 0 610 1,249 42.0 | 126 (59.7%) 47.0 30.2% 0.8%
35+
SP-fungi 0 0 0 846 1,452 42.9 | 118 (55.9%) 46.4 24.6% 0.8%
36+
SP-bacteria 0 0 0 782 1,121 43.7 | 63 (29.9%) 45.4 25.4% 0%
37+
SP-fullSubset 0 0 0 1,259 2,270 44.3 | 137 (64.9%) 48.1 32.1% 0.7%
38+
TR-plants 9 207 207 4,535 5,217 75.5 | 210 (99.5%) 81.4 69.5% 8.6%
3939
TR-invertebrates 0 3 3 2,905 3,717 49.5 | 179 (84.8%) 54.0 39.1% 2.2%
4040

4141
Top 15 species from total: 1,508
@@ -47,16 +47,16 @@ <h2>TCW overview for tra </h2>
4747
Phoenix dactylifera 9 19 318 Macleaya cordata 2 2 41
4848
Zingiber officinale 7 7 27 Nelumbo nucifera 1 3 95
4949
Ananas comosus 4 11 458 Ricinus communis 1 2 43
50-
Meloidogyne enterolobii 3 0 10 Other 12 20 15,492
50+
Meloidogyne enterolobii 3 0 10 Other 12 20 15,447
5151

5252
Gene Ontology Statistics:
53-
Unique GOs 6,651 Unique hits with GOs 10,670 (87.0%)
53+
Unique GOs 6,641 Unique hits with GOs 10,662 (87.0%)
5454
Sequences with GOs 208 (98.6%) Seq best hit has GOs 194 (91.9%)
5555
Has goslim_plant 95
5656

57-
biological_process 4,879 (73.4%) is_a 10,776
58-
molecular_function 1,026 (15.4%) part_of 1,176
59-
cellular_component 746 (11.2%)
57+
biological_process 4,870 (73.3%) is_a 10,764
58+
molecular_function 1,026 (15.4%) part_of 1,175
59+
cellular_component 745 (11.2%)
6060

6161
EXPRESSION
6262
TPM: (% of 211)
@@ -68,37 +68,43 @@ <h2>TCW overview for tra </h2>
6868
Differential expression: (% of 211)
6969
<1E-5 <1E-4 <0.001 <0.01 <0.05
7070
RoSt 6 (3%) 17 (8%) 32(15%) 65(31%) 97(46%)
71-
RoOl 1(<1%) 16 (8%) 37(18%) 74(35%) 110(52%)
72-
StOl 0 (0%) 2 (1%) 12 (6%) 65(31%) 101(48%)
71+
RoLe 1(<1%) 16 (8%) 37(18%) 74(35%) 110(52%)
72+
StLe 0 (0%) 2 (1%) 12 (6%) 65(31%) 101(48%)
7373

74-
Gene ontology enrichment: (% of 6,651)
74+
Gene ontology enrichment: (% of 6,641)
7575
<1E-5 <1E-4 <0.001 <0.01 <0.05
76-
RoSt 0 (0%) 0 (0%) 1(<1%) 51 (1%) 309 (5%)
77-
RoOl 0 (0%) 0 (0%) 2(<1%) 22(<1%) 166 (2%)
78-
StOl 0 (0%) 0 (0%) 0 (0%) 1(<1%) 48 (1%)
76+
RoSt 0 (0%) 0 (0%) 1(<1%) 53 (1%) 312 (5%)
77+
RoLe 0 (0%) 0 (0%) 2(<1%) 21(<1%) 165 (2%)
78+
StLe 0 (0%) 0 (0%) 0 (0%) 1(<1%) 49 (1%)
7979

8080
SEQUENCES
8181
Sequence lengths:
8282
<=100 101-500 501-1000 1001-2000 2001-3000 3001-4000 4001-5000 >5000
8383
0(0%) 37(18%) 84(40%) 72(34%) 8(4%) 8(4%) 2(1%) 0(0%)
8484

85+
Quality:
86+
Sequences with #n>0: 3 ( 1.4%)
87+
Sequences with #n>10: 1 ( 0.5%)
88+
8589
ORF lengths:
8690
<=100 101-500 501-1000 1001-2000 2001-3000 3001-4000 4001-5000 >5000
87-
0(0%) 61(29%) 92(44%) 48(23%) 4(2%) 6(3%) 0(0%) 0(0%)
91+
0(0%) 62(29%) 92(44%) 47(22%) 4(2%) 6(3%) 0(0%) 0(0%)
8892

89-
ORF Stats: Average length 861
90-
Has Hit 210 (99.5%) ORF=Hit 119 (56.4%)
91-
Is Longest ORF 200 (94.8%) ORF>=300 191 (90.5%) MultiFrame 3 (1.4%)
92-
Markov Best Score 211 (100.0%) Has Start|Stop 162 (76.8%) Stops in Hit 6 (2.8%)
93-
All of the above 200 (94.8%) Has Start&Stop 53 (25.1%) >=9 Ns in ORF 0 (0%)
93+
ORF Stats: Average length 863
94+
Has Hit 210 (99.5%) Both Ends 61 (28.9%)
95+
Is Longest ORF 190 (90.0%) ORF>=300 190 (90.0%) MultiFrame 3 (1.4%)
96+
Markov Best Score 211 (100.0%) ORF=Hit 119 (56.4%) Stops in Hit 6 (2.8%)
97+
All of the above 190 (90.0%) with Ends 31 (14.7%) >=9 Ns in ORF 1 (<1%)
9498

9599
GC Content: 48.65%
96100
Pos1 18.3% 5UTR CDS 3UTR 5UTR CDS 3UTR
97-
Pos2 13.7% %GC 43.71 48.80 36.27 Length 21k 182k 25k
98-
Pos3 16.6% CpG-O/E 0.87 0.71 0.63 AvgLen 101.2 860.8 117.1
101+
Pos2 13.7% %GC 43.68 48.79 36.25 Length 21k 182k 25k
102+
Pos3 16.6% CpG-O/E 0.88 0.71 0.63 AvgLen 98.9 863.1 117.1
99103

100-
Similar pairs: 20
101-
Nucleotide 20
104+
Similar pairs: 100
105+
Nucleotide 17
106+
Translated nucleotide 100
107+
Translated ORFs 95
102108

103109
LOCATIONS
104110
Sequences with location: 12 unique locations: 12
@@ -112,37 +118,37 @@ <h2>TCW overview for tra </h2>
112118
PROCESSING INFORMATION:
113119
AnnoDB Files:
114120
Type Taxo FILE DB DATE ADD DATE EXECUTE
115-
sp plants uniprot_sprot_plants.fasta 27-Jan-21 26-Aug-21 diamond --masking 0
116-
sp invertebrates uniprot_sprot_invertebrates.fasta 27-Jan-21 26-Aug-21 diamond --masking 0
117-
sp fungi uniprot_sprot_fungi.fasta 27-Jan-21 26-Aug-21 diamond --masking 0
118-
sp bacteria uniprot_sprot_bacteria.fasta 27-Jan-21 26-Aug-21 diamond --masking 0
119-
sp fullSubset uniprot_sprot_fullSubset.fasta 27-Jan-21 26-Aug-21 diamond --masking 0
120-
tr plants uniprot_trembl_plants.fasta 27-Jan-21 26-Aug-21 diamond --masking 0
121-
tr invertebrates uniprot_trembl_invertebrates.fasta 27-Jan-21 26-Aug-21 diamond --masking 0
121+
sp plants uniprot_sprot_plants.fasta 27-Jan-21 15-Oct-21 diamond --masking 0
122+
sp invertebrates uniprot_sprot_invertebrates.fasta 27-Jan-21 15-Oct-21 diamond --masking 0
123+
sp fungi uniprot_sprot_fungi.fasta 27-Jan-21 15-Oct-21 diamond --masking 0
124+
sp bacteria uniprot_sprot_bacteria.fasta 27-Jan-21 15-Oct-21 diamond --masking 0
125+
sp fullSubset uniprot_sprot_fullSubset.fasta 27-Jan-21 15-Oct-21 diamond --masking 0
126+
tr plants uniprot_trembl_plants.fasta 27-Jan-21 15-Oct-21 diamond --masking 0
127+
tr invertebrates uniprot_trembl_invertebrates.fasta 27-Jan-21 15-Oct-21 diamond --masking 0
122128

129+
Prune: none
123130

124-
Gene Ontology: go-basic.obo-Feb2021 GOdb: go_demo [GOs added with sTCW v3.3.1]
131+
Gene Ontology: go-basic.obo-Feb2021 GOdb: go_demo [GOs added with sTCW v3.3.4]
125132
GO Slim: goslim_plant
126133

127134
ORF finder:
128135
Use ATG only for start site
129136
Rule 1: Use Good hit: E-value <=1E-10 or Sim >= 20%
130-
Rule 2: Use longest ORF if Log Len Ratio > 0.5
131-
Rule 3: Use best Markov score
132-
Train using best hits
133-
Good coverage: Hit overlap >= 95% with Sim 60% (internal params)
137+
Rule 2: Use longest ORF if Log Ratio > 0.5
138+
Rule 3: Use best Markov score if Log Ratio > 0.4
139+
Train using best hits (204 seqs, 174.4k bases)
134140

135141
Differential Expression computation:
136142
Column Method Conditions
137143
RoSt edgeRglm.R CPM>1>=2 Root : Stem
138-
RoOl edgeRglm.R CPM>1>=2 Root : Oleaf
139-
StOl edgeRglm.R CPM>1>=2 Stem : Oleaf
144+
RoLe edgeRglm.R CPM>1>=2 Root : Oleaf
145+
StLe edgeRglm.R CPM>1>=2 Stem : Oleaf
140146

141147
GO enrichment computation:
142148
Column Method Cutoff
143149
RoSt goSeqNoFDR.R 5.0e-02
144-
RoOl goSeqNoFDR.R 5.0e-02
145-
StOl goSeqNoFDR.R 5.0e-02
150+
RoLe goSeqNoFDR.R 5.0e-02
151+
StLe goSeqNoFDR.R 5.0e-02
146152

147153
-------------------------------------------------------------------
148154
LEGEND:
