@@ -1,5 +1,5 @@
 .\" ============================================================================
-.TH swarm 1 "December 12, 2017" "version 2.2.2" "USER COMMANDS"
+.TH swarm 1 "October 24, 2019" "version 3.0.0" "USER COMMANDS"
 .\" ============================================================================
 .SH NAME
 swarm \(em find clusters of nearly-identical nucleotide amplicons
@@ -110,8 +110,9 @@ results obtained during the clustering process allows \fBswarm\fR to
 avoid most of the amplicon comparisons needed in a naïve approach. To
 speed up the remaining amplicon comparisons, \fBswarm\fR implements an
 extremely fast Needleman-Wunsch algorithm making use of the Streaming
-SIMD Extensions (SSE2) of modern x86-64 CPUs. If SSE2 instructions are
-not available, \fBswarm\fR exits with an error message.
+SIMD Extensions (SSE2) of modern x86-64 CPUs, or NEON instructions of
+ARM-64 CPUs. If SSE2 instructions are not available, \fBswarm\fR exits
+with an error message.
 .PP
 \fBswarm\fR can read nucleotide amplicons in fasta format from a
 normal file or from the standard input (using a pipe or a
@@ -138,7 +139,19 @@ defined as a string of [ACGT] or [ACGU] symbols (case insensitive, 'U'
 is replaced with 'T' internally), starting after the end of the header
 line and ending before the next header line or the file end;
 \fBswarm\fR silently removes newline symbols ('\\n' or '\\r') and
-exits with an error message if any other symbol is present.
+exits with an error message if any other symbol is present. Lastly, if
+sequences are not all unique, i.e. were not properly dereplicated,
+swarm will exit with an error message.
+.PP
+Clusters are written to output files (specified with \-i, \-o, \-s and
+\-u) by decreasing abundance of their seed sequences, and then by
+alphabetical order of seed sequence labels. An exception to that is
+the \-w (\-\-seeds) output, which is sorted by decreasing \fIcluster
+abundance\fR (sum of abundances of all sequences in the cluster), and
+then by alphabetical order of seed sequence labels. This is
+particularly useful for post-clustering steps, such as \fIde novo\fR
+chimera detection, that require clusters to be sorted by decreasing
+abundances.
 .\" ----------------------------------------------------------------------------
 .SS General options
 .TP 9
@@ -286,7 +299,7 @@ in situations where writing to \fIstandard error\fR is problematic
 output clustering results to \fIfilename\fR. Results consist of a list
 of OTUs, one OTU per line. An OTU is a list of amplicon headers
 separated by spaces. That output format can be modified by the option
-\-\-mothur (\-r). Default is to write to standard output.
+\-\-mothur (\-r). Default is to write to \fIstandard output\fR.
 .TP
 .B \-r\fP,\fB\ \-\-mothur
 output clustering results in a format compatible with Mothur. That
@@ -305,7 +318,7 @@ total abundance of amplicons in the OTU,
 .IP \n+[step].
 label of the initial seed (header without abundance annotations),
 .IP \n+[step].
-initial seed abundance,
+abundance of the initial seed,
 .IP \n+[step].
 number of amplicons with an abundance of 1 in the OTU,
 .IP \n+[step].
@@ -363,13 +376,15 @@ output OTU representative sequences to \fIfilename\fR in fasta
 format. The abundance value of each OTU representative is the sum of
 the abundances of all the amplicons in the OTU. Fasta headers are
 formated as follows: '>label_\fIinteger\fR',
-or '>label;size=\fIinteger\fR;' if the \-z option is used.
+or '>label;size=\fIinteger\fR;' if the \-z option is used, and
+sequences are uppercased. Sequences are sorted by decreasing
+abundance, and then by alphabetical order of sequence labels.
 .TP
 .B \-z\fP,\fB\ \-\-usearch\-abundance
 accept amplicon abundance values in usearch/vsearch's style
 (>label;size=\fIinteger\fR[;]). That option influences the abundance
-annotation style used in swarm's standard output (\-o), as well as the
-ouput of options \-r, \-u and \-w.
+annotation style used in swarm's \fIstandard output\fR (\-o), as well
+as the output of options \-r, \-u and \-w.
 .LP
 .\" ----------------------------------------------------------------------------
 .SS Pairwise alignment advanced options
@@ -410,7 +425,7 @@ zcat myfile.fasta.gz | \\
  \-t 4 \\
  \-f \\
  \-w myfile.representatives.fasta \\
- \-o myfile.swarms
+ \-o /dev/null
 .RE
 .EE
 .\" ============================================================================
@@ -475,7 +490,7 @@ License along with this program. If not, see
 .\" ============================================================================
 .SH SEE ALSO
 \fBswipe\fR, an extremely fast Smith-Waterman database search tool by
-Torbjørn Rognes (available from
+Torbjørn Rognes (available at
 .UR https://github.com/torognes/swipe
 .UE ).
 .PP
@@ -492,8 +507,17 @@ New features and important modifications of \fBswarm\fR (short lived
 or minor bug releases are not mentioned):
 .RS
 .TP
+.BR v3.0.0\~ "released October 24, 2019"
+Version 3.0.0 introduces a faster algorithm for \fId\fR = 1, and a
+reduced memory footprint. Swarm has been ported to Windows x86-64,
+GNU/Linux ARM 64, and GNU/Linux POWER8. Internal code has been
+modernized, hardened, and thoroughly tested. Strict dereplication of
+input sequences is now mandatory. The \-\-seeds option (\-w) now
+outputs results sorted by decreasing abundance, and then by
+alphabetical order of sequence labels.
+.TP
 .BR v2.2.2\~ "released December 12, 2017"
-Version 2.2.2 fixes a bug that would cause Swarm to wait forever in
+Version 2.2.2 fixes a bug that would cause swarm to wait forever in
 very rare cases when multiple threads were used.
 .TP
 .BR v2.2.1\~ "released October 27, 2017"
@@ -527,7 +551,7 @@ bug only applies when \fId\fR > 1.
 .BR v2.1.10\~ "released December 22, 2016"
 Version 2.1.10 fixes two bugs related to gap penalties of alignments.
 The first bug may lead to wrong aligments and similarity percentages
-reported in UCLUST (.uc) files. The second bug makes Swarm use a
+reported in UCLUST (.uc) files. The second bug makes swarm use a
 slightly higher gap extension penalty than specified. The default gap
 extension penalty used have actually been 4.5 instead of 4.
 .TP
@@ -679,10 +703,10 @@ not. Only basic SSE2 instructions are now required to run \fBswarm\fR.
 .TP
 .BR v1.2.4\~ "released January 30, 2014"
 Version 1.2.4 introduces an option \-\-break\-swarms to output all
-pairs of amplicons with \fId\fR differences to standard error. That
-option is used by the companion script `swarm_breaker.py` to refine
-\fBswarm\fR results. The syntax of the inline assembly code is changed
-for compatibility with more compilers.
+pairs of amplicons with \fId\fR differences to \fIstandard
+error\fR. That option is used by the companion script
+`swarm_breaker.py` to refine \fBswarm\fR results. The syntax of the
+inline assembly code is changed for compatibility with more compilers.
 .TP
 .BR v1.2\~ "released May 16, 2013"
 Version 1.2 greatly improves speed by using alignment-free comparisons
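
The two output orderings described in the new man page text (seed abundance for \-i, \-o, \-s and \-u; cluster abundance for \-w) can be sketched in Python. This is an illustration only, not swarm's implementation: the `Cluster` record, its field names, and both function names are hypothetical, and plain Python string comparison is assumed to stand in for the "alphabetical order" the man page mentions.

```python
from typing import List, NamedTuple


class Cluster(NamedTuple):
    """Hypothetical record for one cluster (not a swarm data structure)."""
    seed_label: str        # label of the seed sequence
    seed_abundance: int    # abundance of the seed sequence alone
    total_abundance: int   # sum of abundances of all sequences in the cluster


def sort_by_seed_abundance(clusters: List[Cluster]) -> List[Cluster]:
    """Order described for -i, -o, -s and -u: decreasing seed abundance,
    ties broken by seed label (assumed: plain string comparison)."""
    return sorted(clusters, key=lambda c: (-c.seed_abundance, c.seed_label))


def sort_by_cluster_abundance(clusters: List[Cluster]) -> List[Cluster]:
    """Order described for -w (--seeds): decreasing cluster abundance,
    ties broken by seed label (assumed: plain string comparison)."""
    return sorted(clusters, key=lambda c: (-c.total_abundance, c.seed_label))


if __name__ == "__main__":
    example = [
        Cluster("b", 10, 12),
        Cluster("a", 10, 30),
        Cluster("c", 5, 40),
    ]
    print([c.seed_label for c in sort_by_seed_abundance(example)])
    print([c.seed_label for c in sort_by_cluster_abundance(example)])
```

Negating the abundance in the sort key gives a descending numeric order while keeping the label tie-break ascending, which matches the "decreasing abundance, then alphabetical" wording without needing two sort passes.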