Releases · MikkelSchubert/adapterremoval

19 May 18:55

v3.0.0-alpha3

bca19b1

AdapterRemoval v3.0.0-alpha3 Pre-release

This is the third alpha release of AdapterRemoval v3. As with the previous alpha
releases, changes that affect how AdapterRemoval is used (e.g. by removing
options) or that result in different output compared to previous versions are
marked with the label "[BREAKING]".

AdapterRemoval now uses meson for its build process, and meson is therefore
a build-time requirement. A Makefile is still provided to simplify setting up
and running the build. See the installation instructions in the documentation
for more information.

Major changes include support for hardware accelerated alignments using NEON on
modern Apple hardware, support for samples being identified by multiple
barcodes, support for handling barcodes that may ligate in different
orientations, improved support for SAM/BAM output, and an (optional) duplication
plot in the HTML report.

For more information, see
https://github.com/MikkelSchubert/adapterremoval/blob/v3.0.0-alpha3/README-v3.md

Added

Multiple barcodes/barcode pairs may now be used to identify the same sample,
via the --multiple-barcodes flag. The number of hits per barcode/barcode
pair is reported in the HTML/JSON reports.
Added support for handling barcodes that may ligate in different orientations
(via --barcode-orientation) and for normalizing the orientation of merged
reads (via --normalize-orientation).
The --use-colors parameter may now be used to control color output.
Options are auto (default; enabled when run interactively), always, or never.
The title of the HTML report can now be set via --report-title.
Input files are now checked for duplicate filenames, in order to help
prevent accidental data duplication.
Alignments are now accelerated on Apple hardware using NEON instructions, for
a roughly 3-fold increase in throughput.
A duplication plot is now included in the HTML report if this is enabled,
instead of only being reported in the JSON file.

Changed

[BREAKING] Changed CO tags for read-groups in SAM/BAM files to DS
(description) tags, in order to match the specification.
[BREAKING] A number of changes have been made to the JSON report layout,
including the moving, removal, and addition of sections. The layout is
described in schema.json.
[BREAKING] The minimum allowed/default value for --min-adapter-overlap
was set to 1. In practice this has no effect, since length 0 alignments were
never considered, but may break scripts running AdapterRemoval.
[BREAKING] Drop support for raw error-rates to --trim-mott-rate, which
was renamed to --trim-mott-quality to match other trimming options.
[BREAKING] SAM/BAM output is now combined into a single file by default,
including discarded reads. This can be overridden by setting the individual
--out-* options.
[BREAKING] Dropped PG tag from read-groups/records in SAM/BAM output.
[BREAKING] Dropped (minimal) read-groups for SAM/BAM output. If desired,
read-group information can be added with --read-group.
[BREAKING] The --report-duplication option now supports k/m/g suffixes,
and defaults to 100k if used without an explicit value.
[BREAKING] The --read-group option no longer attempts to unescape
special characters. Instead, tags must be separated using embedded tabs
(--read-group $'ID:A\tSM:B') or provided as individual arguments
(--read-group 'ID:A' 'SM:B').
Improved checks for conflicting command-line options.
Barcodes are now recorded in FASTQ headers when demultiplexing without trimming.
The $schema URL is now included in the JSON report.
Makefile features are now enabled/disabled with true/false instead of
yes/no.
Vega-lite is now loaded in the background, when opening the HTML reports,
making the report readable before Vega-lite has loaded.
Optimized alignments involving multiple possible adapter sequences, by only
once performing the alignments that involve no adapter sequences.
Optimized alignments involving multiple possible adapter sequences, by
sorting the list of adapter sequences by hits. This increases the odds that
a good alignment is found early so that worse alignments can be skipped.
The old Makefile was replaced with the Meson build system, but a wrapper
Makefile is still provided/used as a convenience for setting the recommended
build options.
A number of small improvements were made to the --help text.
Improved error messages when mismatching (paired) read names are detected.
Singleton reads are now included in the overall summary statistics in
JSON/HTML reports.
Hardening flags are now enabled by default during compilation. This comes with
a small performance cost, but most distros are also expected to enable similar
flags by default.
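
The two forms accepted by --read-group can be illustrated by building the
argument list in Python before handing it to a process launcher. This is a
minimal sketch: the helper names read_group_tabbed and read_group_separate are
hypothetical and not part of AdapterRemoval, and all other options needed for a
real run are omitted.

```python
def read_group_tabbed(tags):
    """Return --read-group followed by ONE argument with tab-separated tags."""
    joined = "\t".join(f"{key}:{value}" for key, value in tags.items())
    return ["--read-group", joined]


def read_group_separate(tags):
    """Return --read-group followed by one argument per tag."""
    return ["--read-group"] + [f"{key}:{value}" for key, value in tags.items()]


# Example tags taken from the release notes (ID:A and SM:B).
tags = {"ID": "A", "SM": "B"}
print(read_group_tabbed(tags))    # a real tab character separates the tags
print(read_group_separate(tags))  # each tag is its own argument
```

Since special characters are no longer unescaped, the sketch inserts a real tab
character rather than the two-character sequence backslash-t.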

Fixed

NA values were being written with '%' or 'bp' suffixes in HTML report.
Some plots were omitted for merged reads in HTML report.
Mate 2 adapters were reverse-complemented in JSON report when demultiplexing.
SAM/BAM headers were not being written in demultiplexing mode.
Mate 1/2 statistics were sampled independently, and thus potentially not
derived from the same read pairs.
The JSON/HTML reports would give different time-stamps for the run, since one
gave the start time and the other the end time. Now start time is always used.
Fixed failure when reading paired FASTQ files where read lengths differed
between the two files.
Fixed report files and unidentified reads getting additional suffixes when
filenames were specified manually during demultiplexing.
Fixed /dev/null being listed as the path for some files when demultiplexing,
and these outputs were disabled.
Reverted the removal of support for '.' as equivalent to 'N' in FASTQ reads.
This is found in some older data-sets (#112).
Fixed misleading IO error messages, that would include descriptions of
unrelated errors in some cases.

24 Aug 15:25

MikkelSchubert

v2.3.4

79b296b

AdapterRemoval v2.3.4 Latest

This release adds a couple of new command-line options for handling non-ACGTN
bases in FASTQ data and back-ports a few minor fixes from the development
branch.

Added

Added support for converting Uracils (U) in input data to Thymine (T) via the
--convert-uracils flag.
Added support for replacing IUPAC-encoded degenerate bases with Ns via the
--mask-degenerate-bases flag.
Added DESTDIR support to make install.
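
The effect of the two new flags on a single sequence can be sketched as follows.
This is only an illustration of the transformations described above, not
AdapterRemoval's implementation: the function names are hypothetical, only
upper-case input is handled, and the set of degenerate codes is assumed to be
the standard IUPAC ambiguity codes other than N.

```python
# Standard IUPAC ambiguity codes, excluding N (assumed set for this sketch).
DEGENERATE_BASES = "RYSWKMBDHV"
MASK_TABLE = str.maketrans(DEGENERATE_BASES, "N" * len(DEGENERATE_BASES))


def convert_uracils(sequence):
    """Replace uracil (U) with thymine (T), as --convert-uracils is described."""
    return sequence.replace("U", "T")


def mask_degenerate_bases(sequence):
    """Replace degenerate bases with N, as --mask-degenerate-bases is described."""
    return sequence.translate(MASK_TABLE)


read = "ACGUNRYA"
print(mask_degenerate_bases(convert_uracils(read)))  # prints ACGTNNNA
```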

Fixed

Improved progress timer accuracy, so updates occur closer to every 1M reads.

Changed

Minor improvements to --help text and documentation.

20 Aug 14:09

MikkelSchubert

v3.0.0-alpha2

bd42701

AdapterRemoval v3.0.0-alpha2 Pre-release

This is the second alpha release of AdapterRemoval v3. It is the intention that
a third alpha release, or the final 3.0 release, will follow within the next
couple of months.

As with alpha 1, changes that affect how AdapterRemoval is used (e.g. by
removing options) or that result in different output compared to AdapterRemoval
v2 are marked with the label "[BREAKING]".

In addition to changes listed below, this release includes increased throughput
thanks to improved parallelization of various steps in the internal pipeline,
support for AVX512 and general improvements to the SIMD alignment algorithms,
loop unrolling of non-SIMD alignments to significantly increase throughput when
SIMD is not available, and a significant decrease in the number of allocations
to decrease overhead.

This release requires a compiler with support for C++17, and libdeflate is now a
mandatory dependency.

Draft documentation is available here and a pre-compiled binary for x86-64
Linux systems is attached below.

Added

Added support for converting (U)racils in input data to T(hymine) via the
--convert-uracils flag.
Added support for replacing IUPAC-encoded degenerate bases with Ns via the
--mask-degenerate-bases flag.
Added support for writing output in SAM/BAM formats, with optional
user-supplied read-group information.
Added support for alignments using AVX512 instructions. AVX512 support is only
available when AdapterRemoval is compiled with GCC v11+ or Clang v8+.
Added support for selecting output file formats via the file extension and via
the --out-format option. A corresponding option, --stdout-format, was
added to select the format for data written to STDOUT.
Added support for reading from STDIN or writing to STDOUT when '-' is used as
the filename, as an alternative to using /dev/stdin or /dev/stdout.
Added dedicated threads solely for writing output data. This allows compute
threads to work at full capacity, as long as the destination can consume
written data fast enough. This may result in CPU utilization exceeding
--threads by a couple of percent.
Added support for setting DESTDIR when running make install.
Added --licenses flag for displaying licenses of 3rd party code used by /
incorporated into AdapterRemoval.
Added --simd option allowing the user to select the specific SIMD
instruction set they wish to use.
Added Containerfile for building static binaries using alpine/musl.

Changed

[BREAKING] Changed the default --mm/--mismatch-rate from 1/3 to 1/6,
in order to decrease the false positive rate, in particular for read merging.
[BREAKING] Default to writing gzip-compressed FASTQ files; output written
to STDOUT is uncompressed by default.
[BREAKING] Discarded reads are no longer saved by default.
[BREAKING] Output files for discarded reads and singleton (orphan)
paired-end reads are only created if filtering is enabled.
[BREAKING] The --basename / --out-prefix no longer defaults to
your_output. Instead the user is required to set at least one --out-*
option.
[BREAKING] Merged --identify-adapters and --report-only commands. The
adapter sequence is presently only reported in the HTML report, but will be
added to the JSON report following some planned changes.
[BREAKING] Reverted --min-complexity being enabled by default.
Increased the default --threads value to 2.
A number of command-line options were renamed for consistency; use of the old
names is still supported, but will trigger a warning message.
Re-organized compression: level 1 is streamed using isa-l, while levels 2-13
correspond to libdeflate levels 1 to 12.
Changed the default compression level to 5 on the new scale (libdeflate level
4); this results in a ~40% increase in throughput at the cost of roughly ~3%
larger output files.
Setting an --out-* option in demultiplexing mode overrides the basename /
prefix for that specific output type.
Add smoothing to GC values calculated for the GC content curve, to account
for the fact that possible GC% values are unevenly distributed depending on
the read length.
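
The re-organized compression scale can be summarized as a small lookup. The
sketch below only restates the mapping given above; the function name is
hypothetical, the isa-l level used for level 1 is not stated in these notes and
is therefore returned as None, and values outside 1-13 are simply rejected here.

```python
def compression_backend(level):
    """Map a level on the new scale to (library, library_level)."""
    if level == 1:
        # Level 1 is streamed using isa-l; its internal level is not stated.
        return ("isa-l", None)
    if 2 <= level <= 13:
        # Levels 2-13 correspond to libdeflate levels 1-12.
        return ("libdeflate", level - 1)
    raise ValueError(f"level {level} is outside the range covered by this sketch")


print(compression_backend(5))   # the new default: ('libdeflate', 4)
print(compression_backend(13))  # ('libdeflate', 12)
```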

Removed

The following changes are all [BREAKING] as described above:

Removed support for the original merging algorithm. The
--merge-strategy additive method produces very similar, but slightly more
conservative scores.
Removed the ability to randomly sample a base if no best base could be
selected in case of mismatches. Such bases are now changed to N, while both
methods assign a Phred score of 0 (!).

07 Nov 19:07

MikkelSchubert

v3.0.0-alpha1

3e6e49e

AdapterRemoval v3.0.0-alpha1 Pre-release

This is the first alpha release of AdapterRemoval v3. This is a major revision
of AdapterRemoval, with the goals of simplifying usage by picking a sensible set of
default settings, adding new features to handle a wider range of data, providing
human/machine readable reports, and improving overall throughput.

This release features a number of breaking changes compared to AdapterRemoval v2
and it is therefore recommended that you carefully read the list of changes
below. Changes that affect how AdapterRemoval is used (e.g. by removing options)
or that result in different output compared to AdapterRemoval v2 are marked with
the label "[BREAKING]".

This is an alpha release; not all planned features are complete (more QC reports
are planned among other things), additional optimizations will be attempted, and
documentation still needs to be expanded further before the final release.
Feedback is very welcome in the meantime.

Draft documentation is available here and a pre-compiled binary for x86-64 Linux systems is attached below.

Added

Reports are now available in JSON format for easy parsing and in HTML format
for human consumption. These replace the old --settings file.
AVX2 enabled alignment algorithm for a significant performance boost (YMMV).
Added support for detecting supported CPU extensions (SSE/AVX) at runtime.
Support for combining output simply by specifying the same filename for
multiple output types; for example, --output1 file.fq --output2 file.fq will
produce interleaved output.
Added handling for /dev/null as a "magic" output filename. Read-types
writing to this exact path will be discarded early in the pipeline, saving
time previously spent processing, compressing, and writing FASTQ reads.
Added read complexity filter inspired by fastp.
Added the ability to only process the first N reads/read pairs via the
newly added --head N command-line option.
Added estimation of duplication rates based on the FastQC algorithm.
Automatic detection of mate separators based on the first chunk of reads
processed. The --mate-separator is therefore only required in cases where
the results are ambiguous.
Automatic gzip compression of output files with a .gz extension. This makes
it possible to compress only a subset of files and removes the need for the
--gzip option when manually specifying output files.
Added options --prefix-read1, --prefix-read2, and --prefix-merged for
adding custom prefixes to the names of FASTQ reads.

Changed

[BREAKING] Default adapters have been changed to the recommended Illumina
sequences, equivalent to the first 33 bp of the adapter sequences used by
AdapterRemoval v2. This makes the default settings more generally applicable.
[BREAKING] The trimming options --trimwindows, --trimns,
--trimqualities, and --minquality have been deprecated in favor of the new
modified Mott's algorithm, which is enabled by default. The trimming
algorithm used may be changed using the new --trim-strategy option.
[BREAKING] Merging now defaults to using the conservative algorithm,
meaning that matching quality scores are assigned Q_match = max(Q_a, Q_b)
instead of Q_match ~= Q_a + Q_b, and that same-quality mismatches are
assigned 'N' instead of one being picked at random. Motivated in part by
doi:10.1186/s12859-018-2579-2. This can be changed using --merge-strategy.
The --merge option no longer has any effect when processing SE data;
previously this option would treat reads with at least --minalignmentlength
bases of adapter as pseudo-merged reads.
[BREAKING] Merged reads are no longer given a M_ name prefix and merged
reads that have been trimmed after merging are no longer given an MT_ name
prefix. Instead, see the new option --prefix-merged.
[BREAKING] Default filenames have all been revised and now include proper
extensions to indicate the format.
[BREAKING] The executable is now named adapterremoval3. This was done to
allow v3 to coexist with AdapterRemoval v2 and to prevent accidental use of
the wrong version.
[BREAKING] Changed the default --maxns value from 1000 to "infinite".
--gzip now defaults to compressing independent blocks of 64kb data using
libdeflate. This significantly improves throughput in both single- and
(especially) multi-threaded mode, but may be incompatible with a few programs.
Compression levels of 3 and below use isa-l for compression and provide a
more universally compatible output.
The term "merging" is now used consistently instead of "collapsing", including
for default output filenames. Options have been renamed, but old option names
continue to work (except for --outputcollapsedtruncated).
Improvements to alignment algorithm in order to terminate early if possible.
Logging is now done more consistently and exposes options to increase or
decrease the amount of messages printed (debug, info, warning, errors).
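
The difference between the two ways of scoring merged positions described above
can be sketched in a few lines. This is an illustration under stated
assumptions rather than AdapterRemoval's code: the function names and the
strategy labels are hypothetical, no upper bound is applied to the summed
score, mismatches with differing qualities are not covered by these notes and
are rejected, and the quality of 0 for the resulting 'N' follows the
description given for --collapse-deterministic in v2.2.3.

```python
def match_quality(q_a, q_b, strategy="conservative"):
    """Quality assigned to a merged position where both mates agree."""
    if strategy == "conservative":
        return max(q_a, q_b)  # Q_match = max(Q_a, Q_b)
    if strategy == "additive":
        return q_a + q_b      # Q_match ~= Q_a + Q_b (uncapped in this sketch)
    raise ValueError(f"unknown strategy: {strategy}")


def tied_mismatch(base_a, base_b, q_a, q_b):
    """Resolve a mismatch where both bases have the same quality score."""
    if base_a == base_b or q_a != q_b:
        raise ValueError("only same-quality mismatches are covered by this sketch")
    # Neither base is preferred, so the position becomes 'N' with quality 0.
    return ("N", 0)


print(match_quality(30, 20))              # 30
print(match_quality(30, 20, "additive"))  # 50
print(tied_mismatch("A", "C", 30, 30))    # ('N', 0)
```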

Removed

The following changes are all [BREAKING] as described above:

The --outputcollapsedtruncated has been removed and all merged reads
(whether quality trimmed or not) are simply written to --outputmerged.
The --qualitybase-output has been removed. Output is now always Phred+33.
The --combined-output option has been removed in favor of allowing arbitrary
merging of output files (see above).
The --settings option has been replaced by --out-json and --out-html for
machine and human readable reports, respectively.
Removed support for guessing the intended command-line argument based on
prefixes. I.e. --th will no longer be accepted for --threads. Due to the
number of options added, removed, and renamed, this is no longer reliable.
The deprecated --pcr1 and --pcr2 options have been removed.
Dropped undocumented support for '.' as equivalent to 'N' in FASTQ reads.
Support for reading and writing of bzip2 files has been removed.

15 Apr 09:33

MikkelSchubert

v2.3.3

54bc0d6

AdapterRemoval v2.3.3

Updated Catch2 to fix compilation with glibc 2.34, courtesy of loganrosen.

17 Mar 11:26

MikkelSchubert

v2.3.2

e25a9af

AdapterRemoval v2.3.2

Improved error messages when AdapterRemoval failed to open or write FASTQ
files (issue #42).
Fixed build on some architectures. Patch courtesy of Andreas Tille/the Debian
build team.
Fixed display of max Phred scores in FASTQ validation error messages.
Removed benchmarking scripts which were included in the repo for the sake of
making Schubert et al. 2016 reproducible. This is no longer relevant.
Use 'install' in the Makefile; patch courtesy of Eric DEVEAUD.
Added --collapse-deterministic to .settings file.
Fixed --minadapteroverlap being misapplied in PE mode.
Added --collapse-conservatively merge algorithm based on FASTQ-join. See
the man-page for more information.

12 Oct 20:08

MikkelSchubert

v2.3.1

95a9fd1

AdapterRemoval v2.3.1

Added --preserve5p option. This option prevents AdapterRemoval from trimming
the 5p of reads when the --trimqualities, --trimns, and --trimwindows options
are used. Neither end of collapsed reads are trimmed when this option is used.
Fixed Ns being miscounted as As when constructing consensus adapter sequences
using --identify-adapters.

12 Mar 17:06

MikkelSchubert

v2.3.0

ab3d026

AdapterRemoval v2.3.0

Fixed --collapse producing slightly different result on 32 bit and 64 bit
architectures. Courtesy of Andreas Tille.
Added support for output files without a basename; to create such output
files, use an empty basename (--basename "") or a basename ending with a
slash (--basename path/).
Added support for managing file handles to allow AdapterRemoval to run
when the number of output files exceeds the number of file handles, e.g.
when demultiplexing large numbers of samples.
Reworked demultiplexing to improve performance for many paired barcodes.

10 Feb 16:56

MikkelSchubert

v2.2.4

3e33b8b

AdapterRemoval v2.2.4

Fixed bug in --trim5p N which would cause AdapterRemoval to abort if N was greater
than the pre-trimmed read length.
Fixed --identify-adapters not respecting the --mate-separator option.

22 Jan 21:41

MikkelSchubert

v2.2.3

f3a45c3

AdapterRemoval v2.2.3

Added support for trimming reads by a fixed amount: --trim5p N --trim3p N.
Different values may be given for each mate: --trim5p N1 N2. Trimming is
carried out after adapters have been removed and reads have been collapsed,
if enabled, but before quality trimming (Ns and low qualities).
Added option for deterministic read merging (--collapse-deterministic). In
this mode AdapterRemoval will set a merged base to 'N' with quality 0 if
the corresponding bases on the two mates differ, and if both have the same
quality score. The default behavior is to select one of the two bases at
random.
Fixed reporting of line numbers in error messages.
Added conda installation instructions, courtesy of Maxime Borry (maxibor).
Fixed reading mate 2 adapters specified via --adapter-list. Adapters would
be used in the reverse orientation compared to --adapter2. Courtesy of
Karolis (KarolisM).
Fixed various typos and improved help/error messages.
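
Fixed-length trimming as described for --trim5p and --trim3p amounts to slicing
the sequence and its quality string by the same offsets. The sketch below is a
hypothetical helper, not AdapterRemoval's code; it assumes that a read shorter
than the requested trim simply becomes empty.

```python
def trim_fixed(sequence, qualities, trim5p=0, trim3p=0):
    """Remove trim5p bases from the 5' end and trim3p bases from the 3' end."""
    # Clamp the end position so over-long trims yield an empty read.
    end = max(trim5p, len(sequence) - trim3p)
    return sequence[trim5p:end], qualities[trim5p:end]


print(trim_fixed("ACGTACGT", "IIIIHHHH", trim5p=2, trim3p=3))  # ('GTA', 'IIH')
print(trim_fixed("ACGT", "IIII", trim5p=10))                   # ('', '')
```

Different values per mate, as in --trim5p N1 N2, would correspond to calling
the helper once for each mate with its own offsets.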
