
Commit bca19b1

update documentation (#142)
The documentation is still a work in progress, but this moves it significantly closer to being ready for the final release of v3.
1 parent 2a630a4 commit bca19b1


12 files changed, +1244 -356 lines changed


CHANGES.md

Lines changed: 111 additions & 4 deletions
@@ -1,5 +1,109 @@
# Changelog

## [3.0.0-alpha3] - 2025-05-19

This is the third alpha release of AdapterRemoval v3. As with the previous alpha
releases, changes that affect how AdapterRemoval is used (e.g. by removing
options) or that result in different output compared to previous versions are
marked with the label "[**BREAKING**]".

AdapterRemoval now uses `meson` for its build process, and `meson` is therefore
a build-time requirement. A `Makefile` is still provided to simplify setting up
and running the build. See the installation instructions in the documentation
for more information.

Major changes include support for hardware accelerated alignments using NEON on
modern Apple hardware, support for samples being identified by multiple
barcodes, support for handling barcodes that may ligate in different
orientations, improved support for SAM/BAM output, and an (optional) duplication
plot in the HTML report.

### Added

- Multiple barcodes/barcode pairs may now be used to identify the same sample,
via the `--multiple-barcodes` flag. The number of hits per barcode/barcode
pair is reported in the HTML/JSON reports.
- Added support for handling barcodes that may ligate in different orientations
(via `--barcode-orientation`) and for normalizing the orientation of merged
reads (via `--normalize-orientation`).
- The `--use-colors` parameter may now be used to control color output.
Options are auto (default; enabled when run interactively), always, or never.
- The title of the HTML report can now be set via `--report-title`.
- Input files are now checked for duplicate filenames, in order to help
prevent accidental data duplication.
- Alignments are now accelerated on Apple hardware using NEON instructions, for
a roughly 3-fold increase in throughput.
- A duplication plot is now included in the HTML report if duplication
reporting is enabled, instead of only being reported in the JSON file.
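
To illustrate the idea behind `--multiple-barcodes`, the following is a minimal Python sketch, not AdapterRemoval's implementation, in which several barcode pairs map to the same sample and hits are tallied per pair. The `BARCODES` table, the `demultiplex` function, and the exact prefix matching are hypothetical simplifications (no mismatches allowed, no handling of barcode orientation).

```python
from collections import Counter

# Hypothetical barcode table: several (barcode1, barcode2) pairs may map to the
# same sample, mirroring the idea behind `--multiple-barcodes`.
BARCODES = {
    ("ACGTAC", "TTGCAA"): "sample_1",
    ("GGATCC", "TTGCAA"): "sample_1",
    ("CATGCA", "AGCTTG"): "sample_2",
}


def demultiplex(read_pairs, barcodes):
    """Assign read pairs to samples by exact prefix match on both mates.

    Returns a dict of sample -> list of (read1, read2) and a Counter of hits
    per barcode pair. Pairs matching no barcode pair go to "unidentified".
    A linear scan over the table is used for clarity, not speed.
    """
    samples = {}
    hits = Counter()
    for read1, read2 in read_pairs:
        sample = "unidentified"
        for (bc1, bc2), name in barcodes.items():
            if read1.startswith(bc1) and read2.startswith(bc2):
                sample = name
                hits[(bc1, bc2)] += 1  # tally per barcode pair, not per sample
                break
        samples.setdefault(sample, []).append((read1, read2))
    return samples, hits
```

Counting hits per barcode pair rather than per sample is what makes it possible to report how many reads each pair contributed to a sample.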

### Changed

- [**BREAKING**] Changed `CO` tags for read-groups in SAM/BAM files to `DS`
(description) tags, in order to match the specification.
- [**BREAKING**] A number of changes have been made to the JSON report layout,
including the moving, removal, and addition of sections. The layout is
described in `schema.json`.
- [**BREAKING**] The minimum allowed/default value for `--min-adapter-overlap`
was set to 1. In practice this has no effect, since length 0 alignments were
never considered, but it may break scripts running AdapterRemoval.
- [**BREAKING**] Dropped support for raw error rates in `--trim-mott-rate`, which
was renamed to `--trim-mott-quality` to match other trimming options.
- [**BREAKING**] SAM/BAM output is now combined into a single file by default,
including discarded reads. This can be overridden by setting the individual
`--out-*` options.
- [**BREAKING**] Dropped `PG` tag from read-groups/records in SAM/BAM output.
- [**BREAKING**] Dropped (minimal) read-groups for SAM/BAM output. If desired,
read-group information can be added with `--read-group`.
- [**BREAKING**] The `--report-duplication` option now supports k/m/g suffixes,
and defaults to `100k` if used without an explicit value.
- [**BREAKING**] The `--read-group` option no longer attempts to unescape
special characters. Instead, tags must be separated using embedded tabs
(`--read-group $'ID:A\tSM:B'`) or provided as individual arguments
(`--read-group 'ID:A' 'SM:B'`).
- Improved checks for conflicting command-line options.
- Barcodes are now recorded in FASTQ headers when demultiplexing without
trimming.
- The `$schema` URL is now included in the JSON report.
- Makefile features are now enabled/disabled with `true`/`false` instead of
`yes`/`no`.
- Vega-lite is now loaded in the background when opening the HTML reports,
making the report readable before Vega-lite has finished loading.
- Optimized alignments involving multiple possible adapter sequences, by
performing the alignments that involve no adapter sequences only once.
- Optimized alignments involving multiple possible adapter sequences, by
sorting the list of adapter sequences by hits. This increases the odds that
a good alignment is found early, so that worse alignments can be skipped.
- The old Makefile was replaced with the Meson build system, but a wrapper
Makefile is still provided/used as a convenience for setting the recommended
build options.
- A number of small improvements were made to the `--help` text.
- Improved error messages when mismatching (paired) read names are detected.
- Singleton reads are now included in the overall summary statistics in
JSON/HTML reports.
- Hardening flags are now enabled by default during compilation. This comes with
a small performance cost, but most distros are also expected to enable similar
flags by default.
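
The `--report-duplication` entry above states that the option accepts k/m/g suffixes. A minimal Python sketch of such suffix parsing follows; it assumes decimal multipliers (k = 1,000, m = 1,000,000, g = 1,000,000,000) and case-insensitive suffixes, neither of which the changelog specifies, and `parse_count` is a hypothetical name.

```python
# Assumed decimal multipliers; the changelog does not say whether k means
# 1,000 or 1,024, nor whether upper-case suffixes are accepted.
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_count(value: str) -> int:
    """Parse strings such as "100k", "2m", or "500" into an integer count."""
    text = value.strip().lower()
    if not text:
        raise ValueError("empty value")
    multiplier = 1
    if text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    # Only plain ASCII integers are accepted before the optional suffix.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"invalid count: {value!r}")
    return int(text) * multiplier


# The default used when the option is given without a value, per the changelog.
DEFAULT_DUPLICATION_SAMPLE = parse_count("100k")
```

Rejecting fractional values such as `1.5k` keeps the sketch simple; whether AdapterRemoval accepts them is not stated here.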

### Fixed

- NA values were being written with '%' or 'bp' suffixes in the HTML report.
- Some plots were omitted for merged reads in the HTML report.
- Mate 2 adapters were reverse-complemented in the JSON report when demultiplexing.
- SAM/BAM headers were not being written in demultiplexing mode.
- Mate 1/2 statistics were sampled independently, and thus potentially not
derived from the same read pairs.
- The JSON/HTML reports would give different time-stamps for the run, since one
gave the start time and the other the end time. Now start time is always used.
- Fixed failure when reading paired FASTQ files where read lengths differed
between the two files.
- Fixed report files and unidentified reads getting additional suffixes when
filenames were specified manually during demultiplexing.
- Fixed `/dev/null` being listed as the path for some files when demultiplexing
with these outputs disabled.
- Reverted the removal of support for '.' as equivalent to 'N' in FASTQ reads.
This notation is found in some older datasets (#112).
- Fixed misleading I/O error messages that, in some cases, included descriptions
of unrelated errors.
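
The fix for mate 1/2 statistics being sampled independently amounts to making one random decision per read pair and applying it to both mates. A minimal Python sketch, assuming simple Bernoulli sampling at a fixed rate; `sample_pairs` and its parameters are hypothetical and do not describe AdapterRemoval's actual sampling scheme.

```python
import random


def sample_pairs(pairs, rate, seed=None):
    """Sample read pairs for statistics, keeping mates together.

    A single random draw per pair decides whether *both* mates are sampled,
    so mate 1 and mate 2 statistics are always derived from the same pairs.
    """
    rng = random.Random(seed)
    mate1, mate2 = [], []
    for read1, read2 in pairs:
        if rng.random() < rate:  # one draw per pair, not one per mate
            mate1.append(read1)
            mate2.append(read2)
    return mate1, mate2
```

Drawing separately for each mate would, at the same rate, select mostly different pairs for mate 1 and mate 2, which is the behavior the fix removes.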

## [2.3.4] - 2024-08-24

This release adds a couple of new command-line options for handling non-ACGTN
@@ -151,6 +255,8 @@ Feedback is very welcome in the mean time.
`--gzip` option when manually specifying output files.
- Added options `--prefix-read1`, `--prefix-read2`, and `--prefix-merged` for
adding custom prefixes to the names of FASTQ reads.
- Added support for trimming poly-X tails. Trimming can be done for a single
nucleotide (e.g. poly-A) or for any combination of A, C, G, and T tails.
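
A simplified Python sketch of poly-X tail trimming: for each allowed nucleotide it measures the run of that base at the 3' end and trims the longest qualifying run. Exact runs and the `min_length` threshold are assumptions made for illustration; a practical trimmer would likely tolerate sequencing errors within the tail, and `trim_poly_x` is a hypothetical function, not AdapterRemoval's algorithm.

```python
def trim_poly_x(sequence: str, nucleotides: str = "ACGT", min_length: int = 10):
    """Trim the longest 3' homopolymer tail of any base in `nucleotides`.

    Returns (trimmed_sequence, trimmed_base), where trimmed_base is None if
    nothing was trimmed. Only exact runs of at least `min_length` bases are
    trimmed in this simplified version.
    """
    best_base, best_len = None, 0
    for base in nucleotides:
        # Length of the trailing run consisting solely of `base`.
        run = len(sequence) - len(sequence.rstrip(base))
        if run >= min_length and run > best_len:
            best_base, best_len = base, run
    if best_base is None:
        return sequence, None
    return sequence[: len(sequence) - best_len], best_base
```

Restricting `nucleotides` to `"A"` corresponds to poly-A trimming, while the default `"ACGT"` corresponds to trimming any combination of tails.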

### Changed

@@ -166,9 +272,9 @@ Feedback is very welcome in the mean time.
instead of `Q_match ~= Q_a + Q_b`, and that same-quality mismatches are
assigned 'N' instead of one being picked at random. Motivated in part by
`doi:10.1186/s12859-018-2579-2`. This can be changed using `--merge-strategy`.
- [**BREAKING**] The `--merge` option no longer has any effect when processing
SE data; previously this option would treat reads with at least
`--minalignmentlength` bases of adapter sequence as pseudo-merged reads.
- [**BREAKING**] Merged reads are no longer given an `M_` name prefix and merged
reads that have been trimmed after merging are no longer given an `MT_` name
prefix. Instead, see the new option `--prefix-merged`.
@@ -690,7 +796,7 @@ dramatic effects on the use of the program so please read these notes carefully
### Changed

- Updated trimming of qualities.
- The program handles lower vs upper case issues by translating all sequences
to upper case.
- The program now checks for inconsistent parameters.

@@ -702,6 +808,7 @@ dramatic effects on the use of the program so please read these notes carefully

- Initial release

[3.0.0-alpha3]: https://github.com/MikkelSchubert/adapterremoval/compare/3.0.0-alpha2...3.0.0-alpha3
[2.3.4]: https://github.com/MikkelSchubert/adapterremoval/compare/v2.3.3...v2.3.4
[3.0.0-alpha2]: https://github.com/MikkelSchubert/adapterremoval/compare/3.0.0-alpha1...3.0.0-alpha2
[3.0.0-alpha1]: https://github.com/MikkelSchubert/adapterremoval/compare/v2.3.3...3.0.0-alpha1

README-v3.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
# AdapterRemoval v3 - Dang fast(Q) processing

[![build](https://github.com/MikkelSchubert/adapterremoval/actions/workflows/build-and-test.yaml/badge.svg)](https://github.com/MikkelSchubert/adapterremoval/actions/workflows/build-and-test.yml) [![docs](https://readthedocs.org/projects/adapterremoval/badge/?version=latest)](https://adapterremoval.readthedocs.io/) [![coverage](https://coveralls.io/repos/github/MikkelSchubert/adapterremoval/badge.svg?branch=master)](https://coveralls.io/github/MikkelSchubert/adapterremoval)

AdapterRemoval trims adapter sequences and low quality bases from High-Throughput Sequencing (HTS) data in FASTQ format. For paired-end data, AdapterRemoval can merge overlapping paired-end reads into (longer) consensus sequences. Additionally, AdapterRemoval can demultiplex FASTQ reads and, if this information is not available, construct a consensus adapter sequence for paired-end reads.

AdapterRemoval v3 is a major revision of AdapterRemoval v2 that aims to simplify usage via sensible default settings. In addition, v3 adds support for detailed QC reports, trimming of poly-X tails, improved trimming of low quality bases, and greatly increased throughput. See below for more information.

AdapterRemoval v3 is still a work in progress, but alpha release 3 is [available for download](https://github.com/MikkelSchubert/adapterremoval/releases/tag/v3.0.0-alpha3/). Documentation is available at [Read the Docs](https://adapterremoval.readthedocs.io/en/v3.0.0-alpha3/), including a guide on how to migrate from v2. For questions, bug reports, and/or suggestions, please use the [GitHub tracker](https://github.com/MikkelSchubert/adapterremoval/issues/).

## Major features

- Trimming of adapter sequences from single-end and paired-end FASTQ reads
- Trimming of multiple, different adapters or adapter pairs
- Detailed [human-readable](https://mikkelschubert.github.io/adapterremoval/examples/3.0.0-alpha3.html) and [machine-readable](https://mikkelschubert.github.io/adapterremoval/examples/3.0.0-alpha3.json) QC reports **(v3)**
- The ability to perform QC-only runs with or without read processing **(v3)**
- Barcode based demultiplexing with or without trimming of adapter sequences
- Support for samples identified by multiple barcode pairs **(v3)**
- Support for mixed orientation barcodes **(v3)**
- Support for multiple methods for trimming low quality bases/reads
- Quality trimming using windows or constant thresholds
- Quality trimming using the modified Mott algorithm **(v3)**
- Poly-X tail trimming, supporting any combination of trailing bases **(v3)**
- Filtering of reads based on complexity **(v3)**
- Reconstruction of adapter sequences by pair-wise alignment of paired-end reads
- Merging of overlapping read-pairs into higher-quality consensus sequences
- Support for reading interleaved FASTQ files
- Support for arbitrary splitting/interleaving of output files **(v3)**
- Support for writing BGZF compressed SAM and BAM files **(v3)**
- Support for SSE2, AVX2, AVX512, and NEON accelerated alignments **(v3)**
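
For readers unfamiliar with Mott trimming, here is a minimal Python sketch of the classic Mott algorithm expressed as a maximum-sum-subarray search: each base scores `limit - p_error`, and the read is trimmed to the highest-scoring contiguous segment. This is the textbook formulation, not AdapterRemoval's *modified* variant; the Phred+33 offset, the 0.05 limit, and the `mott_trim` name are assumptions.

```python
def mott_trim(qualities: str, limit: float = 0.05, offset: int = 33):
    """Return (start, end) of the segment kept by classic Mott trimming.

    Each base scores `limit - p_error`, where p_error = 10 ** (-Q / 10) and
    Q is the Phred score decoded from the FASTQ quality string. The kept
    segment is the contiguous run with the maximum summed score (Kadane's
    algorithm); (0, 0) means that the whole read is trimmed.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    current_sum, current_start = 0.0, 0
    for i, char in enumerate(qualities):
        p_error = 10 ** (-(ord(char) - offset) / 10)
        current_sum += limit - p_error
        if current_sum <= 0:
            # A non-positive running sum cannot help; restart after this base.
            current_sum, current_start = 0.0, i + 1
        elif current_sum > best_sum:
            best_sum, best_start, best_end = current_sum, current_start, i + 1
    return best_start, best_end
```

Unlike trimming at the first base below a fixed threshold, this approach tolerates isolated low-quality bases inside an otherwise high-quality segment.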

## Performance

AdapterRemoval v3 features greatly increased throughput compared to AdapterRemoval v2. This is accomplished through support for additional SIMD instruction sets, improved parallelization of I/O and of computationally expensive tasks (including block-based compression of output files), and a lower default compression level.
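
Block-based compression parallelizes well because independently compressed gzip members can be concatenated into a valid gzip stream; BGZF builds on this idea by using size-limited blocks. The Python sketch below illustrates the principle using only the standard library. It writes plain multi-member gzip rather than true BGZF (which adds an extra header field recording each block's size), and the 64 KiB block size, the compression level, and `compress_blocks` are assumptions for illustration.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # assumed number of uncompressed bytes per block
LEVEL = 5  # assumed, lower-than-maximum compression level


def compress_blocks(data: bytes, threads: int = 4) -> bytes:
    """Compress `data` as independent gzip members, in parallel.

    CPython's zlib releases the GIL while deflating, so worker threads can
    compress blocks concurrently. The concatenated members form a valid
    multi-member gzip stream, in the original block order.
    """
    blocks = [data[i : i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        members = pool.map(
            lambda block: gzip.compress(block, compresslevel=LEVEL), blocks
        )
        return b"".join(members)
```

Because each block is compressed without reference to the others, the work divides evenly across threads, at the cost of a slightly lower compression ratio than a single continuous stream.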

Please note that these results are preliminary:

![Throughput for ARv2, ARv3, and fastp](https://raw.githubusercontent.com/MikkelSchubert/adapterremoval/master/docs/images/throughput.svg)

Point labels indicate the number of worker threads configured for each program, while the X-axis indicates observed CPU usage for a given number of worker threads. The Y-axis indicates millions of 150bp paired-end reads processed per second, for gzipped input and output, with merging enabled and duplication estimation disabled.

Benchmarking was performed on an Intel i9-11900K with 8 physical cores, and plotting is therefore limited to ~8 CPUs. The CPU usage of fastp exceeds its number of worker threads because fastp uses additional threads for (compressed) I/O. ARv2 does not scale past 4 threads.

## Documentation

For a detailed description of program installation and usage, please refer to the [online documentation](https://adapterremoval.readthedocs.io/en/latest). A summary of command-line options may also be found in the [manual page](https://adapterremoval.readthedocs.io/en/latest/manpage.html), accessible via the command `man adapterremoval3` once AdapterRemoval has been installed.

## Citation

If you use AdapterRemoval v3, then please cite the paper:

Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 12;9(1):88 <http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2>

AdapterRemoval was originally published in Lindgreen 2012:

Lindgreen (2012): AdapterRemoval: Easy Cleaning of Next Generation Sequencing Reads, BMC Research Notes, 5:337 <http://www.biomedcentral.com/1756-0500/5/337/>
