|
2 | 2 |
|
3 | 3 | ## [3.0.0-alpha2] - 2024-08-20 |
4 | 4 |
|
5 | | -This is the second alpha release of AdapterRemoval v3. It is the intention that a third alpha release, or the final 3.0 release, will follow within the next couple of months. |
| 5 | +This is the second alpha release of AdapterRemoval v3. It is the intention that |
| 6 | +a third alpha release, or the final 3.0 release, will follow within the next |
| 7 | +couple of months. |
6 | 8 |
|
7 | | -In addition to changes listed below, this release includes increased throughput thanks to improved parallelization of various steps in internal pipeline, support for AVX512 and general improvements to the SIMD alignment algorithms, loop unrolling of non-SIMD alignments to significantly increase throughput when SIMD is not available, and a significant decrease in the number of allocations to decrease overhead. |
| 9 | +As with alpha 1, changes that affect how AdapterRemoval is used (e.g. by |
| 10 | +removing options) or that result in different output compared to AdapterRemoval |
| 11 | +v2 are marked with the label "[**BREAKING**]". |
8 | 12 |
|
9 | | -This release requires a compiler with support for c++17 and libdeflate is now a mandatory dependency. |
| 13 | +In addition to changes listed below, this release includes increased throughput |
| 14 | +thanks to improved parallelization of various steps in the internal pipeline, |
| 15 | +support for AVX512 and general improvements to the SIMD alignment algorithms, |
| 16 | +loop unrolling of non-SIMD alignments to significantly increase throughput when |
| 17 | +SIMD is not available, and a significant decrease in the number of allocations |
| 18 | +to decrease overhead. |
| 19 | + |
| 20 | +This release requires a compiler with support for C++17 and libdeflate is now a |
| 21 | +mandatory dependency. |
10 | 22 |
|
11 | 23 | ### Added |
12 | 24 |
|
13 | | -- Added support for converting (U)racils in input data to T(hymine) via the `--convert-uracils` flag. |
14 | | -- Added support for replacing IUPAC-encoded degenerate bases with Ns via the `--mask-degenerate-bases` flag. |
15 | | -- Added support for writing output in SAM/BAM formats, with optional user-supplied read-group information. |
16 | | -- Added support for alignments using AVX512 instructions. AVX512 support only available when AdapterRemoval is compiled with GCC v11+ or Clang v8+. |
17 | | -- Added support selecting output file formats via the file extension and via the `--out-format` option. A corresponding option, `--stdout-format` was added to select the format for data written to STDOUT. |
18 | | -- Added support for reading from STDIN or writing to STDOUT when '-' is used as the filename, as an alternative to using `/dev/stdin` or `/dev/stdout`. |
19 | | -- Added dedicated threads solely for writing output data. This allows compute threads to work at full capacity, as long as the destination can consume written data fast enough. This may result in CPU utilization exceeding `--threads` by a couple of percent. |
| 25 | +- Added support for converting (U)racils in input data to T(hymine) via the |
| 26 | + `--convert-uracils` flag. |
| 27 | +- Added support for replacing IUPAC-encoded degenerate bases with Ns via the |
| 28 | + `--mask-degenerate-bases` flag. |
| 29 | +- Added support for writing output in SAM/BAM formats, with optional |
| 30 | + user-supplied read-group information. |
| 31 | +- Added support for alignments using AVX512 instructions. AVX512 support is only |
| 32 | + available when AdapterRemoval is compiled with GCC v11+ or Clang v8+. |
| 33 | +- Added support for selecting output file formats via the file extension and via |
| 34 | + the `--out-format` option. A corresponding option, `--stdout-format` was |
| 35 | + added to select the format for data written to STDOUT. |
| 36 | +- Added support for reading from STDIN or writing to STDOUT when '-' is used as |
| 37 | + the filename, as an alternative to using `/dev/stdin` or `/dev/stdout`. |
| 38 | +- Added dedicated threads solely for writing output data. This allows compute |
| 39 | + threads to work at full capacity, as long as the destination can consume |
| 40 | + written data fast enough. This may result in CPU utilization exceeding |
| 41 | + `--threads` by a couple of percent. |
20 | 42 | - Added support for setting DESTDIR when running `make install`. |
21 | | -- Added `--licenses` flag for displaying licenses of 3rd party code used by/incorporated into AdapterRemoval. |
22 | | -- Added `--simd` option allowing the user to select the specific SIMD instruction set they wish to use. |
| 43 | +- Added `--licenses` flag for displaying licenses of 3rd party code used by / |
| 44 | + incorporated into AdapterRemoval. |
| 45 | +- Added `--simd` option allowing the user to select the specific SIMD |
| 46 | + instruction set they wish to use. |
23 | 47 | - Added `Containerfile` for building static binaries using alpine/musl. |
24 | 48 |
|
25 | 49 | ### Changed |
26 | 50 |
|
27 | | -- [**BREAKING**] Changed the default `--mm`/`--mismatch-rate` from 1/3 to 1/6, in order to decrease the false positive rate, in particular for read merging. |
28 | | -- [**BREAKING**] Default to writing gzip-compressed FASTQ files; output written to STDOUT is uncompressed by default. |
| 51 | +- [**BREAKING**] Changed the default `--mm`/`--mismatch-rate` from 1/3 to 1/6, |
| 52 | + in order to decrease the false positive rate, in particular for read merging. |
| 53 | +- [**BREAKING**] Default to writing gzip-compressed FASTQ files; output written |
| 54 | + to STDOUT is uncompressed by default. |
29 | 55 | - [**BREAKING**] Discarded reads are no longer saved by default. |
30 | | -- [**BREAKING**] Output files for discarded reads and singleton (orphan) paired-end reads are only created if filtering is enabled. |
31 | | -- [**BREAKING**] The `--basename` / `--out-prefix` no longer defaults to `your_output`. Instead the user is required to set at least one `--out-*` option. |
32 | | -- [**BREAKING**] Merged `--identify-adapters` and `--report-only` commands. The adapter sequence is presently only reported in the HTML report, but will be added to the JSON report following some planned changes. |
| 56 | +- [**BREAKING**] Output files for discarded reads and singleton (orphan) |
| 57 | + paired-end reads are only created if filtering is enabled. |
| 58 | +- [**BREAKING**] The `--basename` / `--out-prefix` no longer defaults to |
| 59 | + `your_output`. Instead the user is required to set at least one `--out-*` |
| 60 | + option. |
| 61 | +- [**BREAKING**] Merged `--identify-adapters` and `--report-only` commands. The |
| 62 | + adapter sequence is presently only reported in the HTML report, but will be |
| 63 | + added to the JSON report following some planned changes. |
33 | 64 | - [**BREAKING**] Reverted `--min-complexity` being enabled by default. |
34 | 65 | - Increased the default ``--threads`` value to 2. |
35 | | -- A number of command-line options were renamed for consistency; use of the old names is still supported, but will trigger a warning message. |
36 | | -- Re-organized compression: level 1 is streamed using isa-l, while levels 2-13 correspond to libdeflate levels 1 to 12. |
37 | | -- Changed the default compression level to 5 on the new scale (libdeflate level 4); this results in a ~40% increase in throughput at the cost of roughly ~3% larger output files. |
38 | | -- Setting an `--out-*` option in demultiplexing mode overrides the basename/prefix for that specific output type. |
39 | | -- Add smoothing to GC values calculated for the GC content curve, to account for the fact that possible GC% values are unevenly distributed depending on the read length. |
| 66 | +- A number of command-line options were renamed for consistency; use of the old |
| 67 | + names is still supported, but will trigger a warning message. |
| 68 | +- Re-organized compression: level 1 is streamed using isa-l, while levels 2-13 |
| 69 | + correspond to libdeflate levels 1 to 12. |
| 70 | +- Changed the default compression level to 5 on the new scale (libdeflate level |
| 71 | + 4); this results in a ~40% increase in throughput at the cost of roughly ~3% |
| 72 | + larger output files. |
| 73 | +- Setting an `--out-*` option in demultiplexing mode overrides the basename / |
| 74 | + prefix for that specific output type. |
| 75 | +- Add smoothing to GC values calculated for the GC content curve, to account |
| 76 | + for the fact that possible GC% values are unevenly distributed depending on |
| 77 | + the read length. |
40 | 78 |
|
41 | 79 | ### Removed |
42 | 80 |
|
43 | 81 | The following changes are all [**BREAKING**] as described above: |
44 | 82 |
|
45 | | -- Removed support for original merging algorithm has been removed. The `--merge-strategy additive` method produces very similar, but slightly more conservative scores. |
46 | | -- Removed the ability to randomly sample a base if no best base could be selected in case of mismatches. Such bases are now changed to `N`, while both methods assign a Phred score of 0 (`!`). |
| 83 | +- Removed support for the original merging algorithm. The |
| 84 | + `--merge-strategy additive` method produces very similar, but slightly more |
| 85 | + conservative scores. |
| 86 | +- Removed the ability to randomly sample a base if no best base could be |
| 87 | + selected in case of mismatches. Such bases are now changed to `N`, while both |
| 88 | + methods assign a Phred score of 0 (`!`). |
47 | 89 |
|
48 | 90 | ## [3.0.0-alpha1] - 2022-11-07 |
49 | 91 |
|