AdapterRemoval v3 Feedback #58

@jfy133

Hi @MikkelSchubert, I decided to open a dedicated issue for general feedback from testing AdapterRemoval v3, as I may find other points to discuss:

Version v3.0.0pre 344591c

  • Leaving in single reads with Ns

        --combined-output  
            If set, all reads are written to the same file(s), specified by
            --output1 and --output2 (--output1 only if --interleaved-output is
            not set). Discarded reads are replaced with a single 'N' with Phred
            score 0 [default: off].
    

    While I used to do this, @ashildv was recently informed by the ENA that files including 'discarded reads' with a single 'N' will not
    be accepted by their pipeline (it breaks, and the data gets rejected). Maybe it would be worth having e.g. 5 Ns or
    something (or removing them entirely)?
    <- I realise I could just use the custom output instead and make sure discarded reads go in a separate file

  • --singleton flag: would it make sense, for consistency, to have --outputsingleton, as the other output flags (1, 2, merged) start with --output?

  • --settings FILE: could maybe be renamed, as the bulk of the contents of the JSON is stats rather than the settings themselves

  • json output:

    • It would be nice for this to also include, as a separate value, the physical number of entries in the resulting output files when reads are merged, roughly equivalent to retained reads in v2.3.2. Currently the JSON only reports the number of output (passed) reads as it would be if everything were unmerged. So, in addition to the passed, discarded and unidentified sections of the output JSON, something like in_files or output_file would be nice to have, as it helps an (unfamiliar) end-user match the file itself against the JSON report. However, I recognise that this could be complicated given the very flexible output system now.
    • It would be nice to have some documentation for what each value means. I've tried playing around, but I still can't work out how the various reads entries in the JSON relate to each other and to what is in the final output FASTQ files
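For anyone hitting the same ENA rejection in the meantime, here is a rough sketch of a post-processing step for a --combined-output file: it drops the single-'N' placeholder records and, as a side effect, returns the physical number of entries that end up in the file (the count asked for in the JSON point above). This is not part of AdapterRemoval; the function names are made up for illustration, and it assumes strict 4-line FASTQ records with no blank lines and Phred+33 encoding, so that Phred score 0 is written as '!'.

```python
import io
from typing import Iterator, TextIO, Tuple

# (header, sequence, separator, qualities), all without trailing newlines
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a strict 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline()
        sep = handle.readline()
        qual = handle.readline()
        yield (
            header.rstrip("\n"),
            seq.rstrip("\n"),
            sep.rstrip("\n"),
            qual.rstrip("\n"),
        )


def is_placeholder(record: Record) -> bool:
    """True for a discarded-read placeholder: one 'N' with Phred score 0.

    Assumption: Phred+33 encoding, where score 0 is the character '!'.
    """
    _, seq, _, qual = record
    return seq == "N" and qual == "!"


def drop_placeholders(src: TextIO, dst: TextIO) -> Tuple[int, int]:
    """Copy src to dst without placeholder records.

    Returns (kept, dropped), where kept is the number of records
    physically written to dst.
    """
    kept = dropped = 0
    for record in read_fastq(src):
        if is_placeholder(record):
            dropped += 1
            continue
        dst.write("\n".join(record) + "\n")
        kept += 1
    return kept, dropped


if __name__ == "__main__":
    example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nN\n+\n!\n")
    cleaned = io.StringIO()
    kept, dropped = drop_placeholders(example, cleaned)
    print(f"kept={kept} dropped={dropped}")
```

Note that for paired-end data, dropping a placeholder from only one of the two files would break the pairing between mates, so this is only safe as-is for single-end or merged-only output.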

Initial tests are completed. Most of the above are quality-of-life issues; otherwise everything is working as expected 👍
