Support mixed dual/single indexed files in AssignReadGroupByBarcode #512

@magicDGS

Description

In #509, @robmaz reported that he has some data barcoded with mixed dual/single indexes. He attached FASTQ files with some reads and the full table with the barcoding information (viola-452.txt).

I found that the mixed files are an interesting kind of CASAVA-like formatted FASTQ: to represent dual indexes, they join the two sequences with a + sign (by default, ReadTools uses - as the separator, as recommended by the SAM specification). With the current implementation, there are two options for handling files formatted this way:

  • To match each index of the dual-barcoded samples independently: use the advanced option to set a different delimiter (Java property -Dreadtools.barcode_index_delimiter=+), and assign a single N as the second barcode of the single-indexed samples. This N counts as a mismatch (unless --nNoMismatch is specified), so it can cause detection problems (e.g., with 0 mismatches allowed, the single-indexed samples will never be detected).
  • To match both indexes of the dual-barcoded samples together: in the barcode file, the dual-indexed samples should contain both indexes separated by + in the barcode header (e.g., ACTG+GGTC), and the single-indexed samples only their single index (e.g., AGGC). This makes it possible to tune the parameters that cause problems in the previous setup, but the dual-indexed samples might also be hard to detect (e.g., if the first index includes an extra base, ACTGT+GGTC, it will have many mismatches against ACTG+GGTC, because the extra T is compared against the + and the rest of the barcode is shifted by one position).
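
To make the trade-off between the two workarounds concrete, here is a minimal Java sketch. It is not ReadTools code: the class and method names are hypothetical, and the matching rule (compare over the length of the expected index, count missing observed bases as mismatches, ignore N only when an nNoMismatch-like flag is set) is an assumption made for illustration.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, NOT ReadTools' implementation: contrasts the two workarounds
 * for barcodes that join dual indexes with '+'.
 * Assumed matching rule: the observed index is compared position by position over the
 * length of the expected index; missing observed bases count as mismatches, and an N
 * on either side is ignored only when nNoMismatch is true.
 */
public final class MixedIndexWorkarounds {

    static int mismatches(String observed, String expected, boolean nNoMismatch) {
        int count = 0;
        for (int i = 0; i < expected.length(); i++) {
            char e = expected.charAt(i);
            char o = i < observed.length() ? observed.charAt(i) : '\0'; // '\0' marks a missing base
            if (nNoMismatch && (e == 'N' || o == 'N')) {
                continue; // N treated as a wildcard only when requested
            }
            if (o != e) {
                count++;
            }
        }
        return count;
    }

    /** Option 1: split on the delimiter and sum the mismatches of each index. */
    static int perIndexMismatches(String observed, String expected, String delimiter, boolean nNoMismatch) {
        String[] obs = observed.split(Pattern.quote(delimiter), -1);
        String[] exp = expected.split(Pattern.quote(delimiter), -1);
        int total = 0;
        for (int i = 0; i < exp.length; i++) {
            String o = i < obs.length ? obs[i] : ""; // a missing observed index is empty
            total += mismatches(o, exp[i], nNoMismatch);
        }
        return total;
    }

    /** Option 2: compare the whole barcode string, delimiter included. */
    static int wholeStringMismatches(String observed, String expected, boolean nNoMismatch) {
        return mismatches(observed, expected, nNoMismatch);
    }

    public static void main(String[] args) {
        // Option 1: single-indexed sample padded with a second index "N".
        System.out.println(perIndexMismatches("AGGC", "AGGC+N", "+", false)); // 1
        System.out.println(perIndexMismatches("AGGC", "AGGC+N", "+", true));  // 0
        // Option 2: an extra base in the first index shifts the '+' and the second index.
        System.out.println(wholeStringMismatches("ACTGT+GGTC", "ACTG+GGTC", false)); // 4
        // Splitting on the delimiter keeps the extra base confined to the first index.
        System.out.println(perIndexMismatches("ACTGT+GGTC", "ACTG+GGTC", "+", false)); // 0
    }
}
```

Under these assumptions, the padded N alone exceeds a 0-mismatch threshold, while the whole-string comparison turns a single extra base into four mismatches.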

As both approaches have drawbacks, it would be nice to support mixed dual/single-indexed files directly. This could be implemented as a new feature after refactoring the barcode-detection classes (#113). This issue also shows that we should add an argument to convert other barcode delimiters to the standard one (while keeping the advanced Java property to set which delimiter is the standard, as the SAM specification only recommends the hyphen as a separator).
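
A minimal sketch of how the proposed feature might look, again in Java and not based on ReadTools' actual barcode classes: MixedIndexMatcher and its methods are hypothetical, and reading the standard delimiter from readtools.barcode_index_delimiter with a default of - is an assumption about how that property could be consumed. Barcodes are first normalized to the standard delimiter, then each read is compared index by index only against samples declaring the same number of indexes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the proposed feature, NOT existing ReadTools code.
 * Barcodes using another delimiter (e.g. '+') are normalized to the standard one, and each
 * read is compared only against samples declaring the same number of indexes.
 */
public final class MixedIndexMatcher {

    // Assumption: the standard delimiter comes from the advanced Java property, defaulting to '-'.
    static final String STANDARD_DELIMITER =
            System.getProperty("readtools.barcode_index_delimiter", "-");

    private final Map<String, String[]> samples = new LinkedHashMap<>();
    private final int maxMismatches;

    MixedIndexMatcher(int maxMismatches) {
        this.maxMismatches = maxMismatches;
    }

    /** Converts a barcode written with another delimiter (e.g. "ACTG+GGTC") to the standard one. */
    static String normalize(String barcode, String inputDelimiter) {
        return barcode.replace(inputDelimiter, STANDARD_DELIMITER);
    }

    private static String[] splitIndexes(String barcode, String inputDelimiter) {
        return normalize(barcode, inputDelimiter).split(Pattern.quote(STANDARD_DELIMITER), -1);
    }

    void addSample(String name, String barcode, String inputDelimiter) {
        samples.put(name, splitIndexes(barcode, inputDelimiter));
    }

    /** Returns the unique best sample within maxMismatches, or null if none matches or the best is tied. */
    String match(String observedBarcode, String inputDelimiter) {
        String[] observed = splitIndexes(observedBarcode, inputDelimiter);
        String best = null;
        int bestMismatches = Integer.MAX_VALUE;
        boolean tied = false;
        for (Map.Entry<String, String[]> entry : samples.entrySet()) {
            String[] expected = entry.getValue();
            if (expected.length != observed.length) {
                continue; // single-indexed reads only compete with single-indexed samples, and so on
            }
            int total = 0;
            for (int i = 0; i < expected.length; i++) {
                total += mismatches(observed[i], expected[i]);
            }
            if (total < bestMismatches) {
                best = entry.getKey();
                bestMismatches = total;
                tied = false;
            } else if (total == bestMismatches) {
                tied = true;
            }
        }
        return (best != null && !tied && bestMismatches <= maxMismatches) ? best : null;
    }

    /** Position-wise mismatches over the expected length; missing observed bases count as mismatches. */
    private static int mismatches(String observed, String expected) {
        int count = 0;
        for (int i = 0; i < expected.length(); i++) {
            if (i >= observed.length() || observed.charAt(i) != expected.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        MixedIndexMatcher matcher = new MixedIndexMatcher(1);
        matcher.addSample("dual", "ACTG+GGTC", "+");
        matcher.addSample("single", "AGGC", "+");
        System.out.println(matcher.match("ACTGT+GGTC", "+")); // dual
        System.out.println(matcher.match("AGGC", "+"));       // single
        System.out.println(matcher.match("TTTT+GGTC", "+"));  // null (3 mismatches > 1)
    }
}
```

Restricting candidates by index count is one possible design: it prevents a single-indexed sample such as AGGC from competing with a dual-indexed sample whose first index is also AGGC. Another option would be to compare only the indexes each sample defines and break ties in favor of the sample with more indexes.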
