
Sample Processing

Stephen Bolaris edited this page Jun 1, 2022 · 5 revisions

UMIs and deduplication

What do you need?

When a UMI is included in your sample prep, it resides in the R2 reads and is handled by the debarcoder. If you only did single-end (SE) sequencing, you will not have a UMI and will not be able to deduplicate reads.

How does it work?

Deduplication starts with the debarcoder, which identifies the UMI based on its position in the read and records it in the XU tag for downstream processing. After alignment, the deduplication step takes each sequence together with its UMI, builds a graph that corrects for potential sequencing errors in the UMI, and collapses the graph to remove the reads determined to be duplicates.
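
The graph-and-collapse idea can be illustrated with a minimal Python sketch. This is not the toolkit's actual implementation: the function names, the tuple layout of a read, grouping by alignment position, and the one-mismatch threshold are all assumptions made for illustration. In a real BAM file the UMI would be read from the XU tag rather than passed in as a string.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def dedup(reads, max_dist=1):
    """Return the read IDs kept after UMI-based deduplication.

    reads: iterable of (read_id, chrom, pos, strand, umi) tuples.
    This layout is hypothetical; it stands in for aligned reads
    that carry their UMI in the XU tag.
    """
    # Group reads by alignment position, then by UMI.
    by_pos = defaultdict(lambda: defaultdict(list))
    for read_id, chrom, pos, strand, umi in reads:
        by_pos[(chrom, pos, strand)][umi].append(read_id)

    kept = []
    for umi_reads in by_pos.values():
        umis = list(umi_reads)
        # Build the graph: connect UMIs that differ by at most
        # max_dist bases (assumed to be sequencing errors).
        adj = {u: set() for u in umis}
        for a, b in combinations(umis, 2):
            if len(a) == len(b) and hamming(a, b) <= max_dist:
                adj[a].add(b)
                adj[b].add(a)

        # Collapse each connected component to one read, visiting
        # the most abundant UMI first so it represents the component.
        seen = set()
        for u in sorted(umis, key=lambda u: -len(umi_reads[u])):
            if u in seen:
                continue
            stack = [u]
            seen.add(u)
            while stack:
                cur = stack.pop()
                for nb in adj[cur]:
                    if nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
            kept.append(umi_reads[u][0])
    return kept


example_reads = [
    ("r1", "chr1", 100, "+", "ACGT"),
    ("r2", "chr1", 100, "+", "ACGT"),  # exact duplicate of r1
    ("r3", "chr1", 100, "+", "ACGA"),  # one mismatch from ACGT
    ("r4", "chr1", 100, "+", "TTTT"),  # distinct molecule
    ("r5", "chr1", 200, "+", "ACGT"),  # different position
]
kept_ids = dedup(example_reads)
```

Here r2 and r3 are collapsed into r1 because their UMIs are identical to, or one base away from, the most abundant UMI at that position, while r4 and r5 are kept as separate molecules.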

What if I see an error that says invalid UMI?

This can occur when SEQuoia Complete data sets are run in the Express toolkit, or if you forgot to add the UMI in your sample prep.
