
Sample Processing

Stephen Bolaris edited this page Jun 2, 2022 · 5 revisions

UMIs and deduplication

What do you need?

When the UMI is included in your sample prep, it resides in the R2 reads and is handled by the debarcoder. If you ran single-end (SE) sequencing only, there is no R2 read and therefore no UMI, so you will not be able to deduplicate reads.

How does it work?

Deduplication starts in the debarcoder, which identifies the UMI by its position in the read and records it in the XU tag for downstream processing. After alignment, the deduplication step looks at each read's sequence and UMI, builds a graph that corrects for potential sequencing errors in the UMI, and collapses the graph to remove the reads determined to be duplicates.
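The graph-and-collapse idea can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's implementation: the function names (`hamming`, `dedup`), the tuple layout of a read, the one-mismatch tolerance, and the choice to collapse whole connected components are all assumptions made for the example.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def dedup(reads, max_mismatch=1):
    """Collapse duplicate reads (hypothetical helper, for illustration only).

    reads: iterable of (read_name, chrom, pos, umi) tuples.
    Returns the names of the reads that are kept.
    """
    # Group reads by alignment position, then by UMI.
    by_pos = defaultdict(lambda: defaultdict(list))
    for name, chrom, pos, umi in reads:
        by_pos[(chrom, pos)][umi].append(name)

    kept = []
    for umis in by_pos.values():
        nodes = list(umis)
        # Graph: connect UMIs that differ by at most max_mismatch bases,
        # treating small differences as likely sequencing errors.
        adj = {
            u: [v for v in nodes
                if v != u and len(u) == len(v) and hamming(u, v) <= max_mismatch]
            for u in nodes
        }
        seen = set()
        # Visit the most abundant UMI first so it represents its cluster.
        for u in sorted(nodes, key=lambda n: (-len(umis[n]), n)):
            if u in seen:
                continue
            stack = [u]
            seen.add(u)
            while stack:  # walk the whole connected component
                cur = stack.pop()
                for nb in adj[cur]:
                    if nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
            kept.append(umis[u][0])  # keep one read per collapsed cluster
    return kept


example = [
    ("r1", "chr1", 100, "AAAA"),
    ("r2", "chr1", 100, "AAAA"),  # exact duplicate of r1
    ("r3", "chr1", 100, "AAAT"),  # one mismatch from AAAA: treated as an error
    ("r4", "chr1", 100, "GGGG"),  # different molecule at the same position
    ("r5", "chr1", 200, "AAAA"),  # same UMI, different position
]
print(dedup(example))
```

In this sketch, reads at different positions are never merged, and a UMI that is one base away from a more abundant UMI at the same position is folded into it.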

BAM tags

UMI tagging is done at the BAM file level as part of deduplication: the UMI is written to the XU tag of each read. Other tags can also be used; a full list is available from samtools.
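To check which tags a read carries, you can look at the text form of the alignment, where the optional tags follow the eleven mandatory SAM columns as `TAG:TYPE:VALUE` fields. The sketch below parses one such line in plain Python; the function name `sam_tags` and the example record are made up for illustration, and only the `XU` tag name comes from this page.

```python
def sam_tags(line):
    """Return the optional tags of one SAM text line as a dict.

    Hypothetical helper: columns 1-11 are the mandatory SAM fields,
    everything after them is an optional TAG:TYPE:VALUE field.
    """
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        # Integer tags (type "i") are converted; other types stay as text.
        tags[tag] = int(value) if typ == "i" else value
    return tags


# Made-up record for illustration; the UMI sits in the XU tag.
record = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0\tXU:Z:ACGTACGT"
print(sam_tags(record).get("XU"))
```

A read with no UMI has no `XU` entry in the returned dict, so `.get("XU")` returns `None` for it.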

What if I see an error that says invalid UMI?

This can occur when SEQuoia Complete data sets are run through the Express toolkit, or if the UMI was not included in your sample prep.

Trimming

There are default quality cutoffs for trimming. These are not set in stone, but they are a good starting point. If you have your own cutoffs, feel free to update them. If you are not sure, or would like to be stricter about quality, treat the first run as a first pass: after running the toolkit for the first time, you will have the FastQC output, which shows the quality of the reads and allows for more informed trimming if low-quality reads are present.
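As a rough picture of what a quality cutoff does, the sketch below removes low-quality bases from the 3' end of a read. It is a simplified illustration and not the toolkit's trimmer: the function name `trim_3prime` and the cutoff of 20 are assumptions, and the quality string is assumed to use the standard Phred+33 FASTQ encoding.

```python
def trim_3prime(seq, qual, cutoff=20):
    """Trim trailing bases whose Phred quality is below `cutoff`.

    Hypothetical helper; `qual` is assumed to be Phred+33 encoded,
    and the default cutoff of 20 is only an example value.
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is too low.
    while end > 0 and ord(qual[end - 1]) - 33 < cutoff:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
print(trim_3prime("ACGTACGT", "IIIII###"))
```

Raising the cutoff trims more aggressively, which is the kind of adjustment the FastQC output from a first pass can help you decide on.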

Filtering of read counts

The SEQuoia Express toolkit has an option that lets users filter reads based on a threshold. The option has two parts:

  1. minGeneType = "none" : this can be ["none","reads","RPKM","TPM"]
  2. minGeneCutoff = 0 : threshold you want to use

The results of this filtering are not in the report folder; instead, they are put in output/SampleFiles/sample_name/RNACounts
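The way the two settings interact can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's code: the function name `filter_genes`, the per-gene dict layout, and the choice to keep genes whose value is greater than or equal to the cutoff are all hypothetical. Only the option names and the allowed values of `minGeneType` come from the list above.

```python
def filter_genes(genes, min_gene_type="none", min_gene_cutoff=0):
    """Keep genes whose chosen metric meets the cutoff (hypothetical helper).

    genes: list of dicts with "reads", "RPKM" and "TPM" keys (assumed layout).
    min_gene_type mirrors minGeneType; min_gene_cutoff mirrors minGeneCutoff.
    """
    if min_gene_type == "none":
        return list(genes)  # filtering disabled: everything is kept
    if min_gene_type not in ("reads", "RPKM", "TPM"):
        raise ValueError(f"unsupported minGeneType: {min_gene_type!r}")
    # Assumption: a gene passes when its value is >= the cutoff.
    return [g for g in genes if g[min_gene_type] >= min_gene_cutoff]


# Made-up counts for illustration.
genes = [
    {"gene": "A", "reads": 10, "RPKM": 1.5, "TPM": 2.0},
    {"gene": "B", "reads": 2, "RPKM": 0.1, "TPM": 0.3},
]
print([g["gene"] for g in filter_genes(genes, "reads", 5)])
```

With the defaults (`"none"` and `0`) nothing is removed, which matches the default values shown in the option list.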
