Numerous undetected adaptors in B, C, D categories #11
Description
Hi Richard!
I've decided to start a new issue to share more information about undetected adaptors.
I am working with 300 bp MiSeq reads. My pipeline is:
raw data -> nextclip -> fastq-mcf -> blastn. The last two steps check whether any adaptors are still present, and finally I check these cases manually in Geneious.
"A" files hardly suffer from junction adaptors: fewer than 30 per 1M reads remain (according to fastq-mcf), and I haven't inspected them in detail.
"B" and "C" files look worse :( fastq-mcf detects anywhere from hundreds to 23k adaptors per 1M reads. Note that this tool only detects adaptors at the start or end of a read, not inside the sequence, so the real number is higher.
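Because fastq-mcf only catches adaptors at read ends, one way to estimate the total (internal hits included) is to scan every read for an exact match and scale the count per 1M reads. This is a minimal sketch, not part of nextclip or fastq-mcf. It assumes the adaptor of interest is the Nextera mosaic-end sequence `CTGTCTCTTATACACATCT` (plus its reverse complement); substitute whatever adaptor your library prep actually used. The 5-base `edge` window that separates "terminal" from "internal" hits is an arbitrary, hypothetical choice.

```python
from typing import Iterable, Iterator

# Assumption: Nextera mosaic-end sequence; replace with the adaptor used in your prep.
ADAPTOR = "CTGTCTCTTATACACATCT"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (N maps to N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line of every 4-line record) from a FASTQ stream."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def classify_hits(seqs: Iterable[str], adaptor: str = ADAPTOR, edge: int = 5) -> dict:
    """Count reads with an exact adaptor hit (either strand), split by position.

    A hit is 'terminal' if it starts within `edge` bases of the read start or
    ends within `edge` bases of the read end; otherwise it is 'internal'.
    Only the first probe found per read is counted.
    """
    probes = (adaptor, revcomp(adaptor))  # tuple keeps the search order fixed
    counts = {"reads": 0, "terminal": 0, "internal": 0}
    for seq in seqs:
        counts["reads"] += 1
        for probe in probes:
            start = seq.find(probe)
            if start == -1:
                continue
            end = start + len(probe)
            if start <= edge or len(seq) - end <= edge:
                counts["terminal"] += 1
            else:
                counts["internal"] += 1
            break
    return counts


def per_million(count: int, total: int) -> float:
    """Scale a raw count to 'per 1M reads'."""
    return 1e6 * count / total if total else 0.0
```

Usage: `with open("reads.fastq") as fh: c = classify_hits(fastq_sequences(fh))`, then report `per_million(c["internal"], c["reads"])` next to the fastq-mcf figure. Exact matching deliberately ignores partial or indel-containing adaptors, so it gives a lower bound.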
I've analysed some of these files in detail and found that:
- Only a few (dozens of) duplicated adaptors remain, and all of them have a 1-nucleotide indel at the junction site.
- There are plenty of single adaptors with a 100% hit to the read, in both terminal and internal positions :( I would estimate a few thousand per 1M reads, plus more partial adaptors shorter than 18 nucleotides.
- I have not analysed read pairs yet.
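The two cases above (a duplicated adaptor with a 1-nt indel at the junction, and partial adaptors shorter than 18 nt at a read end) are exactly what exact-match searches miss. A minimal sketch for flagging both, assuming the duplicated junction adaptor is the half-site `CTGTCTCTTATACACATCT` immediately followed by its reverse complement, as in Nextera mate-pair designs (verify this against your kit). The `min_len=8` threshold for partial hits is a hypothetical choice; lower values produce many chance matches.

```python
from typing import Optional

# Assumption: junction adaptor = half-site + its reverse complement, back to back.
HALF = "CTGTCTCTTATACACATCT"
HALF_RC = "AGATGTGTATAAGAGACAG"


def junction_variants(left: str = HALF, right: str = HALF_RC) -> dict:
    """Exact junction plus every single-base insertion or deletion at the join."""
    return {
        "exact": [left + right],
        "insertion": [left + base + right for base in "ACGT"],
        "deletion": [left[:-1] + right, left + right[1:]],
    }


def junction_type(seq: str) -> Optional[str]:
    """Return 'exact', 'insertion' or 'deletion' for the first matching variant, else None."""
    for kind, patterns in junction_variants().items():  # dict order: exact first
        if any(p in seq for p in patterns):
            return kind
    return None


def partial_at_end(seq: str, adaptor: str = HALF, min_len: int = 8, max_len: int = 17) -> int:
    """Length of the longest read suffix equal to an adaptor prefix.

    Only lengths in [min_len, max_len] are considered; returns 0 if none match.
    Full-length adaptors are left to the exact search.
    """
    for k in range(min(max_len, len(seq), len(adaptor)), min_len - 1, -1):
        if seq.endswith(adaptor[:k]):
            return k
    return 0
```

Because the junction variants are enumerated explicitly, a read is only flagged when the indel sits exactly at the join, which matches the pattern described above; indels elsewhere in the adaptor would need a proper aligner such as blastn.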
I have no experience with, or idea of, how this could affect de novo assembly, but I will try to avoid using the B and C categories.