Numerous undetected adaptors in B,C,D cats #11

@MichaelFokinNZ

Hi Richard!
I've decided to start a new issue, just to share more info about undetected adaptors.
I am working with 300 bp MiSeq reads. My pipeline is
(raw data -> nextclip -> fastq-mcf -> blastn). The last two steps check whether any adaptors are still present, and finally I check these cases manually in Geneious.
"A" files are almost free of junction adaptors: fewer than 30 per 1M reads remain (fastq-mcf), and I haven't inspected these in detail.
"B" and "C" files look worse :( fastq-mcf detects from hundreds up to 23k adaptors per 1M reads. Note that this software can only detect adaptors at the start or end of a read, not inside the sequence, so the real number is higher.
I've analysed some of these files in detail and found that:

  1. Only a few (dozens) duplicated adaptors are left; all of these cases have a 1-nucleotide indel at the junction site.
  2. There are plenty of single adaptors with a 100% hit to the read, in both terminal and internal positions :( I would estimate a few thousand per 1M reads, plus more partial adaptors shorter than 18 nucleotides.
  3. I have not analysed read pairs yet.
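
For anyone who wants to reproduce the terminal vs. internal split from point 2 without BLAST, here is a minimal Python sketch. It only finds exact (100%) matches of a single adaptor in either orientation, so it will not see partial adaptors or the indel cases from point 1. The function names are hypothetical, and the adaptor sequence in `ADAPTOR` is my assumption of the Nextera junction adaptor; please check it against your kit documentation before trusting the counts.

```python
from collections import Counter

# Assumed single junction adaptor (verify against your library prep kit).
ADAPTOR = "CTGTCTCTTATACACATCT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def adaptor_hits(read, adaptor):
    """Return sorted (start, end) spans of exact adaptor matches, both strands."""
    hits = []
    for probe in {adaptor, revcomp(adaptor)}:
        start = read.find(probe)
        while start != -1:
            hits.append((start, start + len(probe)))
            start = read.find(probe, start + 1)
    return sorted(hits)


def classify_read(read, adaptor):
    """Label a read 'internal', 'terminal' or 'none'.

    'internal' wins if any hit touches neither end of the read, because
    those are the hits an end-only trimmer would miss.
    """
    hits = adaptor_hits(read, adaptor)
    if not hits:
        return "none"
    if any(start > 0 and end < len(read) for start, end in hits):
        return "internal"
    return "terminal"


def count_fastq(lines, adaptor):
    """Count read labels over an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence line of each record
            counts[classify_read(line.strip().upper(), adaptor)] += 1
    return counts
```

To use it on a file, pass an open file handle as `lines`, e.g. `count_fastq(open(path), ADAPTOR)` with `path` pointing at a FASTQ file after the nextclip step, and scale the counts to reads per 1M for comparison with the numbers above.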

I have no experience with, or idea of, whether this could affect de novo assembly, but I will try to avoid using the B and C categories.
