Hello,
I encountered a problem when performing quality control and adapter trimming using fastplong.
The JSON report indicates that 737 reads contain adapter sequences, but when I count the reads by their names after QC, the number reaches 193,512.
Here is the command I used:
fastplong -l 20 -q 7 -w 6 -i test.fastq.gz -o test.filter.fq.gz -j test.fastplong.json -s ATCATGCGAGGGCTAATTGTATATCACC
Below is the relevant part of the JSON report:
{
  "summary": {
    "fastplong_version": "0.3.0",
    "before_filtering": {
      "total_reads": 743644,
      "total_bases": 10389662359,
      "q20_bases": 7603655307,
      "q30_bases": 2685202180,
      "q20_rate": 0.731848,
      "q30_rate": 0.258449,
      "read_mean_length": 13971,
      "gc_content": 0.367994
    },
    "after_filtering": {
      "total_reads": 745767,
      "total_bases": 10361933317,
      "q20_bases": 7584624643,
      "q30_bases": 2678339372,
      "q20_rate": 0.73197,
      "q30_rate": 0.258479,
      "read_mean_length": 13894,
      "gc_content": 0.367975
    }
  },
  "filtering_result": {
    "passed_filter_reads": 745767,
    "low_quality_reads": 963,
    "too_many_N_reads": 0,
    "too_short_reads": 795,
    "too_long_reads": 0
  },
  "adapter_cutting": {
    "adapter_trimmed_reads": 737,
    "adapter_trimmed_bases": 92832,
    "read_start_adapter": "ATCATGCGAGGGCTAATTGTATATCACC",
    "read_end_adapter": "GGTGATATACAATTAGCCCTCGCATGAT",
    "read_adapter_counts": {"CTAATTGTATATCACC": 17, "GGTGATATACAATTAGC": 8, "GGTGATATACAATTAGCC": 16, "GGTGATATACAATTAGCCCTCGCA": 10, "GGTGATATACAATTAGCCCTCGCATG": 8, "ATCATGCGAGGGCTAATTGTATATCACC": 456, "GGTGATATACAATTAGCCCTCGCATGAT": 187, "others": 35}
  },
}
After QC, my statistics on the output FASTQ show:
- 193,512 reads whose names contain the tag split-by-adapter.
- Among them, 4,015 reads were split into both left and right parts.
- The rest are single-side splits (only left or only right).
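For anyone who wants to reproduce counts of this kind, here is a minimal sketch of one way to tally tagged read names in the output FASTQ. It is not necessarily the exact script used for the numbers above. The function name `count_split_reads` is hypothetical, and the code assumes four-line FASTQ records and the `split-by-adapter-left-` / `split-by-adapter-right-` name prefixes shown in the example below.

```python
import gzip

# Prefixes as they appear in the read names of the fastplong output.
LEFT = "split-by-adapter-left-"
RIGHT = "split-by-adapter-right-"


def count_split_reads(fastq_path):
    """Tally split-by-adapter records (assumes 4-line FASTQ records)."""
    sides = {}  # original read ID -> set of sides ("left"/"right") seen
    tagged_records = 0
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:  # only header lines carry the read name
                continue
            fields = line[1:].split()
            if not fields:
                continue
            name = fields[0]
            for side, prefix in (("left", LEFT), ("right", RIGHT)):
                if name.startswith(prefix):
                    tagged_records += 1
                    sides.setdefault(name[len(prefix):], set()).add(side)
    both = sum(1 for s in sides.values() if len(s) == 2)
    left_only = sum(1 for s in sides.values() if s == {"left"})
    right_only = sum(1 for s in sides.values() if s == {"right"})
    return {
        "tagged_records": tagged_records,
        "both_sides": both,
        "left_only": left_only,
        "right_only": right_only,
    }
```

Calling `count_split_reads("test.filter.fq.gz")` returns the number of tagged records along with how many original read IDs appear with both sides, only the left side, or only the right side.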
Example:
Original read
ID: 111_65_3798_1359_574291794_77069_2_13.46
Length: 6093 bp
After QC
ID: split-by-adapter-left-111_65_3798_1359_574291794_77069_2_13.46
Length: 5340 bp
ID: split-by-adapter-right-111_65_3798_1359_574291794_77069_2_13.46
Length: 705 bp
My Questions
- Why does the JSON report only 737 adapter_trimmed_reads, while the output FASTQ contains 193,512 split-by-adapter sequences? Could this discrepancy be caused by other parameters or by a different mechanism in fastplong?
- Some reads in my dataset contain adapter sequences at both ends and are therefore split into two fragments. Is there any way to optimize the handling of such “dual-end adapter” cases?
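A related check on the report itself: total_reads rises from 743,644 before filtering to 745,767 after, even though some reads were discarded. That would be consistent with splitting creating extra output records that adapter_trimmed_reads does not count, though this is only an inference from the numbers. The sketch below does the arithmetic with the field names from the report excerpt. The function name `read_count_change` is hypothetical, and whether the discarded counts refer to input reads or to split fragments is an assumption that has not been verified.

```python
def read_count_change(report):
    """Compare read counts before and after filtering in a fastplong report."""
    before = report["summary"]["before_filtering"]["total_reads"]
    after = report["summary"]["after_filtering"]["total_reads"]
    fr = report["filtering_result"]
    discarded = (
        fr["low_quality_reads"]
        + fr["too_many_N_reads"]
        + fr["too_short_reads"]
        + fr["too_long_reads"]
    )
    return {
        "net_change": after - before,  # positive means more records out than in
        "discarded": discarded,
        "adapter_trimmed_reads": report["adapter_cutting"]["adapter_trimmed_reads"],
    }


# Values copied from the JSON excerpt in this issue (other fields omitted).
report = {
    "summary": {
        "before_filtering": {"total_reads": 743644},
        "after_filtering": {"total_reads": 745767},
    },
    "filtering_result": {
        "passed_filter_reads": 745767,
        "low_quality_reads": 963,
        "too_many_N_reads": 0,
        "too_short_reads": 795,
        "too_long_reads": 0,
    },
    "adapter_cutting": {"adapter_trimmed_reads": 737},
}

print(read_count_change(report))
```

With these values the net change is +2,123 records while 1,758 were discarded, so the output gained records that the 737 adapter_trimmed_reads figure alone does not account for.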
Thank you!