Mismatch Between adapter_trimmed_reads and the Number of split-by-adapter Sequences

Hello,

I encountered a problem when performing quality control and adapter trimming using **fastplong**.  
The JSON report lists only 737 `adapter_trimmed_reads`, but when I count the reads in the filtered output by name, 193,512 of them carry the `split-by-adapter` tag.

Here is the command I used:

```bash
fastplong -l 20 -q 7 -w 6 -i test.fastq.gz -o test.filter.fq.gz -j test.fastplong.json -s ATCATGCGAGGGCTAATTGTATATCACC
```

Below is the relevant part of the JSON report:

```json
{
	"summary": {
		"fastplong_version": "0.3.0",
		"before_filtering": {
			"total_reads":743644,
			"total_bases":10389662359,
			"q20_bases":7603655307,
			"q30_bases":2685202180,
			"q20_rate":0.731848,
			"q30_rate":0.258449,
			"read_mean_length":13971,
			"gc_content":0.367994
		},
		"after_filtering": {
			"total_reads":745767,
			"total_bases":10361933317,
			"q20_bases":7584624643,
			"q30_bases":2678339372,
			"q20_rate":0.73197,
			"q30_rate":0.258479,
			"read_mean_length":13894,
			"gc_content":0.367975
		}
	},
	"filtering_result": {
		"passed_filter_reads": 745767,
		"low_quality_reads": 963,
		"too_many_N_reads": 0,
		"too_short_reads": 795,
		"too_long_reads": 0
	},
	"adapter_cutting": {
		"adapter_trimmed_reads": 737,
		"adapter_trimmed_bases": 92832,
		"read_start_adapter": "ATCATGCGAGGGCTAATTGTATATCACC",
		"read_end_adapter": "GGTGATATACAATTAGCCCTCGCATGAT",
		"read_adapter_counts": {"CTAATTGTATATCACC":17, "GGTGATATACAATTAGC":8, "GGTGATATACAATTAGCC":16, "GGTGATATACAATTAGCCCTCGCA":10, "GGTGATATACAATTAGCCCTCGCATG":8, "ATCATGCGAGGGCTAATTGTATATCACC":456, "GGTGATATACAATTAGCCCTCGCATGAT":187, "others":35}
	}
}
```
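As a quick consistency check on the numbers above, here is a minimal Python sketch. All values are copied from the report; the variable names and the `reverse_complement` helper are mine, not part of fastplong. It assumes each discarded-read counter refers to one record. It does not explain the mechanism; it only shows which figures agree with each other.

```python
# Values copied from the JSON report above.
before_reads = 743644
after_reads = 745767
low_quality_reads = 963
too_short_reads = 795
adapter_trimmed_reads = 737
read_start_adapter = "ATCATGCGAGGGCTAATTGTATATCACC"
read_end_adapter = "GGTGATATACAATTAGCCCTCGCATGAT"
read_adapter_counts = {
    "CTAATTGTATATCACC": 17,
    "GGTGATATACAATTAGC": 8,
    "GGTGATATACAATTAGCC": 16,
    "GGTGATATACAATTAGCCCTCGCA": 10,
    "GGTGATATACAATTAGCCCTCGCATG": 8,
    "ATCATGCGAGGGCTAATTGTATATCACC": 456,
    "GGTGATATACAATTAGCCCTCGCATGAT": 187,
    "others": 35,
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Sum of the per-sequence adapter counts.
counts_total = sum(read_adapter_counts.values())

# Net change in record count between input and output.
net_read_change = after_reads - before_reads

# Records that must have been gained to reach the output count,
# assuming (my assumption) every discarded count is one record.
discarded = low_quality_reads + too_short_reads
extra_records = after_reads - (before_reads - discarded)

print("sum(read_adapter_counts) =", counts_total)
print("matches adapter_trimmed_reads:", counts_total == adapter_trimmed_reads)
print("end adapter is revcomp of start adapter:",
      reverse_complement(read_start_adapter) == read_end_adapter)
print("net read change:", net_read_change)
print("extra records needed:", extra_records)
```

So `adapter_trimmed_reads` equals the sum of `read_adapter_counts`, while the output holds more records than the input even after filtering. That gain is not reflected in the 737 figure.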

After QC, counting read names in the output FASTQ gives:

1. **193,512 reads** whose names contain the tag `split-by-adapter`.
2. Among them, **4,015 reads** were split into **both** left and right parts.
3. The rest are single-side splits (only left or only right).
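For anyone who wants to reproduce these counts, here is a minimal Python sketch of the tally. It assumes the output uses 4-line FASTQ records and that split fragments are named with the `split-by-adapter-left-` and `split-by-adapter-right-` prefixes seen in my output; `fastq_names` and `count_split_reads` are hypothetical helper names, not fastplong functions.

```python
import gzip
from collections import defaultdict

LEFT = "split-by-adapter-left-"
RIGHT = "split-by-adapter-right-"


def fastq_names(path):
    """Yield read names from a FASTQ file (assumes 4 lines per record)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                # Drop the leading '@' and any description after the first space.
                yield line[1:].rstrip("\n").split(" ", 1)[0]


def count_split_reads(names):
    """Count tagged records and group them by their original read ID."""
    sides = defaultdict(set)
    tagged = 0
    for name in names:
        if name.startswith(LEFT):
            sides[name[len(LEFT):]].add("left")
            tagged += 1
        elif name.startswith(RIGHT):
            sides[name[len(RIGHT):]].add("right")
            tagged += 1
    return {
        "tagged_records": tagged,
        "both_sides": sum(1 for s in sides.values() if s == {"left", "right"}),
        "left_only": sum(1 for s in sides.values() if s == {"left"}),
        "right_only": sum(1 for s in sides.values() if s == {"right"}),
    }


# Small in-memory demonstration with made-up read IDs.
demo_names = [
    "split-by-adapter-left-readA",
    "split-by-adapter-right-readA",
    "split-by-adapter-left-readB",
    "split-by-adapter-right-readC",
    "readD",
]
demo_stats = count_split_reads(demo_names)
print(demo_stats)

# On the real output (file name taken from the command above):
# stats = count_split_reads(fastq_names("test.filter.fq.gz"))
```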

Example:

**Original read**  
ID: `111_65_3798_1359_574291794_77069_2_13.46`  
Length: **6093 bp**

**After QC**  
ID: `split-by-adapter-left-111_65_3798_1359_574291794_77069_2_13.46`  
Length: **5340 bp**  
ID: `split-by-adapter-right-111_65_3798_1359_574291794_77069_2_13.46`  
Length: **705 bp**
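The arithmetic for this example, as a small Python sketch (variable names are mine; the lengths and the adapter are the ones given above):

```python
# Lengths from the example read above.
original_length = 6093
left_length = 5340
right_length = 705

# Adapter passed with -s in the command above.
adapter = "ATCATGCGAGGGCTAATTGTATATCACC"
adapter_length = len(adapter)

# Bases present in the original read but in neither fragment.
removed_bases = original_length - (left_length + right_length)

print("adapter length:", adapter_length)
print("bases removed by the split:", removed_bases)
print("removed beyond adapter length:", removed_bases - adapter_length)
```

The two fragments together are 48 bp shorter than the original read, which is 20 bp more than the 28 bp adapter. The split point is also about 5,340 bp into the read, so in this case the match appears to be internal rather than at a read end.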

---

### My Questions

1. Why does the JSON report only 737 `adapter_trimmed_reads` while the output FASTQ contains 193,512 `split-by-adapter` sequences? Could another parameter or a separate mechanism in fastplong cause this discrepancy?
2. Some reads in my dataset contain adapter sequences at both ends and are therefore split into two fragments. Is there a recommended way to handle such “dual-end adapter” cases?


Thank you!

Mismatch Between adapter_trimmed_reads and the Number of split-by-adapter Sequences #36
