BWA Mem Sorting optimization by ignacio3437 · Pull Request #323 · Plant-Food-Research-Open/assemblyqc

ignacio3437 · 2026-02-12T01:39:12Z

I spent some time trying to improve the runtime of the assemblyqc hic steps but my changes did not end up speeding up anything significantly.

The pipeline was doing a samtools sort -n step during the hic.bam file creation. The name-sorted bam is only needed for the HICQC module, which only uses ~1M read pairs, so to speed things up I have turned off the name sorting and introduced a new module.

SAMTOOLS_SUBSAMPLE_SORT creates a new BAM file containing a 5% subsample of the reads in hic.bam. This subset_hic.bam is then name-sorted and passed to HICQC.
The rest of the pipeline uses the full (not name-sorted) hic.bam.

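Unfortunately, most of the time saved during the BWA_MEM step is lost again downstream: JuicerPre runs slower on the unsorted BAM and the new subsample-and-sort step adds its own runtime. Here is my test using the HYv4 dataset: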

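| Step       | GallVp main (min) | Iggy fork (min) | GallVp main mem (GB) | Iggy fork mem (GB) |
|------------|-------------------|-----------------|----------------------|--------------------|
| BWA_MEM    | 349               | 281             | 17.7                 | 5                  |
| Samblaster | 121               | 110             | 5.63                 | 5.63               |
| JuicerPre  | 66                | 108             |                      |                    |
| SortSub    |                   | 29              |                      | 7                  |
| TOTAL TIME | 536               | 528             |                      |                    |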
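
To make the trade-off explicit, the per-step times in the table above can be summed and compared step by step. This is only a quick check on the numbers reported here; the variable names are illustrative and nothing below is part of the pipeline.

```python
# Per-step runtimes in minutes, copied from the table above.
# SortSub only exists on the fork, so it is absent from the main-branch dict.
main_min = {"BWA_MEM": 349, "Samblaster": 121, "JuicerPre": 66}
fork_min = {"BWA_MEM": 281, "Samblaster": 110, "JuicerPre": 108, "SortSub": 29}

total_main = sum(main_min.values())
total_fork = sum(fork_min.values())

# Positive delta = the fork is slower for that step; negative = faster.
steps = sorted(set(main_min) | set(fork_min))
delta = {step: fork_min.get(step, 0) - main_min.get(step, 0) for step in steps}

print("totals (main, fork):", total_main, total_fork)
for step in steps:
    print(f"{step:<10} {delta[step]:+d} min")
print("net change:", total_fork - total_main, "min")
```

The 79 minutes gained in BWA_MEM and Samblaster are almost entirely offset by the 71 minutes added in JuicerPre and SortSub, leaving a net difference of 8 minutes.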


ignacio3437 · 2026-02-12T01:50:39Z

#322

ignacio3437 · 2026-02-12T02:19:19Z

subworkflows/local/fq2hic.nf

+    // MODULE: SAMTOOLS_SUBSAMPLE_SORT
+    SAMTOOLS_SUBSAMPLE_SORT (
+        ch_bam,
+        0.05  // Sample 5% of reads


@GallVp : Can we turn this into a parameter? By default we can set it to 100% which would essentially skip the SAMTOOLS_SUBSAMPLE_SORT module.

We can turn this into a parameter, but I think we would need to add logic here to skip the subsample step when the parameter is set to 100%.

As is, this ch_subsampled_sorted_bam is only passed to hicqc.
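
The skip logic discussed here could look roughly like the sketch below. It is written in Python rather than Nextflow so it can be run standalone, and it only builds the command lines. The function name, the temporary file name, and the `fraction` parameter are hypothetical and not taken from the PR; the sketch also assumes that at 100% the full BAM would still be name-sorted for HICQC. The samtools options used (`view -b -s -o`, `sort -n -o`, `-@`) are standard.

```python
def hicqc_bam_commands(in_bam, out_bam, fraction, threads=4):
    """Return the samtools commands (as argument lists) that would
    produce the name-sorted BAM handed to HICQC.

    Hypothetical helper: 'fraction' stands in for a pipeline parameter.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")

    # Name sort is what HICQC needs, with or without subsampling.
    sort_cmd = ["samtools", "sort", "-n", "-@", str(threads), "-o", out_bam]

    if fraction >= 1.0:
        # 100%: skip the subsample step and sort the full BAM directly.
        return [sort_cmd + [in_bam]]

    # Otherwise subsample first, then name-sort the smaller file.
    tmp_bam = out_bam + ".subset.tmp.bam"  # illustrative temp name
    view_cmd = [
        "samtools", "view", "-b", "-@", str(threads),
        "-s", format(fraction, "f"),  # fixed-point, e.g. 0.050000
        "-o", tmp_bam, in_bam,
    ]
    return [view_cmd, sort_cmd + [tmp_bam]]


for cmd in hicqc_bam_commands("hic.bam", "subset_hic.bam", 0.05):
    print(" ".join(cmd))
```

With `fraction=1.0` the function returns a single `samtools sort -n` command, which corresponds to the "skip the module" default suggested above.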

ignacio3437 · 2026-02-12T02:22:08Z

subworkflows/local/fq2hic.nf


    // SUBWORKFLOW: FASTQ_BWA_MEM_SAMBLASTER
-    val_sort_bam = true
+    val_sort_bam = false


@GallVp : If we skip SAMTOOLS_SUBSAMPLE_SORT, do we need to re-enable this? Or was this completely unnecessary? I vaguely remember that I had to turn this on because some downstream tool failed. But maybe that was the old HiC workflow based on the run_visualiser script.

It is not strictly necessary; all the output files are produced correctly. But some tools seem to run faster with a name-sorted BAM, especially some of the JuicerPre steps.

I wonder what the speedup would be with a coordinate-sorted BAM, which is the standard sort order. I think it would speed some steps up even more.

The name-sorted BAM was only required for hicqc.

GallVp and others added 14 commits September 24, 2025 09:25

edbb153  Merge pull request Plant-Food-Research-Open#311 from Plant-Food-Research-Open/dev dev -> main: Version 3.0.0
2d9497d  [Plant-Food-Research-OpenGH-315] Patched synteny crash due to Syri failure
658f506  Added disk cleaner to GHA and fixed Plant-Food-Research-OpenGH-317
d6b50a0  Fixed branch protection rule
c640bc5  Now skipping all Plotsr combinations including and after the failed one
3857ee7  Merge pull request Plant-Food-Research-Open#316 from Plant-Food-Research-Open/patch/315 [Plant-Food-Research-OpenGH-315] Patched synteny crash due to Syri failure
f4c1fba  Removing name sort for hic.bam, adding subsample and sorting step before hicqc
12bef92  linting
9f6b6af  forgot to pass new bam to hicqc
f785be0  CHANGELOG update and fixing file handling
274a9e1  update docs
146d5e7  Changing subsampling to multithread and fixing parameter reading
dc203c2  linting
ec62afb  Merge pull request #2 from ignacio3437/main Main to dev

ignacio3437 requested a review from GallVp February 12, 2026 02:24

ignacio3437 wants to merge 14 commits into Plant-Food-Research-Open:dev from ignacio3437:dev


2 participants
