[Not Implemented] removing BWA_MEM sorting by ignacio3437 · Pull Request #322 · Plant-Food-Research-Open/assemblyqc

ignacio3437 · 2026-02-09T22:15:09Z

I spent some time trying to improve the runtime of the assemblyqc hic steps, but my changes did not speed anything up significantly, so I think I will drop this. I still wanted to share it, since I spent a few days on it and learned a bit along the way.

The pipeline was doing a samtools sort -n step during the hic.bam file creation. The name-sorted bam is only needed for the HICQC module, which only uses ~1M read pairs, so to speed things up I have turned off the name sorting and introduced a new module.

SAMTOOLS_SUBSAMPLE_SORT creates a new bam file containing a 5% subset of the reads in hic.bam. This subset_hic.bam is then name-sorted and passed to HICQC.
The rest of the pipeline uses the full (not name-sorted) hic.bam.
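
A minimal sketch of what such a subsample-then-name-sort step could run, written in Python so the command lines can be inspected without samtools installed. This is not the module's actual script, which is not shown in this thread: the function name, output file names, thread count, and seed are hypothetical. `--subsample` and `--subsample-seed` are `samtools view` options in recent releases; older releases use `-s` instead.

```python
import shlex
from typing import List


def subsample_sort_commands(
    bam: str,
    fraction: float = 0.05,      # 5% of reads, as in the PR description
    threads: int = 4,            # hypothetical default
    seed: int = 0,               # hypothetical default, fixed for reproducibility
    prefix: str = "subset_hic",  # output prefix, following the subset_hic.bam name above
) -> List[List[str]]:
    """Return argv lists that subsample `bam` and then name-sort the subset."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError(f"fraction must be in (0, 1], got {fraction}")

    subset = f"{prefix}.bam"
    name_sorted = f"{prefix}.namesorted.bam"

    # Step 1: keep a random fraction of the alignments, written as BAM.
    view = [
        "samtools", "view", "-b", "-@", str(threads),
        "--subsample", str(fraction),
        "--subsample-seed", str(seed),
        "-o", subset, bam,
    ]
    # Step 2: name-sort only the small subset instead of the full hic.bam.
    sort = ["samtools", "sort", "-n", "-@", str(threads), "-o", name_sorted, subset]
    return [view, sort]


if __name__ == "__main__":
    for cmd in subsample_sort_commands("hic.bam"):
        print(shlex.join(cmd))
```

samtools subsamples by read name, so both mates of a pair are kept or dropped together, which matters for a QC step that counts read pairs.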

Unfortunately, the time saved during the BWA_MEM step does not pay off in the long run. Here is my test using the HYv4 dataset:

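| Step | Gallvp_main (min) | Iggy_fork (min) | Gallvp_main mem (GB) | Iggy_fork mem (GB) |
| --- | --- | --- | --- | --- |
| BWA_MEM | 349 | 281 | 17.7 | 5 |
| Samblaster | 121 | 110 | 5.63 | 5.63 |
| JuicerPre | 66 | 108 | | |
| SortSub | | 29 | | 7 |
| TOTAL TIME | 536 | 528 | | |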
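
To make the trade-off explicit, the per-step differences can be worked out from the runtimes in the table above. The snippet below is only an illustrative check of that arithmetic; the variable and function names are made up for this example, and a missing entry is treated as zero minutes because the step did not run on that branch.

```python
from typing import Dict, Optional, Tuple

# Runtime in minutes as (Gallvp main, Iggy fork), copied from the table above.
# None means the step was not run on that branch.
RUNTIME_MIN: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "BWA_MEM": (349, 281),
    "Samblaster": (121, 110),
    "JuicerPre": (66, 108),
    "SortSub": (None, 29),
}


def minutes(value: Optional[int]) -> int:
    """Treat a step that did not run as taking zero minutes."""
    return value or 0


def saved_per_step(table: Dict[str, Tuple[Optional[int], Optional[int]]]) -> Dict[str, int]:
    """Minutes saved by the fork per step (negative means the fork is slower)."""
    return {step: minutes(main) - minutes(fork) for step, (main, fork) in table.items()}


def totals(table: Dict[str, Tuple[Optional[int], Optional[int]]]) -> Tuple[int, int]:
    """Total runtime in minutes for (main, fork)."""
    main_total = sum(minutes(main) for main, _ in table.values())
    fork_total = sum(minutes(fork) for _, fork in table.values())
    return main_total, fork_total


if __name__ == "__main__":
    for step, delta in saved_per_step(RUNTIME_MIN).items():
        print(f"{step:<12}{delta:+d} min")
    main_total, fork_total = totals(RUNTIME_MIN)
    print(f"{'TOTAL':<12}{main_total - fork_total:+d} min ({main_total} vs {fork_total})")
```

The 79 minutes saved in BWA_MEM and Samblaster are almost entirely offset by 42 extra minutes in JuicerPre and 29 minutes for the new SortSub step, leaving a net saving of 8 minutes out of 536.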

PR checklist

github-actions · 2026-02-09T22:15:40Z

This PR is against the `main` branch ❌

Do not close this PR
Click Edit and change the base to dev
This CI test will remain failed until you push a new commit

Hi @ignacio3437,

It looks like this pull-request has been made against the ignacio3437/assemblyqc main branch.
The main branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to main are only allowed if they come from the ignacio3437/assemblyqc dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

GallVp · 2026-02-09T22:54:49Z

@ignacio3437

Can you please reopen the PR against the dev branch? I think we can merge this as a feature.

GallVp · 2026-02-09T22:55:45Z

subworkflows/local/fq2hic.nf

+    // MODULE: SAMTOOLS_SUBSAMPLE_SORT 
+    SAMTOOLS_SUBSAMPLE_SORT (
+        ch_bam,
+        0.05  // Sample 5% of reads 


Can we turn this into a parameter? By default we can set it to 100% which would essentially skip the SAMTOOLS_SUBSAMPLE_SORT module.

GallVp · 2026-02-09T22:57:13Z

subworkflows/local/fq2hic.nf


    // SUBWORKFLOW: FASTQ_BWA_MEM_SAMBLASTER
-    val_sort_bam = true
+    val_sort_bam = false


If we skip SAMTOOLS_SUBSAMPLE_SORT, do we need to re-enable this? Or was it completely unnecessary? I vaguely remember that I had to turn this on because some downstream tool failed, but maybe that was the old HiC workflow based on the run_visualiser script.
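
One possible reading of the two review comments together, sketched in Python rather than Nextflow so it can be run standalone. It assumes, as the PR description states, that HICQC is the only consumer of a name-sorted bam, and that `val_sort_bam` is what controls name sorting in the BWA_MEM step. The parameter name `subsample_fraction`, the class, and the function are hypothetical and do not exist in the pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HicBamPlan:
    run_subsample_sort: bool  # run SAMTOOLS_SUBSAMPLE_SORT before HICQC?
    val_sort_bam: bool        # name-sort the full hic.bam in the BWA_MEM step?


def plan_hic_bam(subsample_fraction: float = 1.0) -> HicBamPlan:
    """Decide which step provides the name-sorted bam for HICQC.

    A fraction of 1.0 (100%) skips the subsample module, as proposed in the
    review. Under the assumption that HICQC still needs name-sorted input,
    the full hic.bam then has to be name-sorted again.
    """
    if not 0.0 < subsample_fraction <= 1.0:
        raise ValueError(f"subsample_fraction must be in (0, 1], got {subsample_fraction}")
    if subsample_fraction < 1.0:
        # HICQC reads the small name-sorted subset; the full bam stays unsorted.
        return HicBamPlan(run_subsample_sort=True, val_sort_bam=False)
    # No subset is produced, so the full bam is name-sorted as before.
    return HicBamPlan(run_subsample_sort=False, val_sort_bam=True)


if __name__ == "__main__":
    print(plan_hic_bam(0.05))
    print(plan_hic_bam())
```

If it turns out that nothing downstream needs a name-sorted bam, the second branch could leave `val_sort_bam` off as well; that is the open question in the comment above.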

GallVp · 2026-02-09T23:00:21Z

.github/workflows/linting_comment.yml


      - name: Post PR comment
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728 # v2
+        uses: marocchino/sticky-pull-request-comment@773744901bac0e8cbb5a0dc842800d45e9b2b405 # v2


This looks unrelated?

ignacio3437 added 6 commits February 4, 2026 14:44

Removing name sort for hic.bam, adding subsample and sorting step before hicqc

f4c1fba

linting

12bef92

forgot to pass new bam to hicqc

9f6b6af

CHANGELOG update and fixing file handling

f785be0

update docs

274a9e1

Changing subsampling to multithread and fixing parameter reading

146d5e7

ignacio3437 requested a review from GallVp February 9, 2026 22:15

ignacio3437 closed this Feb 9, 2026

GallVp reviewed Feb 9, 2026

ignacio3437 mentioned this pull request Feb 12, 2026

BWA Mem Sorting optimization #323

Open

10 tasks

ignacio3437 wants to merge 6 commits into Plant-Food-Research-Open:main from ignacio3437:main
