[Not Implemented] removing BWA_MEM sorting #322

Closed
ignacio3437 wants to merge 6 commits into Plant-Food-Research-Open:main from ignacio3437:main

@ignacio3437
Collaborator

I spent some time trying to improve the runtime of the assemblyqc Hi-C steps, but my changes did not speed anything up significantly, so I think I will drop this. I still wanted to share it, since I spent a few days on it and learned a bit along the way.

The pipeline ran a samtools sort -n (name sort) step while creating hic.bam. The name-sorted BAM is only needed by the HICQC module, which uses only ~1M read pairs, so to speed things up I turned off the name sorting and introduced a new module.

SAMTOOLS_SUBSAMPLE_SORT creates subset_hic.bam, a subset containing 5% of the reads in hic.bam. This subset is then name-sorted and passed to HICQC.
The rest of the pipeline uses the full hic.bam, which is not name-sorted.
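For HICQC to see intact pairs, the subsample presumably has to keep or drop both mates of a pair together. samtools view -s/--subsample is documented to make its keep/drop decision per template, so mates stay together. The Python sketch below imitates that idea with a hash of the read name and then name-sorts the result. It is an illustration only: the helper names (keep_template, subsample, name_sort), the SHA-1 hash, and the toy read names are assumptions, and it does not reproduce what the SAMTOOLS_SUBSAMPLE_SORT module actually runs.

```python
import hashlib

# Hypothetical helpers for illustration; not part of the pipeline or samtools.


def keep_template(read_name: str, fraction: float, seed: int = 0) -> bool:
    """Keep or drop a whole template (read pair) based only on its name.

    Both mates of a pair share a read name, so they always get the same
    decision. SHA-1 over "seed:name" is an illustrative choice; it is not
    the hash that samtools uses.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    digest = hashlib.sha1(f"{seed}:{read_name}".encode()).digest()
    # Keep 53 bits so the value maps exactly onto a float in [0, 1).
    value = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return value < fraction


def subsample(records, fraction, seed=0):
    """Yield (read_name, payload) records whose template passes keep_template."""
    for name, payload in records:
        if keep_template(name, fraction, seed):
            yield name, payload


def name_sort(records):
    """Sort by read name so mates sit next to each other.

    Plain lexicographic order; samtools sort -n has its own name ordering.
    """
    return sorted(records, key=lambda record: record[0])


# Toy input: 10,000 hypothetical pairs, mates stored as ("readN", "R1"/"R2").
reads = [(f"read{i}", mate) for i in range(10_000) for mate in ("R1", "R2")]
kept = list(subsample(reads, 0.05, seed=42))
```

Because the decision depends only on the read name, a fraction of 1.0 keeps every record and both mates of any kept pair always survive together.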

Unfortunately, the time saved in the BWA_MEM step is largely eaten up by a slower JuicerPre step and the new subsample/sort step (SortSub), so the overall gain is small. Here is my test on the HYv4 dataset:

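| Step | Gallvp main (min) | Iggy fork (min) | Gallvp main mem (GB) | Iggy fork mem (GB) |
|---|---|---|---|---|
| BWA_MEM | 349 | 281 | 17.7 | 5 |
| Samblaster | 121 | 110 | 5.63 | 5.63 |
| JuicerPre | 66 | 108 | | |
| SortSub | n/a | 29 | n/a | 7 |
| TOTAL TIME | 536 | 528 | | |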
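The per-step minutes in the HYv4 timing table can be tallied to see where the time goes. This short Python check uses only the numbers reported in this PR; the dictionary names are illustrative.

```python
# Wall-clock minutes per step on HYv4, copied from the timing table in this PR.
# A step that does not exist in a run (SortSub on main) is simply absent.
main_minutes = {"BWA_MEM": 349, "Samblaster": 121, "JuicerPre": 66}
fork_minutes = {"BWA_MEM": 281, "Samblaster": 110, "JuicerPre": 108, "SortSub": 29}

main_total = sum(main_minutes.values())
fork_total = sum(fork_minutes.values())

# Positive delta means the fork is slower on that step.
deltas = {
    step: fork_minutes.get(step, 0) - main_minutes.get(step, 0)
    for step in sorted(main_minutes.keys() | fork_minutes.keys())
}

saved = main_total - fork_total
print(f"main {main_total} min, fork {fork_total} min, saved {saved} min "
      f"({100 * saved / main_total:.1f}%)")
for step, delta in deltas.items():
    print(f"{step:>10}: {delta:+d} min")
```

The net saving is 8 minutes (about 1.5%): the 68 and 11 minutes saved in BWA_MEM and Samblaster are mostly consumed by JuicerPre (+42) and SortSub (+29).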

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes: nextflow run . -profile test,docker --outdir <OUTDIR> and nf-test test --profile docker tests/.
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

ignacio3437 requested a review from GallVp on February 9, 2026 at 22:15
ignacio3437 closed this on Feb 9, 2026

github-actions bot commented Feb 9, 2026

This PR is against the main branch ❌

  • Do not close this PR
  • Click Edit and change the base to dev
  • This CI test will remain failed until you push a new commit

Hi @ignacio3437,

It looks like this pull request has been made against the ignacio3437/assemblyqc main branch.
The main branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to main are only allowed if they come from the ignacio3437/assemblyqc dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

@GallVp
Member

GallVp commented Feb 9, 2026

@ignacio3437

Can you please reopen the PR against the dev branch? I think we can merge this as a feature.

// MODULE: SAMTOOLS_SUBSAMPLE_SORT
SAMTOOLS_SUBSAMPLE_SORT (
ch_bam,
0.05 // Sample 5% of reads
Member

Can we turn this into a parameter? By default, we can set it to 100%, which would effectively skip the SAMTOOLS_SUBSAMPLE_SORT module.
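The suggestion above amounts to a small decision on the subsample fraction. The Python sketch below shows one way that decision could look; the parameter name hicqc_subsample_fraction and the function hicqc_input_steps are hypothetical, and it assumes, as the PR description states, that HICQC needs a name-sorted BAM.

```python
def hicqc_input_steps(subsample_fraction: float = 1.0) -> list[str]:
    """Choose the steps that prepare the HICQC input BAM.

    Hypothetical sketch of a `hicqc_subsample_fraction` parameter. Assumes
    HICQC needs a name-sorted BAM, as described in this PR.
    """
    if not 0.0 < subsample_fraction <= 1.0:
        raise ValueError("subsample_fraction must be in (0, 1]")
    if subsample_fraction == 1.0:
        # Default: no subsampling, so the full hic.bam has to be name-sorted
        # (for example by keeping val_sort_bam = true).
        return ["name-sort full hic.bam", "HICQC"]
    # Otherwise subsample first and name-sort only the small subset.
    return [f"SAMTOOLS_SUBSAMPLE_SORT (fraction={subsample_fraction})", "HICQC"]
```

With the default of 1.0 the subsample module never runs, which matches the "100% skips the module" idea while leaving the 5% path available as an opt-in.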


// SUBWORKFLOW: FASTQ_BWA_MEM_SAMBLASTER
- val_sort_bam = true
+ val_sort_bam = false
Member

If we skip SAMTOOLS_SUBSAMPLE_SORT, do we need to re-enable this, or was it completely unnecessary? I vaguely remember that I had to turn this on because some downstream tool failed, but maybe that was the old HiC workflow based on the run_visualiser script.


  - name: Post PR comment
-   uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728 # v2
+   uses: marocchino/sticky-pull-request-comment@773744901bac0e8cbb5a0dc842800d45e9b2b405 # v2
Member

This looks unrelated?

ignacio3437 mentioned this pull request on Feb 12, 2026

Labels: None yet
Projects: None yet

2 participants