Rework trimming #674

Merged
rannick merged 37 commits into dev from rework_trimming on Jul 4, 2025

Collaborator

@rannick rannick commented Jun 2, 2025

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/rnafusion branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).


github-actions bot commented Jun 2, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 3bf196c

+| ✅ 223 tests passed       |+
#| ❔   2 tests were ignored |#
!| ❗   4 tests had warnings |!
Details

❗ Test warnings:

  • pipeline_todos - TODO string in ro-crate-metadata.json: "description": "

    \n \n <source media="(prefers-color-scheme: dark)" srcset="docs/images/nf-core-rnafusion_logo_dark.png">\n <img alt="nf-core/rnafusion" src="docs/images/nf-core-rnafusion_logo_light.png">\n \n

    \n\nGitHub Actions CI Status\nGitHub Actions Linting StatusAWS CICite with Zenodo\nnf-test\n\nNextflow\nrun with conda\nrun with docker\nrun with singularity\nLaunch on Seqera Platform\n\nGet help on SlackFollow on TwitterFollow on MastodonWatch on YouTube\n\n## Introduction\n\nnf-core/rnafusion is a bioinformatics pipeline that ...\n\n TODO nf-core:\n Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the\n major pipeline sections and the types of output it produces. You're giving an overview to someone new\n to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction\n\n\n Include a figure that guides the user through the major workflow steps. Many nf-core\n workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. \n Fill in short bullet-pointed list of the default steps in the pipeline 1. Read QC (FastQC)2. Present QC for raw reads (MultiQC)\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.\n\n Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.\n Explain what rows and columns represent. 
For instance (please edit as appropriate):\n\nFirst, prepare a samplesheet with your input data that looks as follows:\n\nsamplesheet.csv:\n\ncsv\nsample,fastq_1,fastq_2\nCONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz\n\n\nEach row represents a fastq file (single-end) or a pair of fastq files (paired end).\n\n\n\nNow, you can run the pipeline using:\n\n update the following command to include all required parameters for a minimal example \n\nbash\nnextflow run nf-core/rnafusion \\\n -profile <docker/singularity/.../institute> \\\n --input samplesheet.csv \\\n --outdir <OUTDIR>\n\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.\n\nFor more details and further functionality, please refer to the usage documentation and the parameter documentation.\n\n## Pipeline output\n\nTo see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page.\nFor more details about the output files and reports, please refer to the\noutput documentation.\n\n## Credits\n\nnf-core/rnafusion was originally written by Martin Proks, Annick Renevey.\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n If applicable, make list of people who have also contributed \n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the contributing guidelines.\n\nFor further information or help, don't hesitate to get in touch on the Slack #rnafusion channel (you can join with this invite).\n\n## Citations\n\n Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. 
\n If you use nf-core/rnafusion for your analysis, please cite it using the following doi: 10.5281/zenodo.XXXXXX \n\n Add bibliography of tools and data used in your pipeline \n\nAn extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.\n\nYou can cite the nf-core publication as follows:\n\n> The nf-core framework for community-curated bioinformatics pipelines.\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.\n",
  • pipeline_todos - TODO string in nextflow.config: Update the field with the details of the contributors to your pipeline. New with Nextflow version 24.10.0
  • schema_lint - Input mimetype is missing or empty
  • local_component_structure - fusioninspector_workflow.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md

✅ Tests passed:

Run details

  • nf-core/tools version 3.2.1
  • Run at 2025-07-03 12:29:37

@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.2.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@rannick rannick marked this pull request as ready for review June 4, 2025 10:23
@rannick rannick self-assigned this Jun 4, 2025
@rannick rannick marked this pull request as draft June 4, 2025 10:24
@rannick rannick marked this pull request as ready for review June 5, 2025 09:26
@rannick rannick requested a review from atrigila June 5, 2025 11:31
Contributor

@atrigila atrigila left a comment

We can add a test case for the test_stub that tests the fusioncatcher trimming and see that it produces the outputs. I imagine something like this:

    test("stub test fusioncatcher") {

        when {
            params {
                outdir                  = "$outputDir"
                tools                   = "fusioncatcher"
                fusion_annot_lib        = 'https://github.com/STAR-Fusion/STAR-Fusion-Tutorial/raw/master/CTAT_HumanFusionLib.mini.dat.gz'
                trim_tail_fusioncatcher = 50
            }
        }
    }
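
For a quick manual check outside nf-test, the same parameters could also be passed on the command line. The sketch below only prints the command instead of running it. The `stub_cmd` helper, the `-r dev` revision, the `test,docker` profiles and the output directory are assumptions for illustration and are not part of this PR; `fusion_annot_lib` is left out for brevity.

```shell
# Hypothetical helper: print a stub-run invocation that mirrors the params
# block suggested above. Nothing is executed; the command is only echoed.
stub_cmd() {
    local trim_tail="${1:-50}"    # value for --trim_tail_fusioncatcher
    local outdir="${2:-results}"  # assumed output directory
    echo nextflow run nf-core/rnafusion -r dev -profile test,docker -stub-run \
        --tools fusioncatcher \
        --trim_tail_fusioncatcher "$trim_tail" \
        --outdir "$outdir"
}

# Inspect the command before deciding to run it.
stub_cmd 50 results
```

Printing first makes it easy to confirm that the trimming parameter is really on the command line before spending time on a run.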

rannick and others added 2 commits June 6, 2025 09:26
Co-authored-by: Anabella Trigila <18577080+atrigila@users.noreply.github.com>
Contributor

@nvnieuwk nvnieuwk left a comment

LGTM

@tkcaccia

Fusioncatcher is still failing...


software/nextflow-24.10.0 loaded.
Usage: nextflow [options]

Nextflow 25.04.4 is available - Please consider updating your version to it
Picked up _JAVA_OPTIONS: -Xmx32G
N E X T F L O W  ~  version 24.10.0
Launching `https://github.com/nf-core/rnafusion` [nostalgic_angela] DSL2 - revision: e0c951b9d9 [dev]

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rnafusion 4.0.0dev
------------------------------------------------------
Input/output options
  input                      : /scratch/firenze/Nextflow/test.csv
  outdir                     : /scratch/firenze/Nextflow/output
  genomes_base               : /scratch/firenze/Nextflow/human_genome/release_46
  genome_gencode_version     : 46
  tools                      : fusioncatcher
  arriba_ref_blacklist       : /scratch/firenze/Nextflow/human_genome/release_46/arriba/blacklist_hg38_GRCh38_v2.4.0.tsv.gz
  arriba_ref_cytobands       : /scratch/firenze/Nextflow/human_genome/release_46/arriba/cytobands_hg38_GRCh38_v2.4.0.tsv
  arriba_ref_known_fusions   : /scratch/firenze/Nextflow/human_genome/release_46/arriba/known_fusions_hg38_GRCh38_v2.4.0.tsv.gz
  arriba_ref_protein_domains : /scratch/firenze/Nextflow/human_genome/release_46/arriba/protein_domains_hg38_GRCh38_v2.4.0.gff3
  fusioncatcher_ref          : /scratch/firenze/Data/human_genome/release_46/fusioncatcher/human_v102
  fusionreport_ref           : /scratch/firenze/Nextflow/human_genome/release_46/fusion_report_db
  hgnc_ref                   : /scratch/firenze/Data/human_genome/release_46/hgnc/hgnc_complete_set.txt
  hgnc_date                  : /scratch/firenze/Data/human_genome/release_46/hgnc/HGNC-DB-timestamp.txt
  salmon_index               : /scratch/firenze/Nextflow/human_genome/release_46/salmon/salmon
  starfusion_ref             : /scratch/firenze/Nextflow/human_genome/release_46/starfusion/ctat_genome_lib_build_dir
  ctatsplicing_cancer_introns: /scratch/firenze/Data/human_genome/release_46/starfusion/ctat_genome_lib_build_dir/cancer_introns.GRCh38.Jun232020.tsv.gz
  starindex_ref              : /scratch/firenze/Nextflow/human_genome/release_46/star

Read trimming options
  adapter_fasta              : []

Reference genome options
  fasta                      : /scratch/firenze/Nextflow/human_genome/release_46/gencode/Homo_sapiens_GRCh38_46_dna_primary_assembly.fa
  fai                        : /scratch/firenze/Nextflow/human_genome/release_46/gencode/Homo_sapiens_GRCh38_46_dna_primary_assembly.fa.fai
  genome                     : GRCh38
  gtf                        : /scratch/firenze/Nextflow/human_genome/release_46/gencode/Homo_sapiens_GRCh38_46.gtf
  refflat                    : /scratch/firenze/Nextflow/human_genome/release_46/gencode/Homo_sapiens_GRCh38_46.gtf.refflat
  rrna_intervals             : /scratch/firenze/Nextflow/human_genome/release_46/gencode/Homo_sapiens_GRCh38_46.interval_list
  fusion_annot_lib           : /scratch/firenze/Data/human_genome/release_46/starfusion/ctat_genome_lib_build_dir/fusion_lib.Mar2021.dat.gz
  dfam_version               : 3.8
  pfam_version               : 37.4

Generic options
  monochrome_logs            : true
  trace_report_suffix        : 2025-06-22_12-30-48
  star_limit_bam_sort_ram    : 0

Core Nextflow options
  revision                   : dev
  runName                    : nostalgic_angela
  containerEngine            : singularity
  launchDir                  : /scratch/firenze/Nextflow
  workDir                    : /scratch/firenze/Nextflow/work
  projectDir                 : /home/01481067/.nextflow/assets/nf-core/rnafusion
  userName                   : 01481067
  profile                    : singularity
  configFiles                : 

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
* The nf-core framework
    https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
    https://github.com/nf-core/rnafusion/blob/master/CITATIONS.md

[29/eebf63] Cached process > RNAFUSION:FASTQC (CONTROL_REP1)
[5d/8b8baa] Submitted process > RNAFUSION:BUILD_REFERENCES:GENCODE_DOWNLOAD (gencode_download)
[c0/bb5c96] Submitted process > RNAFUSION:FUSIONCATCHER_WORKFLOW:FUSIONCATCHER_FUSIONCATCHER (CONTROL_REP1)
ERROR ~ Error executing process > 'RNAFUSION:FUSIONCATCHER_WORKFLOW:FUSIONCATCHER_FUSIONCATCHER (CONTROL_REP1)'

Caused by:
  Process `RNAFUSION:FUSIONCATCHER_WORKFLOW:FUSIONCATCHER_FUSIONCATCHER (CONTROL_REP1)` terminated with an error exit status (1)


Command executed:

  fusioncatcher \
      --input=WT02_1.fq.gz,WT02_2.fq.gz \
      --output=. \
      --data=human_v102 \
      --threads=6 \
      --Xmx=29491 \
      --limitSjdbInsertNsj 2000000 --skip-blat 
  
  mv final-list_candidate-fusion-genes.txt CONTROL_REP1.fusion-genes.txt
  mv summary_candidate_fusions.txt CONTROL_REP1.summary.txt
  mv fusioncatcher.log CONTROL_REP1.log
  
  cat <<-END_VERSIONS > versions.yml
  "RNAFUSION:FUSIONCATCHER_WORKFLOW:FUSIONCATCHER_FUSIONCATCHER":
      fusioncatcher: "$(fusioncatcher --version 2>&1 | awk '{print $2}')"
  END_VERSIONS

Command exit status:
  1

Command output:
  --------------------------------------------------------------------------------
  ==> Execution time: 0 day(s), 0 hour(s), 0 minute(s), and 0 second(s)
  ////////////////////////////////////////////////////////////////////////////////
    Running: step = 473   Time: 12:47   Date: 2025-06-22 (elapsed time: 0d:0h:17m)
  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
  ==> Current working directory: '/scratch/firenze/Nextflow/work/c0/bb5c96866830c356d1eaf20ceaead7'
  bbmerge-auto.sh \
  in=candidate_focus_reads.txt.0.fq \
  out=candidate_focus_reads.txt.0_m.fq \
  threads=6 \
  extend2=20 \
  iterations=3 \
  k=17 \
  mindepthseed=1 \
  mindepthextend=1 \
  minoverlap=11 \
  -Xmx29491
  --------------------------------------------------------------------------------
  +-->EXECUTING...
  
  
  ERROR: Workflow execution failed at step 473 while executing:
  ----------------
     bbmerge-auto.sh \
     in=candidate_focus_reads.txt.0.fq \
     out=candidate_focus_reads.txt.0_m.fq \
     threads=6 \
     extend2=20 \
     iterations=3 \
     k=17 \
     mindepthseed=1 \
     mindepthextend=1 \
     minoverlap=11 \
     -Xmx29491
  ----------------
    * Size 'candidate_focus_reads.txt.0.fq' = 424 bytes
    * Size 'candidate_focus_reads.txt.0_m.fq' = 0 bytes
  
  
  Executing second time the same step/command in order to capture error messages (i.e. STDERR)...
  
  -------------------------------------------
  java -Djava.library.path=/usr/local/opt/bbmap-38.44-0/jni/ -ea -Xmx29491 -Xms29491 -cp /usr/local/opt/bbmap-38.44-0/current/ jgi.BBMerge in=candidate_focus_reads.txt.0.fq out=candidate_focus_reads.txt.0_m.fq threads=6 extend2=20 iterations=3 k=17 mindepthseed=1 mindepthextend=1 minoverlap=11 -Xmx29491
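
One detail that stands out in the log above is that the heap size reaches Java as `-Xmx29491`, with no unit suffix. The JVM reads a bare number as bytes, so this may be worth checking. The captured STDERR is cut off, so the actual cause of the failure is not visible here. The following is a minimal sketch of a check for a unit suffix; `heap_has_unit` is a hypothetical helper, not part of fusioncatcher or the pipeline.

```shell
# Hypothetical helper: succeed when a JVM heap value ends in a k/m/g/t unit
# suffix (either case). A bare number such as 29491 means bytes to the JVM.
# This is a loose check meant for illustration only.
heap_has_unit() {
    case "$1" in
        ''|*[!0-9kKmMgGtT]*) return 1 ;;  # empty or unexpected characters
        *[0-9][kKmMgGtT])    return 0 ;;  # digits followed by a unit
        *)                   return 1 ;;  # digits only: no unit
    esac
}

for value in 29491 29491m 28g; do
    if heap_has_unit "$value"; then
        echo "$value: unit suffix present"
    else
        echo "$value: no unit suffix, the JVM would read this as bytes"
    fi
done
```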
  

Contributor

@atrigila atrigila left a comment

Is my reasoning here correct? I think it is triggering the trim_tail_fusioncatcher in all the tests and not only in the one that sets trim_tail_fusioncatcher as true.
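
The behaviour being asked for, where trimming runs only when the parameter is explicitly set, can be sketched as a small guard. This is an illustration under assumptions and not the pipeline's actual code: `should_trim_tail` is a hypothetical helper, and it assumes the parameter is a positive integer length, as in the `trim_tail_fusioncatcher = 50` test suggestion.

```shell
# Hypothetical guard: succeed only when a tail-trim length was explicitly set
# to a positive integer. Empty, non-numeric ("null", "false") and 0 all mean
# "do not trim".
should_trim_tail() {
    case "${1:-}" in
        ''|*[!0-9]*) return 1 ;;  # unset, empty or non-numeric
    esac
    [ "$1" -gt 0 ]
}

for value in "" 0 50; do
    if should_trim_tail "$value"; then
        echo "trim_tail='$value': trimming step runs"
    else
        echo "trim_tail='$value': trimming step skipped"
    fi
done
```

With a guard of this shape, tests that never set the parameter would skip the trimming step, and only the dedicated test would exercise it.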

rannick and others added 5 commits July 2, 2025 16:56
Co-authored-by: Anabella Trigila <18577080+atrigila@users.noreply.github.com>
Co-authored-by: Anabella Trigila <18577080+atrigila@users.noreply.github.com>
@rannick rannick merged commit 5f574e8 into dev Jul 4, 2025
17 of 18 checks passed
@rannick rannick deleted the rework_trimming branch July 4, 2025 07:18
@atrigila atrigila mentioned this pull request Sep 16, 2025
10 tasks
5 participants