It may be that your process is expecting a specific filename, and the output of the previous process doesn't match it. This is not uncommon in nf-core pipelines, and is usually solved with something like the snippet below in a configuration file:

```groovy
process {
    withName: PICARD_MARKDUPLICATES {
        ext.prefix = { "output_${meta.id}" }
    }
}
```

A first step should be to check the task directory of the failed task and see which input files are linked there, if any. If you can share a minimal reproducible example or a publicly available pipeline for me to check, I can try to reproduce the issue on my side and work on a solution 😄
-
Hello,
I am running a variant calling pipeline using GATK4, written as a containerized Nextflow script for analyzing BGI sequencing data, which I submit as a Slurm job to our HPC cluster. I set my input to the reads I want analyzed; the workflow begins and completes step 1, but fails at step 2, saying that the `_aligned_reads.sam` file (the output of step 1) does not exist. The process/error is below:
executor > local (2)
[7b/56c3d5] process > align (1) [100%] 1 of 1 ✔
[83/d89167] process > markDuplicatesSpark (1) [ 0%] 0 of 1
[- ] process > getMetrics -
[- ] process > haplotypeCaller -
[- ] process > selectVariants -
[- ] process > filterSnps -
[- ] process > filterIndels -
[- ] process > bqsr -
[- ] process > analyzeCovariates -
[- ] process > snpEff -
[- ] process > qc -
Error executing process > 'markDuplicatesSpark (1)'

Caused by:
  Process `markDuplicatesSpark (1)` terminated with an error exit status (2)

Command executed:

  mkdir -p /scratch/projects/oleksyk-lab/gatk4/gatk_temp/furious_hamilton/
  gatk --java-options "-Djava.io.tmpdir=/scratch/projects/oleksyk-lab/gatk4/gatk_temp/furious_hamilton/" MarkDuplicatesSpark -I _aligned_reads.sam -M _dedup_metrics.txt -O _sorted_dedup.bam
  rm -r /scratch/projects/oleksyk-lab/gatk4/gatk_temp/furious_hamilton/

Command exit status:
  2

Command output:
  (empty)

Command error:
18:17:56.068 INFO ContextHandler - Started o.s.j.s.ServletContextHandler@51e0f2eb{/api,null,AVAILABLE,@Spark}
18:17:56.069 INFO ContextHandler - Started o.s.j.s.ServletContextHandler@aa794a3{/jobs/job/kill,null,AVAILABLE,@Spark}
18:17:56.069 INFO ContextHandler - Started o.s.j.s.ServletContextHandler@22cb8e5f{/stages/stage/kill,null,AVAILABLE,@Spark}
18:17:56.072 INFO ContextHandler - Started o.s.j.s.ServletContextHandler@5ca8c904{/metrics/json,null,AVAILABLE,@Spark}
18:17:56.076 INFO MarkDuplicatesSpark - Spark verbosity set to INFO (see --spark-verbosity argument)
18:17:56.118 INFO GoogleHadoopFileSystemBase - GHFS version: 1.9.4-hadoop3
WARNING 2023-10-09 18:17:56 SamReaderFactory Unable to detect file format from input URL or stream, assuming SAM format.
WARNING 2023-10-09 18:17:56 SamReaderFactory Unable to detect file format from input URL or stream, assuming SAM format.
18:17:56.286 INFO MemoryStore - Block broadcast_0 stored as values in memory (estimated size 1540.3 KiB, free 17.8 GiB)
18:17:56.593 INFO MemoryStore - Block broadcast_0_piece0 stored as bytes in memory (estimated size 68.4 KiB, free 17.8 GiB)
18:17:56.596 INFO BlockManagerInfo - Added broadcast_0_piece0 in memory on hpc-compute-p36.cm.cluster:44093 (size: 68.4 KiB, free: 17.8 GiB)
18:17:56.599 INFO SparkContext - Created broadcast 0 from broadcast at SamSource.java:78
18:17:56.719 INFO MemoryStore - Block broadcast_1 stored as values in memory (estimated size 188.3 KiB, free 17.8 GiB)
18:17:56.741 INFO MemoryStore - Block broadcast_1_piece0 stored as bytes in memory (estimated size 41.8 KiB, free 17.8 GiB)
18:17:56.742 INFO BlockManagerInfo - Added broadcast_1_piece0 in memory on hpc-compute-p36.cm.cluster:44093 (size: 41.8 KiB, free: 17.8 GiB)
18:17:56.742 INFO SparkContext - Created broadcast 1 from newAPIHadoopFile at SamSource.java:108
18:17:56.833 INFO BlockManagerInfo - Removed broadcast_1_piece0 on hpc-compute-p36.cm.cluster:44093 in memory (size: 41.8 KiB, free: 17.8 GiB)
18:17:56.837 INFO BlockManagerInfo - Removed broadcast_0_piece0 on hpc-compute-p36.cm.cluster:44093 in memory (size: 68.4 KiB, free: 17.8 GiB)
WARNING 2023-10-09 18:17:56 SamReaderFactory Unable to detect file format from input URL or stream, assuming SAM format.
WARNING 2023-10-09 18:17:56 SamReaderFactory Unable to detect file format from input URL or stream, assuming SAM format.
18:17:56.903 INFO MemoryStore - Block broadcast_2 stored as values in memory (estimated size 1540.3 KiB, free 17.8 GiB)
18:17:56.912 INFO MemoryStore - Block broadcast_2_piece0 stored as bytes in memory (estimated size 68.4 KiB, free 17.8 GiB)
18:17:56.913 INFO BlockManagerInfo - Added broadcast_2_piece0 in memory on hpc-compute-p36.cm.cluster:44093 (size: 68.4 KiB, free: 17.8 GiB)
18:17:56.914 INFO SparkContext - Created broadcast 2 from broadcast at SamSource.java:78
18:17:56.917 INFO MemoryStore - Block broadcast_3 stored as values in memory (estimated size 188.3 KiB, free 17.8 GiB)
18:17:56.927 INFO MemoryStore - Block broadcast_3_piece0 stored as bytes in memory (estimated size 41.8 KiB, free 17.8 GiB)
18:17:56.928 INFO BlockManagerInfo - Added broadcast_3_piece0 in memory on hpc-compute-p36.cm.cluster:44093 (size: 41.8 KiB, free: 17.8 GiB)
18:17:56.928 INFO SparkContext - Created broadcast 3 from newAPIHadoopFile at SamSource.java:108
18:17:56.974 INFO BlockManagerInfo - Removed broadcast_2_piece0 on hpc-compute-p36.cm.cluster:44093 in memory (size: 68.4 KiB, free: 17.8 GiB)
18:17:56.977 INFO BlockManagerInfo - Removed broadcast_3_piece0 on hpc-compute-p36.cm.cluster:44093 in memory (size: 41.8 KiB, free: 17.8 GiB)
18:17:56.978 INFO AbstractConnector - Stopped Spark@5cb6966{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
18:17:56.981 INFO SparkUI - Stopped Spark web UI at http://hpc-compute-p36.cm.cluster:4040
18:17:56.989 INFO MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
18:17:57.004 INFO MemoryStore - MemoryStore cleared
18:17:57.004 INFO BlockManager - BlockManager stopped
18:17:57.006 INFO BlockManagerMaster - BlockManagerMaster stopped
18:17:57.008 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
18:17:57.016 INFO SparkContext - Successfully stopped SparkContext
18:17:57.016 INFO MarkDuplicatesSpark - Shutting down engine
[October 9, 2023 at 6:17:57 PM EDT] org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=285212672
A USER ERROR has occurred: Failed to load reads from _aligned_reads.sam
Caused by: Input path does not exist: file:_aligned_reads.sam
More info on what I'm running is below.

Config file:

```groovy
// Required Parameters
params.reads = "/projects/oleksyk-lab/Kenneth/Golden_Standard/BGI/{E150016531_L01_75_1.fq.gz,E150016531_L01_75_2.fq.gz}"
params.ref = "/projects/oleksyk-lab/Kenneth/Golden_Standard/References/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta"
params.outdir = "/scratch/projects/oleksyk-lab/gatk4"
params.snpeff_db = "GRCh38.105"
params.pl = "bgi"
params.pm = "dnbseq"

// Set the Nextflow working directory
// By default this gets set to params.outdir + '/nextflow_work_dir'
workDir = params.outdir + '/nextflow_work_dir'
```
Slurm script (DSL1 pipeline):

```bash
module load bwa
module load GATK
export NXF_VER=22.10.7
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
source activate nf-env
nextflow run main.nf -c goldstandardnextflow.config
```
I cannot find anyone with this error and I'm very confused as to why I am receiving it. Any help is greatly appreciated!!
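For reference, the error `Input path does not exist: file:_aligned_reads.sam` usually means the SAM file was never staged into the task directory, i.e. it was not declared as an input of `markDuplicatesSpark`; the empty prefix (nothing before the underscore) also suggests the variable used to build the filename is empty. A minimal DSL1 sketch of how the two steps are typically wired, with hypothetical channel and variable names (`read_pairs_ch`, `aligned_reads_ch`, `pair_id`):

```groovy
// Hedged DSL1 sketch: the SAM file must be declared as an input of the
// downstream process so Nextflow stages it into that task's work directory.
process align {
    input:
    tuple val(pair_id), file(reads) from read_pairs_ch

    output:
    // The filename is prefixed with pair_id; an empty pair_id would
    // produce exactly "_aligned_reads.sam"
    tuple val(pair_id), file("${pair_id}_aligned_reads.sam") into aligned_reads_ch

    script:
    """
    bwa mem ${params.ref} ${reads} > ${pair_id}_aligned_reads.sam
    """
}

process markDuplicatesSpark {
    input:
    // Without this declaration, the SAM file never appears in the task dir
    tuple val(pair_id), file(aligned_reads) from aligned_reads_ch

    output:
    tuple val(pair_id), file("${pair_id}_sorted_dedup.bam") into sorted_dedup_ch

    script:
    """
    gatk MarkDuplicatesSpark \\
        -I ${aligned_reads} \\
        -M ${pair_id}_dedup_metrics.txt \\
        -O ${pair_id}_sorted_dedup.bam
    """
}
```

Checking the failed task's work directory (the path under `83/d89167` in this run) for a symlinked `*_aligned_reads.sam` is the quickest way to confirm which of the two problems applies.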