Select a specific output file from one process and use it as an input to another process in Nextflow #3669
I am writing a Nextflow script for GATK-based analysis of normal-tumor paired samples:

    process gatkrun {
        publishDir params.trim, mode: 'copy'

        input:
        tuple val(pair_id), file(sorted_file)

        output:
        tuple val(pair_id), file(alignment_metrics_file), file(insert_size_metrics_file), file(Histogram_file), file(readgroup_bamfile), file(dedup_readsbamfile), file(dedup_readsbaifile),
            file(dedup_metricsfile), file(recal_datatable), file(recal_readsbamfile), file(post_recal_datatable), file(recalibration_plots)

        script:
        alignment_metrics_file = pair_id + '_alignment_metrics.txt'
        insert_size_metrics_file = pair_id + '_insertsize_metrics.txt'
        Histogram_file = pair_id + '_insert_size_histogram.pdf'
        readgroup_bamfile = pair_id + '_readgroup.bam'
        dedup_readsbamfile = pair_id + '_dedup_reads.bam'
        dedup_readsbaifile = pair_id + '_dedup_reads.bai'
        dedup_metricsfile = pair_id + '_dedup_metrics.txt'
        recal_datatable = pair_id + '_recal_data.table'
        recal_readsbamfile = pair_id + '_recal_reads.bam'
        post_recal_datatable = pair_id + '_postrecal_data.table'
        recalibration_plots = pair_id + '_recalibration_plots.pdf'
        """
        ./gatk CollectAlignmentSummaryMetrics -R ${params.reference} -I $sorted_file -O $alignment_metrics_file
        ./gatk CollectInsertSizeMetrics -I $sorted_file -O $insert_size_metrics_file -H $Histogram_file
        ./gatk AddOrReplaceReadGroups --INPUT $sorted_file --OUTPUT $readgroup_bamfile --RGLB lib1 --RGPL illumina --RGPU NONE --RGSM $pair_id
        ./gatk MarkDuplicates -I $readgroup_bamfile -O $dedup_readsbamfile -M $dedup_metricsfile -AS true --VALIDATION_STRINGENCY LENIENT
        ./gatk BuildBamIndex --INPUT $dedup_readsbamfile --OUTPUT $dedup_readsbaifile
        ./gatk BaseRecalibrator -R ${params.reference} -I $dedup_readsbamfile --known-sites ${params.snpfile} -O $recal_datatable
        ./gatk ApplyBQSR -R ${params.reference} -I $dedup_readsbamfile -bqsr $recal_datatable -O $recal_readsbamfile
        ./gatk BaseRecalibrator -R ${params.reference} -I $recal_readsbamfile --known-sites ${params.snpfile} -O $post_recal_datatable
        ./gatk AnalyzeCovariates -before $recal_datatable -after $post_recal_datatable -plots $recalibration_plots
        """
    }

In the next process I want to pass only one file from the process above as input, but the next process takes the very first output file as its input instead.
Replies: 1 comment 1 reply
Hi @awast, I would recommend that you split your output declaration into multiple outputs:

    output:
    tuple val(pair_id), file(alignment_metrics_file)
    tuple val(pair_id), file(insert_size_metrics_file)
    // ...
    tuple val(pair_id), file(recal_readsbamfile), emit: recal_readsbamfile

Then, when you call this process in the workflow, you can pass individual outputs as you need them:

    workflow {
        gatkrun(ch_input)
        next_process(gatkrun.out.recal_readsbamfile)
    }

Going even further, it might be good to split each of these GATK commands into separate processes. That way Nextflow can parallelize these steps as much as possible.
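As a rough illustration of that last suggestion, here is a minimal sketch of how three of the GATK steps could look as separate processes with named outputs. It assumes DSL2 and uses `path` in place of `file` for the file qualifiers. The process names, the `emit` names, and `params.sorted_bams` are hypothetical and not taken from the original script. The `./gatk` invocation and `params.reference` are copied from the question as-is.

```nextflow
// Sketch only: process names, emit names and params.sorted_bams are
// hypothetical; adapt them to your own pipeline.

process ALIGNMENT_METRICS {
    input:
    tuple val(pair_id), path(sorted_file)

    output:
    tuple val(pair_id), path("${pair_id}_alignment_metrics.txt"), emit: metrics

    script:
    """
    ./gatk CollectAlignmentSummaryMetrics -R ${params.reference} -I $sorted_file -O ${pair_id}_alignment_metrics.txt
    """
}

process ADD_READ_GROUPS {
    input:
    tuple val(pair_id), path(sorted_file)

    output:
    tuple val(pair_id), path("${pair_id}_readgroup.bam"), emit: bam

    script:
    """
    ./gatk AddOrReplaceReadGroups --INPUT $sorted_file --OUTPUT ${pair_id}_readgroup.bam --RGLB lib1 --RGPL illumina --RGPU NONE --RGSM $pair_id
    """
}

process MARK_DUPLICATES {
    input:
    tuple val(pair_id), path(readgroup_bam)

    output:
    // Two named outputs, so downstream processes can pick either one.
    tuple val(pair_id), path("${pair_id}_dedup_reads.bam"), emit: bam
    tuple val(pair_id), path("${pair_id}_dedup_metrics.txt"), emit: metrics

    script:
    """
    ./gatk MarkDuplicates -I $readgroup_bam -O ${pair_id}_dedup_reads.bam -M ${pair_id}_dedup_metrics.txt -AS true --VALIDATION_STRINGENCY LENIENT
    """
}

workflow {
    // Hypothetical input channel: one (pair_id, BAM) tuple per sorted BAM file.
    ch_input = Channel
        .fromPath(params.sorted_bams)
        .map { bam -> tuple(bam.baseName, bam) }

    // These two processes read the same channel and do not depend on each
    // other, so Nextflow is free to schedule them concurrently.
    ALIGNMENT_METRICS(ch_input)
    ADD_READ_GROUPS(ch_input)

    // Only the BAM output of ADD_READ_GROUPS is passed on.
    MARK_DUPLICATES(ADD_READ_GROUPS.out.bam)
}
```

With this layout each downstream process selects exactly the output it needs by name (for example `MARK_DUPLICATES.out.bam`), and the remaining GATK steps could be added following the same pattern.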