DEEPctMUT is a Nextflow pipeline for highly accurate mutation detection in circulating tumor DNA (ctDNA) from plasma and, optionally, matched PBMC samples.
DEEPctMUT takes as input FASTQ files from paired-end sequencing with unique molecular identifiers (UMIs) of plasma samples and, optionally, matched PBMC samples. After candidate calling and multiple steps of error polishing, DEEPctMUT outputs a VCF file with somatic mutations in ctDNA.
- nextflow >= 24.10.4
- miniconda >= 24.3.0
To test the pipeline on the provided toy dataset, download or git clone
the repository and run the following command:
nextflow main.nf -profile conda,test --output_dir OUTPUT_DIR
To run DEEPctMUT on your own data, you need the hg19 reference genome indexed for BWA, as well as a dictionary file.
You also need to prepare a single input_table, as a CSV file with a header, formatted as follows:
patient | sample | Fastq1 | Fastq2 | type | replicate |
---|---|---|---|---|---|
Patient_1 | Patient_1_plasma | /path/to/read1.fastq | /path/to/read2.fastq | Plasma | 1 |
Patient_1 | Patient_1_pbmc | /path/to/read1.fastq | /path/to/read2.fastq | PBMC | 1 |
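The table above can also be generated programmatically. The following is a minimal sketch, not part of DEEPctMUT: the helper name `write_input_table` and the example rows are hypothetical, and it assumes the pipeline expects a plain comma-separated file whose header uses exactly the column names shown above.

```python
import csv
from pathlib import Path

# Column names copied from the table above.
COLUMNS = ["patient", "sample", "Fastq1", "Fastq2", "type", "replicate"]


def write_input_table(rows, out_path):
    """Write sample rows to a comma-separated file with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        for row in rows:
            # Fail early instead of silently writing empty cells.
            missing = [c for c in COLUMNS if c not in row]
            if missing:
                raise ValueError(f"row {row.get('sample')!r} lacks columns: {missing}")
            writer.writerow(row)
    return Path(out_path)


# Hypothetical rows mirroring the table above; the paths are placeholders.
example_rows = [
    {"patient": "Patient_1", "sample": "Patient_1_plasma",
     "Fastq1": "/path/to/read1.fastq", "Fastq2": "/path/to/read2.fastq",
     "type": "Plasma", "replicate": 1},
    {"patient": "Patient_1", "sample": "Patient_1_pbmc",
     "Fastq1": "/path/to/read1.fastq", "Fastq2": "/path/to/read2.fastq",
     "type": "PBMC", "replicate": 1},
]
```

Calling `write_input_table(example_rows, "input_table.csv")` would produce a file that can be passed to `--input_table`.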
Then run the following command:
nextflow main.nf -profile conda --input_table INPUT_TABLE --reference REFERENCE --output_dir OUTPUT_DIR
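Because the BWA index and the dictionary file must accompany the reference FASTA, a quick pre-flight check can catch missing files before launching. This is a sketch under stated assumptions, not something the pipeline provides: the function name is hypothetical, the index suffixes are those written by `bwa index`, and the dictionary is assumed to be named either `reference.dict` or `reference.fasta.dict`. DEEPctMUT may expect a different naming.

```python
from pathlib import Path

# Suffixes written by `bwa index` next to the FASTA file.
BWA_INDEX_SUFFIXES = [".amb", ".ann", ".bwt", ".pac", ".sa"]


def missing_reference_files(fasta):
    """Return the expected companion files of `fasta` that do not exist."""
    fasta = Path(fasta)
    expected = [fasta] + [Path(str(fasta) + s) for s in BWA_INDEX_SUFFIXES]
    missing = [p for p in expected if not p.exists()]
    # Assumption: the dictionary is either reference.dict or reference.fasta.dict.
    dict_candidates = [fasta.with_suffix(".dict"), Path(str(fasta) + ".dict")]
    if not any(p.exists() for p in dict_candidates):
        missing.append(dict_candidates[0])
    return missing
```

An empty list means every file this sketch looks for is present.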
The pipeline accepts the following command-line arguments:
Argument | Description | Required/Default |
---|---|---|
--input_table | CSV file with sample information. | Required unless -profile test |
--output_dir | Output directory for results. | Required |
--reference | hg19 reference genome FASTA file. The BWA index and dictionary files must be in the same directory. | Required |
--bed_file | BED file with target regions. | Default: test_data/CRC_panel.bed |
--with_pbmc | Whether to process PBMC samples for background filtering. | Default: true |
--RF_threshold | Random Forest model threshold for variant calling. | Default: 0.21 |
--DeepES_threshold | DeepES model threshold to call a mutation significant. | Default: 0.01 |
--non_hotspot_reads | Minimum number of reads supporting a mutation in non-hotspot regions. | Default: 30 |
--min_vaf | Minimum variant allele frequency (VAF) to call a mutation. | Default: 0.0003 |
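When launching many runs with different settings, it can help to assemble the command line from the arguments listed above. The sketch below is illustrative only: `build_command` is a hypothetical helper, and it relies on Nextflow accepting `true`/`false` as values for boolean parameters.

```python
import shlex


def build_command(output_dir, input_table=None, reference=None,
                  profile="conda", **params):
    """Assemble the DEEPctMUT command line as a list of arguments.

    Extra keyword arguments become `--name value` pipeline parameters,
    for example min_vaf=0.001 becomes `--min_vaf 0.001`.
    """
    cmd = ["nextflow", "main.nf", "-profile", profile]
    if input_table is not None:
        cmd += ["--input_table", str(input_table)]
    if reference is not None:
        cmd += ["--reference", str(reference)]
    cmd += ["--output_dir", str(output_dir)]
    for name, value in params.items():
        # Booleans are passed as lowercase true/false.
        if isinstance(value, bool):
            value = str(value).lower()
        cmd += [f"--{name}", str(value)]
    return cmd


# Print a shell-ready command that overrides two defaults.
print(shlex.join(build_command("results", input_table="samples.csv",
                               reference="hg19.fa", with_pbmc=False,
                               min_vaf=0.001)))
```

The returned list can be handed to `subprocess.run` directly, which avoids shell quoting issues with paths that contain spaces.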
DEEPctMUT outputs, for each patient, a VCF file with mutation calls in the specified output_dir directory.
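The resulting calls can be inspected with any VCF-aware tool. As a dependency-free illustration, the sketch below reads the tab-separated records of an uncompressed VCF file into dictionaries keyed by the `#CHROM` header line. The function name is hypothetical, and the names of the output files and the INFO fields DEEPctMUT writes are not specified here, so the sketch makes no assumptions about them.

```python
def read_vcf_records(path):
    """Yield one dict per data line of an uncompressed VCF file."""
    columns = None
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("##"):
                continue  # skip blank and meta-information lines
            if line.startswith("#"):
                # Header line, e.g. "#CHROM  POS  ID  REF  ALT ..."
                columns = line[1:].split("\t")
                continue
            if columns is None:
                raise ValueError("data line found before the #CHROM header")
            yield dict(zip(columns, line.split("\t")))
```

For example, `[(r["CHROM"], r["POS"], r["REF"], r["ALT"]) for r in read_vcf_records(vcf_path)]` lists the position and alleles of every call in one file.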