Description
A handful of samples which, as far as I can tell, are perfectly valid and average >15x coverage end up with a final minos VCF that consists almost entirely of nonsensical variants. By "nonsensical variants" I mean:
- A value of ./.:.:0,0:.:0:0,0:0.0:0.0 in the final string
- A value of . for the FILTER column, as opposed to PASS or a specific failing filter
Additionally, the sample as a whole is reported as having ludicrously low coverage, which matches neither the BAM file nor other tools like TBProfiler. This behavior exists in clockwork 0.12.5 and 0.11.3 and is consistent across different platforms.
To be clear, most samples don't do this. But the ones that do this always do this.
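To put a number on "almost entirely", the two symptoms listed above can be counted per VCF. This is only a minimal sketch: the function name `summarize_calls` is hypothetical, it assumes a single-sample VCF whose columns are whitespace-separated (as in the excerpts in this ticket), and the three records are copied from those excerpts.

```python
# The null sample string described above.
NULL_SAMPLE = "./.:.:0,0:.:0:0,0:0.0:0.0"

def summarize_calls(vcf_lines):
    """Count records, null-call records, and records whose FILTER is '.'."""
    total = null_calls = unset_filter = 0
    for line in vcf_lines:
        # Skip blank lines and header lines (## metadata and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # A single-sample VCF record has 10 columns; skip anything shorter.
        if len(fields) < 10:
            continue
        total += 1
        if fields[9] == NULL_SAMPLE:   # sample column
            null_calls += 1
        if fields[6] == ".":           # FILTER column
            unset_filter += 1
    return {"total": total, "null_calls": null_calls, "unset_filter": unset_filter}

# Records copied from the excerpts in this ticket.
FORMAT = "GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE"
SAMPLE_RECORDS = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMEA112181294",
    f"NC_000962.3 1977 . A G . . . {FORMAT} {NULL_SAMPLE}",
    f"NC_000962.3 4053161 . A G . MAX_DP . {FORMAT} 1/1:14:0,14:1.0:14:0,14:73.13:98.97",
    f"NC_000962.3 467 . A G . PASS . {FORMAT} 1/1:59:0,59:1.0:59:0,59:379.32:90.38",
]

summary = summarize_calls(SAMPLE_RECORDS)
print(summary)
```

On a real file, the same function could be fed an open file handle, e.g. `summarize_calls(open(path))`, where `path` points at the final minos VCF.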
Problematic behavior (demonstrated by SAMEA112181294, likely L4)
VCFs are truncated here; the links are GitHub attachments to the full files (the .txt extension is forced by GitHub).
##minosMeanReadDepth=0.05
##minosReadDepthVariance=0.274
[...]
##FILTER=<ID=MAX_DP,Description="Maximum DP of 1.620350279396288 (= 3.0 standard deviations from the mean read depth 0.05)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMEA112181294
NC_000962.3 1977 . A G . . . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE ./.:.:0,0:.:0:0,0:0.0:0.0
NC_000962.3 4013 . T C . . . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE ./.:.:0,0:.:0:0,0:0.0:0.0
[...]
NC_000962.3 4039991 . C T . . . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE ./.:.:0,0:.:0:0,0:0.0:0.0
NC_000962.3 4042761 . G A . . . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE ./.:.:0,0:.:0:0,0:0.0:0.0
NC_000962.3 4053050 . A G . MAX_DP;MIN_FRS . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE 1/1:3:2,3:0.6:5:2,3:5.61:65.55
NC_000962.3 4053161 . A G . MAX_DP . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE 1/1:14:0,14:1.0:14:0,14:73.13:98.97
NC_000962.3 4053494 . A G . MAX_DP;MIN_FRS . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE 0/0:5:5,1:0.8333:6:5,1:22.47:88.15
NC_000962.3 4055801 . G A . . . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE ./.:.:0,0:.:0:0,0:0.0:0.0
NC_000962.3 4059904 . A G . . . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE ./.:.:0,0:.:0:0,0:0.0:0.0
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMEA112181294
NC_000962.3 1977 UNION_BC_k31_var_48 A G . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,19:20.21
NC_000962.3 4013 UNION_BC_k31_var_11 T C . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,11:9.12
[...]
NC_000962.3 4039991 UNION_BC_k31_var_7 C T . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,11:9.12
NC_000962.3 4042761 UNION_BC_k31_var_428 G A . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,9:6.34
NC_000962.3 4053161 UNION_BC_k31_var_672 A G . MAPQ KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 0/1:89,45:10.73
NC_000962.3 4055801 UNION_BC_k31_var_813 G A . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,24:27.14
NC_000962.3 4059904 UNION_BC_k31_var_85 A G . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,11:9.12
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMEA112181294
NC_000962.3 1977 . A G 173.416 . DP=19;VDB=0.69101;SGB=-0.69168;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,19,0;MQ=58 GT:PL 1/1:203,54,0
NC_000962.3 4013 . T C 225.417 . DP=25;VDB=0.668708;SGB=-0.692914;MQSBZ=-0.960769;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,12,13;MQ=59 GT:PL 1/1:255,75,0
[...]
NC_000962.3 4039991 . C T 225.421 . DP=15;VDB=0.0195268;SGB=-0.688148;MQSBZ=0;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,10,5;MQ=60 GT:PL 1/1:255,42,0
NC_000962.3 4042761 . G A 225.417 . DP=17;VDB=0.311164;SGB=-0.690438;MQSBZ=-0.942809;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,8,9;MQ=58 GT:PL 1/1:255,51,0
NC_000962.3 4053050 . A G 12.0624 . DP=28;VDB=0.831298;SGB=-0.616816;RPBZ=0.381844;MQBZ=-0.102873;MQSBZ=2.98941;BQBZ=-1.17378;SCBZ=0;FS=0;MQ0F=0.25;AC=1;AN=2;DP4=7,12,3,3;MQ=30 GT:PL 0/1:46,0,198
NC_000962.3 4053161 . A G 62.7601 . DP=25;VDB=0.413877;SGB=-0.676189;RPBZ=-0.898407;MQBZ=2.3316;MQSBZ=2.53772;BQBZ=0.914953;SCBZ=0.356854;FS=0;MQ0F=0.8;AC=2;AN=2;DP4=8,5,6,5;MQ=10 GT:PL 1/1:90,5,0
NC_000962.3 4053494 . A G 42.6501 . DP=69;VDB=0.00202333;SGB=-0.683931;RPBZ=0.206833;MQBZ=-1.01827;MQSBZ=-0.676342;BQBZ=0.45061;SCBZ=0.387152;FS=0;MQ0F=0.173913;AC=1;AN=2;DP4=21,19,5,8;MQ=30 GT:PL 0/1:77,0,255
NC_000962.3 4055801 . G A 225.417 . DP=46;VDB=0.628354;SGB=-0.693147;MQSBZ=0;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,29,17;MQ=60 GT:PL 1/1:255,138,0
NC_000962.3 4059904 . A G 225.417 . DP=29;VDB=0.791223;SGB=-0.693079;MQSBZ=0.781736;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,18,11;MQ=59 GT:PL 1/1:255,81,0
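One way to list the affected sites is to compare genotypes by position between the final minos VCF and one of the other two VCFs excerpted above. The sketch below is hedged: the helper names `genotypes_by_pos` and `dropped_calls` are hypothetical, it assumes single-sample VCFs with whitespace-separated columns and a GT entry in FORMAT, and the records are copied from the excerpts.

```python
def genotypes_by_pos(vcf_lines):
    """Map (CHROM, POS) to the sample's GT value for each record."""
    calls = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 10:
            continue
        # Locate GT within FORMAT, then read the matching sample entry.
        gt_index = fields[8].split(":").index("GT")
        calls[(fields[0], int(fields[1]))] = fields[9].split(":")[gt_index]
    return calls

def dropped_calls(minos_lines, caller_lines):
    """Sites with a genotype in the other VCF but './.' in the minos VCF."""
    minos = genotypes_by_pos(minos_lines)
    caller = genotypes_by_pos(caller_lines)
    return sorted(
        site for site, gt in caller.items()
        if gt != "./." and minos.get(site) == "./."
    )

# Records copied from the excerpts above.
MINOS_FORMAT = "GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE"
minos_records = [
    f"NC_000962.3 1977 . A G . . . {MINOS_FORMAT} ./.:.:0,0:.:0:0,0:0.0:0.0",
    f"NC_000962.3 4053161 . A G . MAX_DP . {MINOS_FORMAT} 1/1:14:0,14:1.0:14:0,14:73.13:98.97",
]
other_records = [
    "NC_000962.3 1977 UNION_BC_k31_var_48 A G . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,19:20.21",
    "NC_000962.3 4053161 UNION_BC_k31_var_672 A G . MAPQ KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 0/1:89,45:10.73",
]

dropped = dropped_calls(minos_records, other_records)
print(dropped)
```

In these excerpts, position 1977 has a 1/1 genotype in the second VCF but a null call in the minos VCF, while 4053161 keeps a genotype in both.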
More typical, expected behavior (demonstrated by SAMN09812405, likely M. bovis)
##minosMeanReadDepth=34.034
##minosReadDepthVariance=332.743
[...]
##FILTER=<ID=MAX_DP,Description="Maximum DP of 88.75773342526989 (= 3.0 standard deviations from the mean read depth 34.034)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMN09812405
NC_000962.3 467 . A G . PASS . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE 1/1:59:0,59:1.0:59:0,59:379.32:90.38
NC_000962.3 1977 . A G . PASS . GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE 1/1:35:0,35:1.0:35:0,35:231.32:59.34
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMN09812405
NC_000962.3 467 UNION_BC_k31_var_540 A G . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,43:46.92
NC_000962.3 1977 UNION_BC_k31_var_133 A G . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,20:15.03
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMN09812405
NC_000962.3 467 . A G 225.417 . DP=60;VDB=0.136566;SGB=-0.693147;MQSBZ=0;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,29,21;MQ=60 GT:PL 1/1:255,151,0
NC_000962.3 1977 . A G 225.417 . DP=40;VDB=0.99524;SGB=-0.693141;MQSBZ=0;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,14,23;MQ=60 GT:PL 1/1:255,111,0
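The MAX_DP descriptions in both headers above look consistent with mean depth plus 3.0 standard deviations, with the standard deviation taken as the square root of minosReadDepthVariance. That formula is inferred from the header text, not from the minos source, so treat this as a sketch; `max_dp_threshold` is a hypothetical name.

```python
import math

def max_dp_threshold(mean_depth, depth_variance, n_std=3.0):
    """Mean depth plus n_std standard deviations (inferred from the header text)."""
    return mean_depth + n_std * math.sqrt(depth_variance)

# Values taken from the two VCF headers in this ticket.
problem = max_dp_threshold(0.05, 0.274)        # SAMEA112181294
typical = max_dp_threshold(34.034, 332.743)    # SAMN09812405

print(f"SAMEA112181294 MAX_DP threshold ~ {problem:.3f}")
print(f"SAMN09812405   MAX_DP threshold ~ {typical:.3f}")
```

With a reported mean depth of 0.05, the threshold lands at about 1.62, so any site that does get a real depth (for example DP 3, 5, or 14 in the excerpt) exceeds it, which would explain why the few non-null calls in the problematic VCF all carry MAX_DP.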
BAMs in IGV (SAMN09812405 on top)
mpileup of respective called variant positions (truncated)
Yellow is SAMEA112181294, beige is SAMN09812405
Reproducing the issue
I've attached the full VCFs in this ticket. They were generated via this workflow, which runs clockwork's variant caller in a Docker image that is just clockwork-0.12.5 + tree + pigz + fastp. All that really happens upstream of variant calling is seqtk sample -s1965 "$inputfq" 1000000, standard fastp cleaning, and standard clockwork decontamination. We've been running this workflow on tens of thousands of samples largely without issue; the proportion of samples that end up with these weird minos outputs seems to be under 1%.