Minos improperly declares ultra-low coverage, returns malformed(?) VCF, but only sometimes #127

@aofarrel

A handful of samples that, as far as I can tell, are perfectly valid and average >15x coverage end up consisting almost entirely of nonsensical variants in their final minos VCF. By "nonsensical variants" I mean:

  • A value of ./.:.:0,0:.:0:0,0:0.0:0.0 in the final (sample) column
  • A value of . for the FILTER column, as opposed to PASS or a specific failing filter

Additionally, the overall sample reports a ludicrously small coverage that matches neither the BAM file nor other tools such as TBProfiler. This behavior exists in versions 0.12.5 and 0.11.3 of clockwork and is consistent across different platforms.

To be clear, most samples don't do this. But the ones that do, do it every time.
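
One way to cross-check the reported coverage against the BAM is to average the per-base depth independently. The following is a minimal sketch, assuming the input is the three-column, tab-separated output of `samtools depth -a` (chromosome, position, depth); the function name `mean_depth` and the toy rows are illustrative only. minos may derive `##minosMeanReadDepth` differently, so only a large discrepancy (such as 0.05 versus >15) is meaningful.

```python
import io


def mean_depth(depth_lines):
    """Mean per-base depth from rows shaped like `samtools depth -a` output.

    Each row is expected to be: chrom <TAB> pos <TAB> depth.
    """
    total = 0
    positions = 0
    for line in depth_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        _chrom, _pos, depth = line.split("\t")[:3]
        total += int(depth)
        positions += 1
    return total / positions if positions else 0.0


# Toy rows standing in for real output; these numbers are made up for illustration.
example = io.StringIO(
    "NC_000962.3\t1\t14\n"
    "NC_000962.3\t2\t16\n"
    "NC_000962.3\t3\t18\n"
)
print(mean_depth(example))
```

With real data, the same function can read from a file handle opened on the saved `samtools depth -a` output.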

Problematic behavior (demonstrated by SAMEA112181294, likely lineage 4)

VCFs are truncated here; the links are GitHub attachments to the full files (.txt extension forced by GitHub).

minos.vcf

##minosMeanReadDepth=0.05
##minosReadDepthVariance=0.274
[...]
##FILTER=<ID=MAX_DP,Description="Maximum DP of 1.620350279396288 (= 3.0 standard deviations from the mean read depth 0.05)">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMEA112181294
NC_000962.3	1977	.	A	G	.	.	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	./.:.:0,0:.:0:0,0:0.0:0.0
NC_000962.3	4013	.	T	C	.	.	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	./.:.:0,0:.:0:0,0:0.0:0.0
[...]
NC_000962.3	4039991	.	C	T	.	.	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	./.:.:0,0:.:0:0,0:0.0:0.0
NC_000962.3	4042761	.	G	A	.	.	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	./.:.:0,0:.:0:0,0:0.0:0.0
NC_000962.3	4053050	.	A	G	.	MAX_DP;MIN_FRS	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	1/1:3:2,3:0.6:5:2,3:5.61:65.55
NC_000962.3	4053161	.	A	G	.	MAX_DP	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	1/1:14:0,14:1.0:14:0,14:73.13:98.97
NC_000962.3	4053494	.	A	G	.	MAX_DP;MIN_FRS	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	0/0:5:5,1:0.8333:6:5,1:22.47:88.15
NC_000962.3	4055801	.	G	A	.	.	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	./.:.:0,0:.:0:0,0:0.0:0.0
NC_000962.3	4059904	.	A	G	.	.	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	./.:.:0,0:.:0:0,0:0.0:0.0
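
To put a number on how much of a minos VCF looks like the listing above, the null calls can be tallied per file. This is a rough sketch, not part of clockwork or minos: the function name `tally_calls` and the counter keys are made up for illustration, and it assumes a single-sample VCF with tab-separated columns as shown here.

```python
from collections import Counter

NULL_GENOTYPES = {"./.", "."}


def tally_calls(vcf_lines):
    """Count records, null genotypes, and unset FILTER values in a single-sample VCF."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # not a full record, e.g. an elision marker like "[...]"
        filter_value = fields[6]
        # Pair FORMAT keys with the sample's values so GT is looked up by name.
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        counts["records"] += 1
        if sample.get("GT") in NULL_GENOTYPES:
            counts["null_gt"] += 1
        if filter_value == ".":
            counts["filter_unset"] += 1
    return counts


FORMAT = "GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE"
# Two records copied from the SAMEA112181294 minos.vcf excerpt.
rows = [
    "\t".join(["NC_000962.3", "1977", ".", "A", "G", ".", ".", ".", FORMAT,
               "./.:.:0,0:.:0:0,0:0.0:0.0"]),
    "\t".join(["NC_000962.3", "4053161", ".", "A", "G", ".", "MAX_DP", ".", FORMAT,
               "1/1:14:0,14:1.0:14:0,14:73.13:98.97"]),
]
print(tally_calls(rows))
```

Dividing `null_gt` by `records` gives a quick per-sample ratio that could be used to flag affected samples across a large batch.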

cortex.vcf

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMEA112181294
NC_000962.3	1977	UNION_BC_k31_var_48	A	G	.	PASS	KMER=31;SVLEN=0;SVTYPE=SNP	GT:COV:GT_CONF	1/1:0,19:20.21
NC_000962.3	4013	UNION_BC_k31_var_11	T	C	.	PASS	KMER=31;SVLEN=0;SVTYPE=SNP	GT:COV:GT_CONF	1/1:0,11:9.12
[...]
NC_000962.3	4039991	UNION_BC_k31_var_7	C	T	.	PASS	KMER=31;SVLEN=0;SVTYPE=SNP	GT:COV:GT_CONF	1/1:0,11:9.12
NC_000962.3	4042761	UNION_BC_k31_var_428	G	A	.	PASS	KMER=31;SVLEN=0;SVTYPE=SNP	GT:COV:GT_CONF	1/1:0,9:6.34
NC_000962.3	4053161	UNION_BC_k31_var_672	A	G	.	MAPQ	KMER=31;SVLEN=0;SVTYPE=SNP	GT:COV:GT_CONF	0/1:89,45:10.73
NC_000962.3	4055801	UNION_BC_k31_var_813	G	A	.	PASS	KMER=31;SVLEN=0;SVTYPE=SNP	GT:COV:GT_CONF	1/1:0,24:27.14
NC_000962.3	4059904	UNION_BC_k31_var_85	A	G	.	PASS	KMER=31;SVLEN=0;SVTYPE=SNP	GT:COV:GT_CONF	1/1:0,11:9.12

samtools.vcf

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMEA112181294
NC_000962.3	1977	.	A	G	173.416	.	DP=19;VDB=0.69101;SGB=-0.69168;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,19,0;MQ=58	GT:PL	1/1:203,54,0
NC_000962.3	4013	.	T	C	225.417	.	DP=25;VDB=0.668708;SGB=-0.692914;MQSBZ=-0.960769;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,12,13;MQ=59	GT:PL	1/1:255,75,0
[...]
NC_000962.3	4039991	.	C	T	225.421	.	DP=15;VDB=0.0195268;SGB=-0.688148;MQSBZ=0;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,10,5;MQ=60	GT:PL	1/1:255,42,0
NC_000962.3	4042761	.	G	A	225.417	.	DP=17;VDB=0.311164;SGB=-0.690438;MQSBZ=-0.942809;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,8,9;MQ=58	GT:PL	1/1:255,51,0
NC_000962.3	4053050	.	A	G	12.0624	.	DP=28;VDB=0.831298;SGB=-0.616816;RPBZ=0.381844;MQBZ=-0.102873;MQSBZ=2.98941;BQBZ=-1.17378;SCBZ=0;FS=0;MQ0F=0.25;AC=1;AN=2;DP4=7,12,3,3;MQ=30	GT:PL	0/1:46,0,198
NC_000962.3	4053161	.	A	G	62.7601	.	DP=25;VDB=0.413877;SGB=-0.676189;RPBZ=-0.898407;MQBZ=2.3316;MQSBZ=2.53772;BQBZ=0.914953;SCBZ=0.356854;FS=0;MQ0F=0.8;AC=2;AN=2;DP4=8,5,6,5;MQ=10	GT:PL	1/1:90,5,0
NC_000962.3	4053494	.	A	G	42.6501	.	DP=69;VDB=0.00202333;SGB=-0.683931;RPBZ=0.206833;MQBZ=-1.01827;MQSBZ=-0.676342;BQBZ=0.45061;SCBZ=0.387152;FS=0;MQ0F=0.173913;AC=1;AN=2;DP4=21,19,5,8;MQ=30	GT:PL	0/1:77,0,255
NC_000962.3	4055801	.	G	A	225.417	.	DP=46;VDB=0.628354;SGB=-0.693147;MQSBZ=0;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,29,17;MQ=60	GT:PL	1/1:255,138,0
NC_000962.3	4059904	.	A	G	225.417	.	DP=29;VDB=0.791223;SGB=-0.693079;MQSBZ=0.781736;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,18,11;MQ=59	GT:PL	1/1:255,81,0

More typical, expected behavior (demonstrated by SAMN09812405, likely M. bovis)

minos.vcf

##minosMeanReadDepth=34.034
##minosReadDepthVariance=332.743
[...]
##FILTER=<ID=MAX_DP,Description="Maximum DP of 88.75773342526989 (= 3.0 standard deviations from the mean read depth 34.034)">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMN09812405
NC_000962.3	467	.	A	G	.	PASS	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	1/1:59:0,59:1.0:59:0,59:379.32:90.38
NC_000962.3	1977	.	A	G	.	PASS	.	GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE	1/1:35:0,35:1.0:35:0,35:231.32:59.34

cortex.vcf

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMN09812405
NC_000962.3	467	UNION_BC_k31_var_540	A	G	.	PASS	KMER=31;SVLEN=0;SVTYPE=SNP	GT:COV:GT_CONF	1/1:0,43:46.92
NC_000962.3	1977	UNION_BC_k31_var_133	A	G	.	PASS	KMER=31;SVLEN=0;SVTYPE=SNP	GT:COV:GT_CONF	1/1:0,20:15.03

samtools.vcf

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMN09812405
NC_000962.3	467	.	A	G	225.417	.	DP=60;VDB=0.136566;SGB=-0.693147;MQSBZ=0;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,29,21;MQ=60	GT:PL	1/1:255,151,0
NC_000962.3	1977	.	A	G	225.417	.	DP=40;VDB=0.99524;SGB=-0.693141;MQSBZ=0;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,14,23;MQ=60	GT:PL	1/1:255,111,0
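
The MAX_DP thresholds in the two minos headers appear to follow directly from the reported mean and variance: mean plus 3.0 standard deviations. The sketch below checks that arithmetic using only the header values quoted in this issue; the formula is inferred from the FILTER description text, not taken from the minos source, and `max_dp_threshold` is an illustrative name.

```python
import math


def max_dp_threshold(mean_depth, depth_variance, n_sd=3.0):
    """Threshold implied by the MAX_DP description: mean + n_sd standard deviations."""
    return mean_depth + n_sd * math.sqrt(depth_variance)


# Header values from SAMEA112181294 (the problematic sample).
bad = max_dp_threshold(0.05, 0.274)
# Header values from SAMN09812405 (the well-behaved sample).
good = max_dp_threshold(34.034, 332.743)

print(f"SAMEA112181294 MAX_DP ~ {bad:.3f}")
print(f"SAMN09812405  MAX_DP ~ {good:.3f}")
```

If the inference holds, the threshold of about 1.62 for SAMEA112181294 is a consequence of the tiny mean depth, which is consistent with the call at position 4053161 (DP of 14) being flagged MAX_DP even though that depth is unremarkable for a >15x sample.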

BAMs in IGV (SAMN09812405 on top)

[5 images]

mpileup of respective called variant positions (truncated)

Yellow is SAMEA112181294, beige is SAMN09812405

[2 images]

Reproducing the issue

I've attached the full VCFs to this ticket. They were generated via this workflow, which runs clockwork's variant caller in a Docker image that is just clockwork-0.12.5 + tree + pigz + fastp. All that really happens upstream of variant calling is seqtk sample -s1965 "$inputfq" 1000000, standard fastp cleaning, and standard clockwork decontamination. We've been running this workflow on tens of thousands of samples largely without issue; the number of samples that end up with these weird outputs in minos seems to be under 1%.
