Description
I am working with a WES tumor-only experiment.
Variant calling was performed using three different tools:
- Mpileup
- Mutect2
- Freebayes
(in the pictures I kept the same order)
All three callers detect the same complex variant, but each represents it differently in the original VCF.
To normalize the variants, I used the following command:
bcftools norm --atomize -f ref.fasta -o output.vcf input.vcf
Before normalization, the callers reported the variant as follows:
Freebayes
chr17 7675081 . GGGGCAGC GGA
Mutect2
chr17 7675082 . GGGC G
chr17 7675086 . AGC A
Mpileup
chr17 7675081 . GGGGCAG GG
chr17 7675088 . C A
After normalization, the result was:
Freebayes:
chr17 7675083 . GGCAGC A
Mutect2:
chr17 7675082 . GGGC G
chr17 7675086 . AGC A
Mpileup
chr17 7675081 . GGGGCA G
chr17 7675088 . C A
Even after applying bcftools norm --atomize, the same biological variant is still represented differently across callers:
- Different POS
- Different decomposition boundaries
- Different REF/ALT lengths
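As a sanity check that these records really describe the same change, each caller's records can be applied to the reference window and the resulting haplotypes compared. The sketch below is only illustrative. It assumes that the window chr17:7675081-7675088 reads GGGGCAGC (inferred from the Freebayes REF allele above, not checked against ref.fasta) and that all records from one caller sit on the same haplotype. `apply_records` is a hypothetical helper, not part of bcftools.

```python
# Reference window chr17:7675081-7675088. ASSUMPTION: taken from the Freebayes
# REF allele in the original VCF, not read from ref.fasta.
WINDOW_START = 7675081
WINDOW_REF = "GGGGCAGC"


def apply_records(records, start=WINDOW_START, ref=WINDOW_REF):
    """Apply non-overlapping (pos, ref, alt) records to the window.

    Hypothetical helper: returns the alternate haplotype for the window,
    assuming all records are on the same haplotype.
    """
    pieces = []
    cursor = 0  # 0-based offset of the next unconsumed reference base
    for pos, ref_allele, alt_allele in sorted(records):
        offset = pos - start
        if offset < cursor:
            raise ValueError(f"record at {pos} overlaps a previous record")
        if ref[offset:offset + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele does not match the window at {pos}")
        pieces.append(ref[cursor:offset])  # untouched reference bases
        pieces.append(alt_allele)          # replacement for the REF allele
        cursor = offset + len(ref_allele)
    pieces.append(ref[cursor:])            # remaining reference bases
    return "".join(pieces)


# Records copied from the listings above, before and after bcftools norm.
CALLS = {
    "freebayes_before": [(7675081, "GGGGCAGC", "GGA")],
    "mutect2_before": [(7675082, "GGGC", "G"), (7675086, "AGC", "A")],
    "mpileup_before": [(7675081, "GGGGCAG", "GG"), (7675088, "C", "A")],
    "freebayes_after": [(7675083, "GGCAGC", "A")],
    "mutect2_after": [(7675082, "GGGC", "G"), (7675086, "AGC", "A")],
    "mpileup_after": [(7675081, "GGGGCA", "G"), (7675088, "C", "A")],
}

haplotypes = {name: apply_records(recs) for name, recs in CALLS.items()}
for name, hap in haplotypes.items():
    print(name, hap)
```

With the records listed above, every entry yields the haplotype GGA in place of GGGGCAGC, so under these assumptions the differences between callers are purely representational.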
I was expecting --atomize to produce a canonical, consistent representation across callers (same coordinates and minimal atomic variants), but this did not happen.
Is this behavior expected?
Does --atomize intentionally preserve caller-specific breakpoints or representations?
Is there a recommended way to obtain an identical representation for complex variants across different callers, so that intersections/overlaps between VCFs can be computed reliably?