norm -m + --atomize inconsistent representation of complex variants

I am working with a WES tumor-only experiment.
Variant calling was performed using three different tools:

- Mpileup
- Mutect2
- Freebayes

(The screenshot below shows them in the same order.)

<img width="1671" height="949" alt="Image" src="https://github.com/user-attachments/assets/6a08bd47-a304-4c8d-84ef-6dafb626a1b9" />

All three callers detect the same complex variant, but each represents it differently in the original VCF.

To normalize the variants, I used the following command:

`bcftools norm --atomize -f ref.fasta -o output.vcf input.vcf`
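For context, normalization in the usual sense trims alleles to a parsimonious form and then left-aligns them. It works on one record at a time. Below is a minimal re-implementation of that trim-and-left-align procedure, as used by tools such as `vt normalize`. It is a sketch, not bcftools' code. The reference window (`REF_START`, `REF_SEQ`) is hypothetical and was rebuilt from the REF alleles quoted in this issue. On these records it reproduces the `bcftools norm` output reported later in the issue. That suggests each record is being normalized correctly on its own, and the disagreement comes from how the callers split the event into records.

```python
from __future__ import annotations

# ASSUMPTION: hypothetical reference window for chr17:7675081-7675088,
# rebuilt from the REF alleles quoted in this issue (the callers agree on them).
REF_START = 7675081
REF_SEQ = "GGGGCAGC"


def ref_base(pos: int) -> str:
    """Return the reference base at 1-based position pos (window only)."""
    offset = pos - REF_START
    if not 0 <= offset < len(REF_SEQ):
        raise ValueError(f"position {pos} is outside the reference window")
    return REF_SEQ[offset]


def normalize(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Trim and left-align a single biallelic (POS, REF, ALT) record."""
    changed = True
    while changed:
        changed = False
        # Drop a shared last base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            pos -= 1
            base = ref_base(pos)
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


for record in [(7675081, "GGGGCAGC", "GGA"),   # Freebayes
               (7675082, "GGGC", "G"),         # Mutect2
               (7675086, "AGC", "A"),          # Mutect2
               (7675081, "GGGGCAG", "GG"),     # Mpileup
               (7675088, "C", "A")]:           # Mpileup
    print(record, "->", normalize(*record))
```

As an extra check, a right-shifted form of the same 5 bp deletion, `normalize(7675082, "GGGCAG", "G")`, left-aligns to `(7675081, "GGGGCA", "G")`. That is the Mpileup deletion as normalized.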

Before normalization, the records were:

Freebayes:
`chr17	7675081	.	GGGGCAGC	GGA`

Mutect2:
```
chr17	7675082	.	GGGC	G	
chr17	7675086	.	AGC	A	
```
Mpileup:
```
chr17	7675081	.	GGGGCAG	GG	
chr17	7675088	.	C	A	
```
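One way to confirm that these are the same event is to apply each caller's records to the reference and compare the resulting sequences. The sketch below does this with a hypothetical reference window rebuilt from the REF alleles above (`REF_START` and `REF_SEQ` are assumptions, not read from `ref.fasta`). It assumes each caller's records in the window do not overlap.

```python
# ASSUMPTION: hypothetical reference window for chr17:7675081-7675088,
# rebuilt from the REF alleles quoted in this issue.
REF_START = 7675081
REF_SEQ = "GGGGCAGC"

# Records as emitted by each caller before normalization: (POS, REF, ALT).
RAW_CALLS = {
    "Freebayes": [(7675081, "GGGGCAGC", "GGA")],
    "Mutect2": [(7675082, "GGGC", "G"), (7675086, "AGC", "A")],
    "Mpileup": [(7675081, "GGGGCAG", "GG"), (7675088, "C", "A")],
}


def apply_variants(records):
    """Return REF_SEQ with non-overlapping (POS, REF, ALT) records applied."""
    pieces = []
    cursor = REF_START  # first reference position not yet copied
    for pos, ref, alt in sorted(records):
        if pos < cursor:
            raise ValueError(f"record at {pos} overlaps an earlier record")
        start = pos - REF_START
        if REF_SEQ[start:start + len(ref)] != ref:
            raise ValueError(f"REF {ref!r} does not match the window at {pos}")
        pieces.append(REF_SEQ[cursor - REF_START:start])  # unchanged bases
        pieces.append(alt)
        cursor = pos + len(ref)
    pieces.append(REF_SEQ[cursor - REF_START:])  # trailing unchanged bases
    return "".join(pieces)


for caller, records in RAW_CALLS.items():
    print(caller, apply_variants(records))
```

All three callers print `GGA`. Each replaces the 8 bp `GGGGCAGC` window with the same `GGA` haplotype, even though they split it into records differently.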

After normalization, the records were:

Freebayes:
`chr17	7675083	.	GGCAGC	A`

Mutect2:
```
chr17	7675082	.	GGGC	G	
chr17	7675086	.	AGC	A
```

Mpileup:
```
chr17	7675081	.	GGGGCA	G	
chr17	7675088	.	C	A
```

Even after applying `bcftools norm --atomize`, the same biological variant is still represented differently across callers:

- Different POS
- Different decomposition boundaries
- Different REF/ALT lengths
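To make the consequence for intersections concrete, the sketch below takes the normalized records exactly as listed above. It counts how many match on CHROM, POS, REF and ALT for each pair of callers, which is the kind of match an exact record-level intersection relies on. It also checks whether the REF spans merely overlap. The function names are hypothetical.

```python
from itertools import combinations

# Normalized records as reported above: (CHROM, POS, REF, ALT).
NORMALIZED = {
    "Freebayes": {("chr17", 7675083, "GGCAGC", "A")},
    "Mutect2": {("chr17", 7675082, "GGGC", "G"), ("chr17", 7675086, "AGC", "A")},
    "Mpileup": {("chr17", 7675081, "GGGGCA", "G"), ("chr17", 7675088, "C", "A")},
}


def shared_record_counts(calls):
    """Count exact CHROM/POS/REF/ALT matches for every pair of callers."""
    return {(a, b): len(calls[a] & calls[b]) for a, b in combinations(sorted(calls), 2)}


def spans_overlap(calls_a, calls_b):
    """True if any REF span in calls_a overlaps any REF span in calls_b."""
    return any(
        ca == cb and pa <= pb + len(rb) - 1 and pb <= pa + len(ra) - 1
        for ca, pa, ra, _ in calls_a
        for cb, pb, rb, _ in calls_b
    )


print(shared_record_counts(NORMALIZED))
for a, b in combinations(sorted(NORMALIZED), 2):
    print(a, b, "REF spans overlap:", spans_overlap(NORMALIZED[a], NORMALIZED[b]))
```

Every exact-match count is 0, while every pair of callers has overlapping REF spans. Position overlap can pair the calls up, but by itself it does not show that the alleles agree.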

I was expecting `--atomize` to produce a canonical, consistent representation across callers (same coordinates and minimal atomic variants), but this did not happen.

Is this behavior expected?
Does `--atomize` intentionally preserve caller-specific breakpoints or representations?
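The bcftools documentation describes `--atomize` as decomposing complex variants, with MNVs becoming consecutive SNVs as its example. The sketch below illustrates only that MNV case; `atomize_mnv` is a hypothetical helper, not bcftools code. Returning unequal-length records unchanged is an assumption of the sketch and is not documented bcftools behavior. It is, however, consistent with the output above, where the Freebayes `GGCAGC>A` record was not split.

```python
def atomize_mnv(pos, ref, alt):
    """Split an equal-length REF/ALT pair into SNVs, skipping unchanged bases.

    Illustrative only: unequal-length records are returned unchanged here,
    which is an assumption of this sketch, not documented bcftools behavior.
    """
    if len(ref) != len(alt):
        return [(pos, ref, alt)]
    return [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


print(atomize_mnv(100, "ACGT", "TCGA"))      # hypothetical MNV: [(100, 'A', 'T'), (103, 'T', 'A')]
print(atomize_mnv(7675083, "GGCAGC", "A"))   # Freebayes record after norm: unchanged
```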

Is there a recommended way to obtain an identical representation for complex variants across different callers, so that intersections/overlaps between VCFs can be computed reliably?
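One approach, which is not a bcftools feature, is to compare callers at the haplotype level. For each cluster of nearby records, apply each caller's records over the union of their REF spans and compare the resulting sequences. Dedicated comparison tools such as RTG Tools `vcfeval` and `hap.py` do haplotype-aware matching of this kind. The sketch below only shows the idea. It assumes a single cluster, a single ALT haplotype (genotype and phasing ignored), and a hypothetical reference window rebuilt from the REF alleles in this issue.

```python
# ASSUMPTION: hypothetical reference window (chr17:7675081-7675088) rebuilt
# from the REF alleles in this issue; real code would read the FASTA.
REF_START = 7675081
REF_SEQ = "GGGGCAGC"


def apply_in_span(records, start, end):
    """Sequence of reference[start..end] (1-based, inclusive) after applying records."""
    pieces, cursor = [], start
    for pos, ref, alt in sorted(records):
        if pos < cursor or pos + len(ref) - 1 > end:
            raise ValueError(f"record at {pos} overlaps another or leaves the span")
        pieces.append(REF_SEQ[cursor - REF_START:pos - REF_START])  # unchanged bases
        pieces.append(alt)
        cursor = pos + len(ref)
    pieces.append(REF_SEQ[cursor - REF_START:end - REF_START + 1])
    return "".join(pieces)


def haplotype_match(calls_a, calls_b):
    """True if two callers' records for one cluster yield the same sequence."""
    spans = [(pos, pos + len(ref) - 1) for pos, ref, _ in calls_a + calls_b]
    start = min(s for s, _ in spans)
    end = max(e for _, e in spans)
    return apply_in_span(calls_a, start, end) == apply_in_span(calls_b, start, end)


# Normalized records from this issue: (POS, REF, ALT).
NORMALIZED = {
    "Freebayes": [(7675083, "GGCAGC", "A")],
    "Mutect2": [(7675082, "GGGC", "G"), (7675086, "AGC", "A")],
    "Mpileup": [(7675081, "GGGGCA", "G"), (7675088, "C", "A")],
}

print(haplotype_match(NORMALIZED["Freebayes"], NORMALIZED["Mutect2"]))
print(haplotype_match(NORMALIZED["Freebayes"], NORMALIZED["Mutect2"][:1]))
```

The first comparison is `True`. The second is `False`, because Mutect2's first deletion alone does not reproduce the Freebayes haplotype. In practice, clustering records (for example by overlapping or nearby REF spans) has to happen before this comparison.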
