
tomwhite
Contributor

Fixes #353 and #355.

However, there is another failing case (like #353), shown by the INFO/HWE_EUR_unrel field in 1kg_2020_chr20_annotations.bcf, which has Number=A but max_number=0. Fixing that requires finding the actual size of A (the number of alt alleles) across the whole dataset.
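A minimal sketch of the idea, assuming each record is a plain dict with an "ALT" list. The helper `alt_alleles_dim_size` and the toy records are hypothetical, not bio2zarr's API. The point is that a Number=A field's trailing dimension should come from the dataset-wide ALT count, not from the field's own observed `max_number` (which, in toy data where the field is never populated, would be 0).

```python
def alt_alleles_dim_size(records):
    """Hypothetical helper: size of the alt_alleles dimension, i.e. the
    largest number of ALT alleles in any record of the whole dataset."""
    size = 0
    for record in records:
        # Treat a missing ALT (".") as zero alternate alleles.
        alts = [a for a in record["ALT"] if a != "."]
        size = max(size, len(alts))
    return size


# Toy records, made up for illustration only.
records = [
    {"ALT": ["A"]},
    {"ALT": ["C", "T"]},
    {"ALT": ["."]},
]

num_alt = alt_alleles_dim_size(records)
# A Number=A field gets shape (variants, alt_alleles) from the dataset-wide
# ALT count, even if the field itself never has any values.
shape = (len(records), num_alt)
print(shape)  # (3, 2)
```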

There's a comment about this:

bio2zarr/bio2zarr/vcz.py

Lines 143 to 144 in 17db733

# TODO we should really be checking this to see if the named dimensions
# are actually correct.

@jeromekelleher have you thought about how to implement this?

@jeromekelleher
Contributor

I'm happy with this as a way of squashing these bugs, but I think we'll need to be more systematic about handling the dimensions to cover all the bases. I guess we'll need a special code path for each of the named dimensions so that each one is handled correctly, bearing in mind that the stated dimension sizes could easily be wrong (i.e., a field that claims to be allele-dimensioned but actually has size 10 or something).
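One way such per-dimension code paths could look, as a hedged sketch only. `FieldSummary`, `expected_size`, `resolve_dimension` and the `<name>_dim` fallback naming are hypothetical; the names `alleles` and `genotypes` are assumed to sit alongside `alt_alleles` as the named dimensions; and the Number=G size assumes diploid genotypes. A field whose observed size exceeds what its declared Number implies is treated as misdeclared and given its own dimension.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FieldSummary:
    """Hypothetical per-field summary gathered while scanning the input."""
    name: str
    vcf_number: str  # "A", "R", "G", "." or an integer, as in the VCF header
    max_number: int  # largest number of values observed in any record


# Assumed dimension names for the VCF-defined Number codes.
NAMED_DIMENSIONS = {"A": "alt_alleles", "R": "alleles", "G": "genotypes"}


def expected_size(vcf_number: str, num_alt: int) -> Optional[int]:
    """Size implied by a named Number code, or None for any other Number."""
    if vcf_number == "A":
        return num_alt  # one value per alternate allele
    if vcf_number == "R":
        return num_alt + 1  # one value per allele, including REF
    if vcf_number == "G":
        # Assumes diploid: k alleles give k * (k + 1) / 2 genotypes.
        k = num_alt + 1
        return k * (k + 1) // 2
    return None


def resolve_dimension(field: FieldSummary, num_alt: int) -> Tuple[str, int]:
    """Return (dimension name, dimension size) for a field's trailing axis."""
    expected = expected_size(field.vcf_number, num_alt)
    if expected is None or field.max_number > expected:
        # Not a named dimension, or the declared Number is contradicted by
        # the data (e.g. "allele-dimensioned" but really size 10): fall back
        # to a field-specific dimension sized from the observed values.
        return f"{field.name}_dim", field.max_number
    # Named dimension: use the dataset-wide size even when this field's own
    # observed max_number is smaller (including 0).
    return NAMED_DIMENSIONS[field.vcf_number], expected


# num_alt=2 is a toy value chosen for illustration.
print(resolve_dimension(FieldSummary("HWE_EUR_unrel", "A", 0), num_alt=2))
# ('alt_alleles', 2)
print(resolve_dimension(FieldSummary("XYZ", "A", 10), num_alt=2))
# ('XYZ_dim', 10)
```

Falling back to a field-specific dimension keeps misdeclared fields convertible without silently truncating their values; raising an error instead would be an equally reasonable design choice.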

You're going to need to update the stupid du tests, I'm afraid...

@coveralls
Collaborator

coveralls commented Apr 22, 2025


coverage: 98.414% (+0.003%) from 98.411%
when pulling 0b0d8e5 on tomwhite:vcf-field-conversion-bugs
into 17db733 on sgkit-dev:main.

@tomwhite
Contributor Author

I opened #359 for the case that is still failing.

@jeromekelleher jeromekelleher merged commit ac46a92 into sgkit-dev:main Apr 22, 2025
15 checks passed

Successfully merging this pull request may close these issues.

alt_alleles dimension is not always being added for Number=A fields
3 participants