

fix: Handle missing value in VCF INFO array fields#314

Merged: mwiewior merged 1 commit into master from fix/vcf-info-missing-array-values on Feb 19, 2026

@mwiewior (Collaborator)

Summary

  • Bump datafusion-bio-formats git rev to pick up the fix from datafusion-bio-formats#74
  • VCF INFO array fields (Number=R, Number=A, Number=.) panicked when `.` (the standard VCF missing value) appeared as an element of a comma-separated list (e.g. AD=.,15 or AF=0.5,.). This caused silent data truncation (e.g. 5M rows → 229K rows, with no error raised)
  • Add regression tests covering all 3 array types (Integer, Float, String) with missing values
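The actual fix lives in datafusion-bio-formats (see datafusion-bio-formats#74) and is not reproduced in this PR. As an illustration only, here is a minimal Python sketch of the intended per-element semantics: every `.` inside a comma-separated Number=R/A/. value becomes a null (`None`) instead of aborting the record. The helpers `parse_info_array` and `parse_info` are hypothetical names, and treating a lone `.` as "whole field missing" is an assumption of this sketch, not something stated in the PR.

```python
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")

# "." is the VCF missing-value sentinel, used both for whole values and for
# individual elements of a comma-separated list.
MISSING = "."


def parse_info_array(raw: str, convert: Callable[[str], T]) -> Optional[List[Optional[T]]]:
    """Parse one Number=R/A/. INFO value into a list, mapping '.' elements to None."""
    if raw == MISSING:
        # Assumption (hypothetical): a lone "." marks the whole field as missing.
        return None
    # Convert each element independently so a single missing element yields
    # None for that slot rather than failing the whole record.
    return [None if item == MISSING else convert(item) for item in raw.split(",")]


def parse_info(info: str, types: Dict[str, Callable[[str], object]]) -> Dict[str, object]:
    """Parse a VCF INFO column such as 'AD=.,15;AF=0.5,.' into a dict of lists."""
    if info == MISSING:
        # An INFO column of "." carries no fields at all.
        return {}
    result: Dict[str, object] = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if not sep:
            # Flag fields have no value; record their presence as True.
            result[key] = True
            continue
        # Keys without a declared converter fall back to plain strings.
        result[key] = parse_info_array(value, types.get(key, str))
    return result
```

For example, `parse_info("AD=.,15;AF=0.5,.", {"AD": int, "AF": float})` yields `{"AD": [None, 15], "AF": [0.5, None]}`: the row survives with nulls in the missing slots, which is the behavior the regression tests check for (no row loss, `.` mapped to null).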

Fixes #312

Test plan

  • test_info_array_missing_values_no_row_loss — all 4 rows returned (no silent data loss)
  • test_info_array_missing_integer — `.` → null in integer list arrays
  • test_info_array_missing_float — `.` → null in float list arrays
  • test_info_array_missing_string — `.` → null in string list arrays
  • test_scan_vcf_info_array_missing_values — lazy scan path also works
  • All 34 existing VCF tests pass (no regressions)

🤖 Generated with Claude Code

Bump datafusion-bio-formats to pick up the fix for VCF INFO array fields
(Number=R/A/., Type=Integer/Float/String) that panicked on `.` missing
values within comma-separated elements (e.g. AD=.,15, AF=0.5,.).
Previously this caused silent data truncation.

Add regression tests verifying no row loss and correct null representation
for all three array types (Integer, Float, String).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mwiewior merged commit ec4304f into master on Feb 19, 2026
14 checks passed

Development

Successfully merging this pull request may close these issues.

scan_vcf panics on multi-valued INFO fields containing "." (VCF missing value) — silent data truncation
