Skip to content

Convert Number=A to Number=1 when creating an ExpandedVCF  #79

@LTLA

Description

@LTLA

It seems reasonable that per-allele ##INFO fields should be converted from Number=A to Number=1 when expand() is called, given that the 1:many relationship between rows and allelic values is now flattened into a 1:1 mapping.

This has practical consequences, too, as we can see:

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")

library(VariantAnnotation)
out <- tempfile()
first <- readVcf(fl)
first <- expand(first)

writeVcf(first, out)
roundtrip <- readVcf(out, row.names=FALSE)
roundtrip <- expand(roundtrip)

all.equal(first, roundtrip)
## [1] "Attributes: < Component “assays”: Attributes: < Component “data”: Attributes: < Component “listData”: Component “GT”: Attributes: < Length mismatch: comparison on first 1 components > > > >"
## [2] "Attributes: < Component “assays”: Attributes: < Component “data”: Attributes: < Component “listData”: Component “GQ”: Attributes: < Length mismatch: comparison on first 1 components > > > >"
## [3] "Attributes: < Component “assays”: Attributes: < Component “data”: Attributes: < Component “listData”: Component “DP”: Attributes: < Length mismatch: comparison on first 1 components > > > >"
## [4] "Attributes: < Component “assays”: Attributes: < Component “data”: Attributes: < Component “listData”: Component “HQ”: Attributes: < Length mismatch: comparison on first 1 components > > > >"
## [5] "Attributes: < Component “info”: Attributes: < Component “listData”: Component “AF”: Modes: numeric, S4 > >"
## [6] "Attributes: < Component “info”: Attributes: < Component “listData”: Component “AF”: Attributes: < target is NULL, current is list > > >"
## [7] "Attributes: < Component “info”: Attributes: < Component “listData”: Component “AF”: target is numeric, current is CompressedNumericList > >"
## [8] "Attributes: < Component “metadata”: Component “header”: Attributes: < Component “header”: Attributes: < Component “listData”: Component “fileDate”: Attributes: < Component “listData”: Component “Value”: 1 string mismatch > > > >"

We can ignore mismatches 1-4, as these are inconsequential to most end-users (albeit annoying to developers, see #78). We can also ignore mismatch 8, which is discussed in #78. The interesting discrepancies are that of 5-7, where AF becomes a NumericList after a roundtrip through the VCF file. This is because its ##info is still registering Number=A but should really be Number=1 to match the fact that it's already been flattened by the expand() call to generate first.

Session information
R Under development (unstable) (2023-11-29 r85646)
Platform: aarch64-apple-darwin22.5.0
Running under: macOS Ventura 13.6.1

Matrix products: default
BLAS:   /Users/luna/Software/R/trunk/lib/libRblas.dylib
LAPACK: /Users/luna/Software/R/trunk/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] VariantAnnotation_1.49.2    Rsamtools_2.19.2
 [3] Biostrings_2.71.1           XVector_0.43.0
 [5] SummarizedExperiment_1.33.1 Biobase_2.63.0
 [7] GenomicRanges_1.55.1        GenomeInfoDb_1.39.5
 [9] IRanges_2.37.0              S4Vectors_0.41.3
[11] MatrixGenerics_1.15.0       matrixStats_1.2.0
[13] BiocGenerics_0.49.1

loaded via a namespace (and not attached):
 [1] KEGGREST_1.43.0          rjson_0.2.21             lattice_0.22-5
 [4] vctrs_0.6.5              tools_4.4.0              bitops_1.0-7
 [7] generics_0.1.3           curl_5.2.0               parallel_4.4.0
[10] tibble_3.2.1             fansi_1.0.6              AnnotationDbi_1.65.2
[13] RSQLite_2.3.4            blob_1.2.4               pkgconfig_2.0.3
[16] Matrix_1.6-4             BSgenome_1.71.1          dbplyr_2.4.0
[19] lifecycle_1.0.4          GenomeInfoDbData_1.2.11  compiler_4.4.0
[22] stringr_1.5.1            progress_1.2.3           codetools_0.2-19
[25] yaml_2.3.8               RCurl_1.98-1.13          pillar_1.9.0
[28] crayon_1.5.2             BiocParallel_1.37.0      DelayedArray_0.29.0
[31] cachem_1.0.8             abind_1.4-5              tidyselect_1.2.0
[34] digest_0.6.33            stringi_1.8.3            restfulr_0.0.15
[37] dplyr_1.1.4              biomaRt_2.59.0           fastmap_1.1.1
[40] grid_4.4.0               cli_3.6.2                SparseArray_1.3.2
[43] magrittr_2.0.3           S4Arrays_1.3.1           GenomicFeatures_1.55.1
[46] XML_3.99-0.16            utf8_1.2.4               rappdirs_0.3.3
[49] filelock_1.0.3           prettyunits_1.2.0        bit64_4.0.5
[52] httr_1.4.7               bit_4.0.5                png_0.1-8
[55] hms_1.1.3                memoise_2.0.1            BiocIO_1.13.0
[58] BiocFileCache_2.11.1     rtracklayer_1.63.0       rlang_1.1.2
[61] glue_1.6.2               DBI_1.2.0                xml2_1.3.6
[64] R6_2.5.1                 GenomicAlignments_1.39.0 zlibbioc_1.49.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions