
locateVariants() and predictCoding() silently drop large INDELs #81

@johnstonmj

I have encountered an issue where large INDELs are silently discarded from my results.

This is the offending VCF entry:
contig_xyz 133527 Sniffles2.DEL.50S0 GGATGAACTCTTGGATTCTTGGGCTCATGTCGGACAGGATGCCGCCCCAGCTTCTCAGGTTGATGCCCTGGGATCTAGAATCCAGCAGTGTGGGCAGCACCCGGAACAGCTTGAAGAAGTCCACGTTGGCGTACAGGGTATCCTCGATCCACTGCAGGGTTCCCTGGCTCAGACTGCACAGGGCATATCTGACGGTCTTGGCGCCTCTCCGCTGGCTGAAGATGATGAACCGTTCCAGCAGGGCCTCAGAACAGGCGATATCCTTCAGGGCGAGATCTGGCACGCCATGAGCAAACTGCTCGGGCCGCACTTGGCTGTTGATCAGCAGGTACACCACGCTGTCGCTCAGGCCGATGTTCTTGATGAGGAACAGTGTCAGGGTTTCCTCGTCCTTCAGGATGTCCCGGATTCTGATGCCCCTGCCGGCGATTCTCTCGGGGTGTGTTCTCAGGGTGTCCATGAACTGGCTCAGGATGTGCAGCTCGGTCCAGATTCTGCCCAGGTGCTGAGACTCAGGGGCGTTCATCAGCAGCTCTTGGAAGTCCCGGTACACTCTGGCCAGGATGCTGTTGTTGTAGTTGGACACGATGCCAGGGCTTTCGCCAGGTGTGGGGCTCTGAAAGCAGGGGTTGTTCACGTTGCAGAAGATGCCCTGCAGCCAAGGCAGCATTCCGGCAGAAGGCATGGCCTTGTTGGGGAAGTGACACTCGTGGTGGCTGTACAGAGGATTGGCGTTCCGCAGCCAGATCAGCACCAGAAACAGGCTCAGGGGCCACACGAGTTCCACCACGAATCTGATTTTCTGCCGCTTCCGCAGGGTCCAGTTCTTCCACAGCAGCAGCTGAATCTGCCGCACGAATCCCATGGTGGCCAGCTCGGTCGTCCCGGGGCCTCTACTGTCCAGAGTCCTCCGCGGATCCCGATCTGACGGTTCACTAAACGAGCTCTGTTTATATAGACCTCCCACCGTACACGCCTACCGCCCATTTGCGTCAACGGGGCGGGCGATCGCAGTTGTTACGACATTTTGGAAAGTCCCGTTGATTTTGGTGCCAAAACAAACTCCCATTGACGTCAATGGGGTGGAGACTTGGAAATCCCCGTGAGTCAAACCGCTATCCACGCCCATTGGTGTACTGCCAAAACCGCATCACCATGGTAATAGCGATGACTAATACGTAGATGTACTGCCAAGTAGGAAAGTCCCGTAAGGTCATGTACTGGGCATAATGCCAGGCGGGCCATTTACCGTCATTGACGTCAATAGGGGGCGGACTTGGCATATGATACACTTGATGTACTGCCAAGTGGGCAGTTTACCGTAAATACTCCACCCATTGACGTCAATGGAAAGTCCCTATTGGCGTTACTATGGGAACATACGTCATTATTGACGTCAATGGGCGGGGGTCGTTGGGCGGTCAGCCAGGCGGGCCATTTACCGTAAGTTATGTAACGCGGAACTCCATATATGGGCTATGAACTAATGACCCCGTAATTGATTACTATTAATAACTAGTCAATAATCAATGCCAACATGGCGGTCATATTGGACATGAGCCAATATAAATGTACATATTATGATATAGATACAACGTATGCAATGGCCAATAGCCAATATTGATTTATGCTATATAACCAATGAATAATATGGCTAATGGCCAATATTGAAGATCCCCGGGTACCGAGCTCGAATTCATCGATGATGATCCACTAGTAACGGCCGCCAGTGTGCTGGAATTCGCCCTTCCCGCATGGCATCTCATTACCGCCCGATCCGGCGGTTTCCGCTTCCGTTCCGCATGCTAACGAGGAACGGGCAGGGGGCGGGGCCCGGGCCCCGACTTCCCGGTTCGGCGGTAATGTGATACGAGCCCCGCGCGCCCGTTGGCCGTCCCCGGGCCCCCGGTCCCGCCCGCCGGACGCCGGGACCAACGGGACGGCGGGCGGCCCTTGGGCCGCC
CGCCTTGCCGCCCCCCCATTGGCCGGCGGGCGGGACCGCCCCAAGGGGGCGGGGCCGCCGGGTAAAAGAAGTGAGAACGCGAAGCGTTCGCACTTCGTCCCAATATATATATATTATTAGGGCGAAGTGCGAGCACTGGCGCCGTGCCCGACTCCGCGCCGGCCCCGGGGGCGGACCCGGGCGGCGGGGGGCGGGTCTCTCCGGCGCACATAAAGGCCCGGCGCGACCGACGCCCGCAGACGGCGCCGGCCACGAACGACGGGAGCGGCTGCGGAGCACGCGGACCGGGAGCGGGAGTCGCAGAGGGCCGTCGGAGCGGACGGCGTCGGCATCGCGACGCCCCGGCTCGGGATCGGGATCGCATCGGAAAGGGACACGCGGACGCGGGGGGGAAAGACCCGCCCACCCCACCCACGAAACACAGGGGACGCACCCCGGGGGCCTCCGACGACAGAAACCCACCGGTCCGCCTTTTTTGCACGGGTAAGCACCTTGGGTGGGCGGAGGAGGGGGGACGCGGGGGCGGAGGAGGGGGGACGCGGGGGCCGGAGGAGGGGGGACGCGGGGGCGGAGGAGGGGG G 59 PASS PRECISE;SVTYPE=DEL;SVLEN=-2542;END=136069;SUPPORT=259;COVERAGE=335,326,308,260,257;STRAND=+-;AF=0.869;STDEV_LEN=0;STDEV_POS=0 GT:GQ:DR:DV 1/1:60:39:259

This represents a deletion of 2542 bp.
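As a quick consistency check on the record above, the deleted length and the expected REF length can be worked out from its POS, END and SVLEN fields. This sketch assumes the usual VCF convention that REF includes one padding base before the deleted sequence, so a fully spelled-out REF spans POS..END inclusive.

```r
# Values copied from the Sniffles2 record above.
pos   <- 133527L
end   <- 136069L
svlen <- -2542L

# Number of deleted bases implied by POS and END.
del_len <- end - pos
stopifnot(del_len == abs(svlen))

# Assumption: REF = padding base + deleted bases, i.e. POS..END inclusive,
# so the REF allele should be one base longer than the deletion.
expected_ref_width <- del_len + 1L
```

Under that assumption the full REF allele is about 2.5 kb, well beyond the truncated lengths tested further down.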

The large deletion is present in the output of `rowRanges(vcf)`, so it is being read successfully by `readVcf()`.

However, when I call:

```r
library(VariantAnnotation)

locateVariants(
  vcf,
  txdb,
  AllVariants(
    promoter = PromoterVariants(0, 0),
    intergenic = IntergenicVariants(0, 0)
  )
)
```

or

```r
predictCoding(vcf, txdb, seqSource = fasta)
```

this variant position is not included among the results.
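One way to confirm exactly which records are missing is to compare the input indices with the `QUERYID` column that both `locateVariants()` and `predictCoding()` attach to their output. The sketch below is a minimal base-R helper; the name `dropped_queries` is hypothetical. With VariantAnnotation loaded, its inputs would presumably be `length(rowRanges(vcf))` and `mcols(result)$QUERYID`.

```r
# Hypothetical helper: indices of input variants that have no row at all
# in an annotation result.
#   n_query   - number of input variants (e.g. length(rowRanges(vcf)))
#   query_ids - QUERYID column of the result (e.g. mcols(result)$QUERYID)
dropped_queries <- function(n_query, query_ids) {
  setdiff(seq_len(n_query), unique(query_ids))
}

# Toy example: 5 input variants, but the result only has rows for 1, 3 and 5.
dropped_queries(5L, c(1L, 1L, 3L, 5L))
#> [1] 2 4
```

A non-empty result means some records were discarded without any message.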

Through trial and error, I found that truncating the REF sequence to 850 characters keeps the variant in the results, but truncating it to 900 characters does not.

Succeeds (REF truncated to 850 bp):
contig_xyz 133527 Sniffles2.DEL.50S0 GGATGAACTCTTGGATTCTTGGGCTCATGTCGGACAGGATGCCGCCCCAGCTTCTCAGGTTGATGCCCTGGGATCTAGAATCCAGCAGTGTGGGCAGCACCCGGAACAGCTTGAAGAAGTCCACGTTGGCGTACAGGGTATCCTCGATCCACTGCAGGGTTCCCTGGCTCAGACTGCACAGGGCATATCTGACGGTCTTGGCGCCTCTCCGCTGGCTGAAGATGATGAACCGTTCCAGCAGGGCCTCAGAACAGGCGATATCCTTCAGGGCGAGATCTGGCACGCCATGAGCAAACTGCTCGGGCCGCACTTGGCTGTTGATCAGCAGGTACACCACGCTGTCGCTCAGGCCGATGTTCTTGATGAGGAACAGTGTCAGGGTTTCCTCGTCCTTCAGGATGTCCCGGATTCTGATGCCCCTGCCGGCGATTCTCTCGGGGTGTGTTCTCAGGGTGTCCATGAACTGGCTCAGGATGTGCAGCTCGGTCCAGATTCTGCCCAGGTGCTGAGACTCAGGGGCGTTCATCAGCAGCTCTTGGAAGTCCCGGTACACTCTGGCCAGGATGCTGTTGTTGTAGTTGGACACGATGCCAGGGCTTTCGCCAGGTGTGGGGCTCTGAAAGCAGGGGTTGTTCACGTTGCAGAAGATGCCCTGCAGCCAAGGCAGCATTCCGGCAGAAGGCATGGCCTTGTTGGGGAAGTGACACTCGTGGTGGCTGTACAGAGGATTGGCGTTCCGCAGCCAGATCAGCACCAGAAACAGGCTCAGGGGCCACACGAGTTCCACCACGAATCTGATTTTCTGCCGCTTCCGCAGGGTCCAGTTCTTCCACAGCAGCAGCTGAATCTG G 59 PASS PRECISE;SVTYPE=DEL;SVLEN=-2542;END=136069;SUPPORT=259;COVERAGE=335,326,308,260,257;STRAND=+-;AF=0.869;STDEV_LEN=0;STDEV_POS=0 GT:GQ:DR:DV 1/1:60:39:259

Fails (REF truncated to 900 bp):
contig_xyz 133527 Sniffles2.DEL.50S0 GGATGAACTCTTGGATTCTTGGGCTCATGTCGGACAGGATGCCGCCCCAGCTTCTCAGGTTGATGCCCTGGGATCTAGAATCCAGCAGTGTGGGCAGCACCCGGAACAGCTTGAAGAAGTCCACGTTGGCGTACAGGGTATCCTCGATCCACTGCAGGGTTCCCTGGCTCAGACTGCACAGGGCATATCTGACGGTCTTGGCGCCTCTCCGCTGGCTGAAGATGATGAACCGTTCCAGCAGGGCCTCAGAACAGGCGATATCCTTCAGGGCGAGATCTGGCACGCCATGAGCAAACTGCTCGGGCCGCACTTGGCTGTTGATCAGCAGGTACACCACGCTGTCGCTCAGGCCGATGTTCTTGATGAGGAACAGTGTCAGGGTTTCCTCGTCCTTCAGGATGTCCCGGATTCTGATGCCCCTGCCGGCGATTCTCTCGGGGTGTGTTCTCAGGGTGTCCATGAACTGGCTCAGGATGTGCAGCTCGGTCCAGATTCTGCCCAGGTGCTGAGACTCAGGGGCGTTCATCAGCAGCTCTTGGAAGTCCCGGTACACTCTGGCCAGGATGCTGTTGTTGTAGTTGGACACGATGCCAGGGCTTTCGCCAGGTGTGGGGCTCTGAAAGCAGGGGTTGTTCACGTTGCAGAAGATGCCCTGCAGCCAAGGCAGCATTCCGGCAGAAGGCATGGCCTTGTTGGGGAAGTGACACTCGTGGTGGCTGTACAGAGGATTGGCGTTCCGCAGCCAGATCAGCACCAGAAACAGGCTCAGGGGCCACACGAGTTCCACCACGAATCTGATTTTCTGCCGCTTCCGCAGGGTCCAGTTCTTCCACAGCAGCAGCTGAATCTGCCGCACGAATCCCATGGTGGCCAGCTCGGTCGTCCCGGGGCCTCTACTGT G 59 PASS PRECISE;SVTYPE=DEL;SVLEN=-2542;END=136069;SUPPORT=259;COVERAGE=335,326,308,260,257;STRAND=+-;AF=0.869;STDEV_LEN=0;STDEV_POS=0 GT:GQ:DR:DV 1/1:60:39:259
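Until the cause is known, a possible stopgap is to flag records with long REF alleles before annotating them, so they can be checked by hand. The sketch below uses an 850 bp cutoff only because that is the largest REF length that still annotated in the two records above; it is an empirical observation, not a documented limit of `locateVariants()` or `predictCoding()`. The helper name `long_ref_records` is hypothetical, and in a real session the input would presumably be `as.character(ref(vcf))`.

```r
# Hypothetical helper: indices of records whose REF allele is longer than
# `max_width` characters. The default of 850 is the empirical boundary seen
# in this report, not a known limit of VariantAnnotation.
long_ref_records <- function(ref_alleles, max_width = 850L) {
  which(nchar(ref_alleles) > max_width)
}

# Toy REF alleles: 1 bp, 900 bp and 850 bp.
refs <- c("G", strrep("A", 900), strrep("C", 850))
long_ref_records(refs)
#> [1] 2
```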

Can you suggest a way to keep these large INDELs in the results?
Alternatively, if large INDELs cannot be supported, could a warning or error be raised instead of silently discarding them?

Thanks!
