Skip to content

GeneticVariation.VCF.Reader does not like output from the gatk tool: funcotator #34

@jackmo375

Description

@jackmo375

I have had trouble when reading files with GeneticVariation.VCF.Reader that were output form the gatk tool: funcotator. When I called the function on the vcf file it threw an error. I have tried to describe this error and give a possible cause of the error below. It is a minor issue, and I resolved it by deleting a single line in my input vcf file before calling GeneticVariation.VCF.Reader which solved the problem but i wanted to draw the developers to the issue as it would be better to run it without my hack of manually deleting a line in the input vcf file :)

Expected Behavior

I took a vcf file, v4.2, that was output from funcotator and passed it to the function GeneticVariation.VCF.Reader.

Current Behavior

here is the current behavior from a live Julia session:

julia> inputvcfIO = open("./NA18486.annotated.vcf", "r")
IOStream(<file ./NA18486.annotated.vcf>)

julia> GeneticVariation.VCF.Reader(inputvcfIO)
ERROR: GeneticVariation.VCF.Reader file format error on line 17
Stacktrace:
 [1] error(::String, ::Int64)
   @ Base ./error.jl:44
 [2] _readheader!(reader::GeneticVariation.VCF.Reader, state::BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}})
   @ GeneticVariation.VCF ~/.julia/packages/BioCore/YBJvb/src/ReaderHelper.jl:106
 [3] readheader!(reader::GeneticVariation.VCF.Reader)
   @ GeneticVariation.VCF ~/.julia/packages/BioCore/YBJvb/src/ReaderHelper.jl:80
 [4] Reader
   @ ~/.julia/packages/GeneticVariation/r8DAL/src/vcf/reader.jl:15 [inlined]
 [5] GeneticVariation.VCF.Reader(input::IOStream)
   @ GeneticVariation.VCF ~/.julia/packages/GeneticVariation/r8DAL/src/vcf/reader.jl:28
 [6] top-level scope
   @ REPL[6]:1

Possible Solution / Implementation

So I went to line 17 of the vcf file as the error log above advises, and found the following line:

##Funcotator Version=4.2.6.1 | Gencode 34 CANONICAL | ACMGLMMLof 1 | ACMG_recommendation SF_v2.0 | ClinVar_VCF 20180429_hg38 | LMMKnown 20180618

Once I deleted this line, I re-ran the code:

julia> inputvcfIO = open("./NA18486.annotated.edited.vcf", "r")
IOStream(<file ./NA18486.annotated.edited.vcf>)

julia> GeneticVariation.VCF.Reader(inputvcfIO)
GeneticVariation.VCF.Reader(BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}(BufferedStreams.BufferedInputStream{IOStream}(<256.0 KiB buffer, 28% filled, data immobilized>), 1, 3414, false), GeneticVariation.VCF.Header:
  metainfo tags: fileformat ALT FILTER FORMAT GATKCommandLine INFO contig source
     sample IDs: NA18486)

And as you can see the problem is resolved.

Here is my issue: I cannot see what is wrong with that line ##Funcotator Version=4.2.6... Why is it not accepted as valid input?

Note, it is possible that other gatk tools insert lines like this in their vcf output files so the problem may occur in other places. (I acknowledge this may actually be an issue to raise with the gatk developers instead?)

This bug certainly affects the usability of the code. I was initially using the VIVA command line program for analyzing vcf files and received this error. so I had to go looking at the Julia source code of dependencies of the VIVA program to resolve the issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions