-
Notifications
You must be signed in to change notification settings - Fork 18
Description
I have had trouble when reading files with GeneticVariation.VCF.Reader that were output form the gatk tool: funcotator. When I called the function on the vcf file it threw an error. I have tried to describe this error and give a possible cause of the error below. It is a minor issue, and I resolved it by deleting a single line in my input vcf file before calling GeneticVariation.VCF.Reader which solved the problem but i wanted to draw the developers to the issue as it would be better to run it without my hack of manually deleting a line in the input vcf file :)
Expected Behavior
I took a vcf file, v4.2, that was output from funcotator and passed it to the function GeneticVariation.VCF.Reader.
Current Behavior
here is the current behavior from a live Julia session:
julia> inputvcfIO = open("./NA18486.annotated.vcf", "r")
IOStream(<file ./NA18486.annotated.vcf>)
julia> GeneticVariation.VCF.Reader(inputvcfIO)
ERROR: GeneticVariation.VCF.Reader file format error on line 17
Stacktrace:
[1] error(::String, ::Int64)
@ Base ./error.jl:44
[2] _readheader!(reader::GeneticVariation.VCF.Reader, state::BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}})
@ GeneticVariation.VCF ~/.julia/packages/BioCore/YBJvb/src/ReaderHelper.jl:106
[3] readheader!(reader::GeneticVariation.VCF.Reader)
@ GeneticVariation.VCF ~/.julia/packages/BioCore/YBJvb/src/ReaderHelper.jl:80
[4] Reader
@ ~/.julia/packages/GeneticVariation/r8DAL/src/vcf/reader.jl:15 [inlined]
[5] GeneticVariation.VCF.Reader(input::IOStream)
@ GeneticVariation.VCF ~/.julia/packages/GeneticVariation/r8DAL/src/vcf/reader.jl:28
[6] top-level scope
@ REPL[6]:1
Possible Solution / Implementation
So I went to line 17 of the vcf file as the error log above advises, and found the following line:
##Funcotator Version=4.2.6.1 | Gencode 34 CANONICAL | ACMGLMMLof 1 | ACMG_recommendation SF_v2.0 | ClinVar_VCF 20180429_hg38 | LMMKnown 20180618
Once I deleted this line, I re-ran the code:
julia> inputvcfIO = open("./NA18486.annotated.edited.vcf", "r")
IOStream(<file ./NA18486.annotated.edited.vcf>)
julia> GeneticVariation.VCF.Reader(inputvcfIO)
GeneticVariation.VCF.Reader(BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}(BufferedStreams.BufferedInputStream{IOStream}(<256.0 KiB buffer, 28% filled, data immobilized>), 1, 3414, false), GeneticVariation.VCF.Header:
metainfo tags: fileformat ALT FILTER FORMAT GATKCommandLine INFO contig source
sample IDs: NA18486)
And as you can see the problem is resolved.
Here is my issue: I cannot see what is wrong with that line ##Funcotator Version=4.2.6... Why is it not accepted as valid input?
Note, it is possible that other gatk tools insert lines like this in their vcf output files so the problem may occur in other places. (I acknowledge this may actually be an issue to raise with the gatk developers instead?)
This bug certainly affects the usability of the code. I was initially using the VIVA command line program for analyzing vcf files and received this error. so I had to go looking at the Julia source code of dependencies of the VIVA program to resolve the issue.