
format of input files: java.lang.NumberFormatException: For input string (#24)

@zeneofa

Hi,

I am trying to compare a set of VCF files against a set of confirmed SNPs from the Genome in a Bottle database. I do not have access to the raw FASTQ files, so I am unsure which filters were applied during mapping. I only have a set of BAM files, VCF files and a BED region file, so I also don't know what post-mapping alterations have been performed.

I have tried to run:

java -jar ~/Downloads/bcbio.variation-0.2.1-standalone.jar variant-compare ref-grading.yaml

where my ref-grading.yaml file contains the following:

dir:
  out: grading
  prep: grading/prep
experiments:
  - sample: NA00001
    ref: /export/home/pjones/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa
    intervals: ref.bed
    summary-level: quick
    approach: grade
    calls:
      - name: reference
        file: ref.vcf
        remove-refcalls: true
      - name: case1
        prep: true
        preclean: true
        remove-refcalls: true
        file: case1.vcf
        intervals: ref.bed

I get the following error (I am not familiar with Java, though):

2015-01-12 16:48:18,299 [INFO ] MLog clients using log4j logging.
2015-01-12 16:48:18,760 [INFO ] State :begin :: {:desc "Starting variation analysis"}
2015-01-12 16:48:18,788 [INFO ] State :clean :: {:desc "Cleaning input VCF: reference"}
2015-01-12 16:48:18,789 [INFO ] State :merge :: {:desc "Merging multiple input files: reference"}
2015-01-12 16:48:18,790 [INFO ] State :prep :: {:desc "Prepare VCF, resorting to genome build: reference"}
java.lang.NumberFormatException: For input string: "14596\r"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at bcbio.align.ref$prep_bedline_sort$fn__1333.invoke(ref.clj:85)
at bcbio.align.ref$sort_bed_file$fn__1338$fn__1339$fn__1344.invoke(ref.clj:98)
at clojure.core$sort_by$fn__4299.invoke(core.clj:2769)
at clojure.lang.AFunction.compare(AFunction.java:49)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
at java.util.TimSort.sort(TimSort.java:203)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at clojure.core$sort.invoke(core.clj:2754)
at clojure.core$sort_by.invoke(core.clj:2769)
at clojure.core$sort_by.invoke(core.clj:2767)
at bcbio.align.ref$sort_bed_file$fn__1338$fn__1339.invoke(ref.clj:99)
at bcbio.align.ref$sort_bed_file$fn__1338.invoke(ref.clj:97)
at bcbio.align.ref$sort_bed_file.invoke(ref.clj:96)
at bcbio.run.broad$gatk_cl_intersect_intervals$fn__1816.invoke(broad.clj:56)
at clojure.core$map$fn__4207.invoke(core.clj:2487)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$map$fn__4207.invoke(core.clj:2479)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$tree_seq$walk__4647$fn__4648.invoke(core.clj:4475)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.LazySeq.more(LazySeq.java:96)
at clojure.lang.RT.more(RT.java:607)
at clojure.core$rest.invoke(core.clj:73)
at clojure.core$flatten.invoke(core.clj:6478)
at bcbio.run.broad$gatk_cl_intersect_intervals.doInvoke(broad.clj:56)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at bcbio.variation.filter.intervals$select_by_sample.doInvoke(intervals.clj:56)
at clojure.lang.RestFn.invoke(RestFn.java:846)
at bcbio.variation.combine$dirty_prep_work$run_sample_select__1157.invoke(combine.clj:140)
at bcbio.variation.combine$dirty_prep_work.invoke(combine.clj:155)
at bcbio.variation.combine$gatk_normalize.invoke(combine.clj:187)
at bcbio.variation.compare$prepare_vcf_calls$fn__7526.invoke(compare.clj:120)
at clojure.core$map$fn__4207.invoke(core.clj:2487)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.lang.LazilyPersistentVector.create(LazilyPersistentVector.java:31)
at clojure.core$vec.invoke(core.clj:354)
at bcbio.variation.compare$prepare_vcf_calls.invoke(compare.clj:121)
at bcbio.variation.compare$variant_comparison_from_config$iter__7582__7586$fn__7587.invoke(compare.clj:255)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$tree_seq$walk__4647$fn__4648.invoke(core.clj:4475)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.LazySeq.more(LazySeq.java:96)
at clojure.lang.RT.more(RT.java:607)
at clojure.core$rest.invoke(core.clj:73)
at clojure.core$flatten.invoke(core.clj:6478)
at bcbio.variation.compare$variant_comparison_from_config.invoke(compare.clj:254)
at bcbio.variation.compare$_main.invoke(compare.clj:274)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:617)
at bcbio.variation.core$_main.doInvoke(core.clj:35)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at bcbio.variation.core.main(Unknown Source)

I have no idea how to start debugging this. Is there some input file format requirement I am not aware of? Must my reference FASTA (hg19.fa) be restricted to the same chromosomes as those in the BED file?
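The exception is thrown while bcbio sorts the BED intervals (`sort_bed_file` / `prep_bedline_sort` in `ref.clj`), at the point where `Integer.parseInt` rejects a coordinate. As a guess rather than a confirmed diagnosis: the rejected input string appears to end in a carriage return (`\r`), which usually means the BED file was saved with Windows (CRLF) line endings. Below is a minimal sketch for checking that, assuming a standard tab-separated BED file whose second and third columns are integer start/end coordinates. The class name `BedCoordinateCheck` is hypothetical and not part of bcbio.variation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of bcbio.variation: flags BED lines whose
// start/end columns would make Integer.parseInt throw.
public class BedCoordinateCheck {

    /** Returns a description of the first problem on a BED line, or null if it looks fine. */
    static String checkLine(String line) {
        // Blank, comment, and UCSC header lines carry no coordinates
        if (line.trim().isEmpty() || line.startsWith("#")
                || line.startsWith("track") || line.startsWith("browser")) {
            return null;
        }
        String[] cols = line.split("\t");
        if (cols.length < 3) {
            return "fewer than 3 tab-separated columns";
        }
        // Columns 2 and 3 (indices 1 and 2) are the start and end coordinates
        for (int i = 1; i <= 2; i++) {
            try {
                Integer.parseInt(cols[i]);
            } catch (NumberFormatException e) {
                String shown = cols[i].replace("\r", "\\r");
                String hint = cols[i].endsWith("\r") ? " (trailing carriage return; CRLF line endings?)" : "";
                return "column " + (i + 1) + " is not an integer: \"" + shown + "\"" + hint;
            }
        }
        // Coordinates parse, but a CR may still sit at the end of a later column
        if (line.endsWith("\r")) {
            return "line ends with a carriage return (CRLF line endings?)";
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java BedCoordinateCheck ref.bed
        // Split on '\n' only, so any '\r' left by Windows line endings stays visible.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String[] lines = text.split("\n");
        int problems = 0;
        for (int n = 0; n < lines.length; n++) {
            String msg = checkLine(lines[n]);
            if (msg != null) {
                System.out.println("line " + (n + 1) + ": " + msg);
                problems++;
            }
        }
        System.out.println(problems + " problem line(s) found");
    }
}
```

If carriage returns are reported, one thing to try is converting the file to Unix line endings (for example with `dos2unix`, or `tr -d '\r' < ref.bed > ref.unix.bed`) and pointing `intervals:` at the converted file before rerunning.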

My aim: to get a good estimate of the false positive and false negative rates, as well as of possible factors influencing them (such as coverage, entropy of neighbouring regions, mapping quality, etc.).

Additional information:
From the header of the VCF file, the reference appears to be UCSC hg19 (which is what I used). It also appears that the additional chromosomes have been removed from both the header and the call list in the VCF file (i.e. only chr1-22, X and Y remain). The ref.vcf and BED file were downloaded and appear to use the same UCSC naming convention. My reference is indexed and has a GATK sequence dictionary file. Java version: JDK 1.7.0_45. OS: CentOS, on a cluster with a Lustre file system.
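Since the BED file and reference are both said to use UCSC names, one way to confirm that every BED contig actually exists in the reference is to compare column 1 of the BED file with the sequence names in the FASTA index (`hg19.fa.fai`, where the first tab-separated column of each line is the sequence name). A minimal sketch under those assumptions; the class name `ContigNameCheck` is hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: reports BED contigs that are absent from the reference .fai index.
public class ContigNameCheck {

    /** Sequence names from a .fai index: the first tab-separated column of each line. */
    static Set<String> faiContigs(List<String> faiLines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : faiLines) {
            if (!line.isEmpty()) {
                names.add(line.split("\t", 2)[0]);
            }
        }
        return names;
    }

    /** BED contigs (column 1) not present in refContigs, in first-seen order. */
    static Set<String> missingContigs(List<String> bedLines, Set<String> refContigs) {
        Set<String> missing = new LinkedHashSet<>();
        for (String line : bedLines) {
            // Skip blank, comment, and UCSC header lines
            if (line.isEmpty() || line.startsWith("#")
                    || line.startsWith("track") || line.startsWith("browser")) {
                continue;
            }
            String contig = line.split("\t", 2)[0];
            if (!refContigs.contains(contig)) {
                missing.add(contig);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ContigNameCheck hg19.fa.fai ref.bed
        Set<String> ref = faiContigs(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        Set<String> missing = missingContigs(Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8), ref);
        if (missing.isEmpty()) {
            System.out.println("All BED contigs are present in the reference index.");
        } else {
            System.out.println("BED contigs not found in the reference index: " + missing);
        }
    }
}
```

An empty result only shows that the BED contigs are a subset of the reference; it does not by itself tell whether bcbio.variation expects the reference to be restricted to those contigs.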

Kind Regards,
Piet Jones
