VCF standardization #43

@kubranarci

Have you checked the docs?

Description of the bug

Running the pipeline with an hg38 reference that includes HLA contigs throws errors if a mutation is present on those contigs. The problem is related to contig names like:

##contig=<ID=HLA-DRB1*11:01:01,length=13921>
##contig=<ID=HLA-DRB1*11:01:02,length=13931>
##contig=<ID=HLA-DRB1*11:04:01,length=13919>
##contig=<ID=HLA-DRB1*12:01:01,length=13404>

The * and : characters are not acceptable in VCF 4.2, so they should be omitted, yet Platypus takes the contig IDs directly from the BAM headers.

Find a way to avoid this issue when converting nonstandard VCFs into standard VCFs (so that tabix and bcftools do not fail).
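One possible direction, as a minimal sketch rather than the pipeline's actual fix: rewrite the contig IDs in both the `##contig` header lines and the CHROM column during standardization. The function names (`sanitize_contig`, `standardize_vcf_lines`) and the choice of `_` as the replacement character are assumptions made for illustration. The sketch also assumes `ID=` is the first key in each `##contig` line, as in the headers above, and it does not touch contig names that may appear elsewhere (for example inside INFO or ALT fields).

```python
import re

# Assumption: '*' and ':' are each replaced with '_'. Any other scheme works
# as long as the header and the records are rewritten consistently.
_UNSAFE = re.compile(r"[*:]")
# Assumption: ID= is the first key inside ##contig=<...>.
_CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def sanitize_contig(name: str) -> str:
    """Replace characters the downstream tools reject in a contig name."""
    return _UNSAFE.sub("_", name)


def standardize_vcf_lines(lines):
    """Yield VCF lines with contig names sanitized in header and CHROM column."""
    for line in lines:
        if line.startswith("##contig="):
            # Rewrite only the ID value, keep length and other keys untouched.
            yield _CONTIG_ID.sub(
                lambda m: m.group(1) + sanitize_contig(m.group(2)), line
            )
        elif line.startswith("#") or not line.strip():
            # Other header lines and blank lines pass through unchanged.
            yield line
        else:
            # Data record: CHROM is the first tab-separated field.
            chrom, sep, rest = line.partition("\t")
            yield sanitize_contig(chrom) + sep + rest
```

With this mapping, `HLA-DRB1*11:01:01` becomes `HLA-DRB1_11_01_01`. The renaming is not reversible on its own, so a lookup table of original and sanitized names would be needed if the original IDs must be restored later.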

Command used and terminal output

Command executed:

  bcftools sort \
      --output indel_EMQN-RING24-HG38_germline_functional_indels_conf_8_to_10.std.sorted.vcf.gz \
      --temp-dir . \
      --output-type z 
  
  cat <<-END_VERSIONS > versions.yml
  "NF_PLATYPUSINDELCALLING:PLATYPUSINDELCALLING:OUTPUT_STANDARD_VCF:BCFTOOLS_SORT":
      bcftools: $(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*$//')
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  Writing to .RoC9Wx
  [E::bcf_hdr_read] Input is not detected as bcf or vcf format
  Could not read VCF/BCF headers from -
  Cleaning
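
The last two error lines indicate that bcftools tried to read from standard input (`-`) and found no VCF/BCF header there; the command as shown names no input file. A small pre-flight check can make this kind of failure easier to spot before sorting. The following is only a sketch: the helper name `looks_like_vcf` is hypothetical, and it recognizes plain or gzip/bgzip-compressed VCF text only, not BCF.

```python
import gzip


def looks_like_vcf(path: str) -> bool:
    """Return True if the first line of the file is a VCF fileformat header."""
    # gzip and bgzip files both start with the magic bytes 0x1f 0x8b.
    with open(path, "rb") as fh:
        magic = fh.read(2)
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rt") as fh:
        first_line = fh.readline()
    return first_line.startswith("##fileformat=VCF")
```

An empty or missing-header file returns `False`, which matches the situation bcftools reports as "Input is not detected as bcf or vcf format".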

Relevant files

No response

System information

No response

Labels

bug
