Issue with unplaced contig naming in reference genome for array-analysis-cli (--gtc-to-vcf) #83

@leeyb9916

I'm using array-analysis-cli v1.0.1 (https://support.illumina.com/array/array_software/ima-array-analysis-cli.html) to convert Illumina GTC files to VCF, with the command array-analysis-cli genotype gtc-to-vcf.

According to the documentation, the contig naming format must match between the manifest file and the reference genome. In our case, we ensured consistency by using the same naming convention for unplaced contigs (e.g., Un_JAAHUQ010000522v1) in both the manifest file and the reference genome.

However, we encountered an error stating that the unplaced contig could not be found in the reference genome.

Failed genome mapping for loci: BICF2G630517802. Error message: Reference is missing entry for chromosome: UN_JAAHUQ010001617V1

Upon closer inspection of the log file, we noticed that the unplaced contig names were displayed in uppercase (e.g., UN_JAAHUQ010000522V1). As a test, we converted all unplaced contig names in the reference genome to uppercase and re-ran the tool. This time, the VCF was successfully generated without errors.
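
For anyone hitting the same error, the workaround described above can be sketched roughly as follows. This is a minimal Python sketch and not part of array-analysis-cli: the function name uppercase_contig_names and the "Un_" prefix filter are assumptions based on the contig naming in our reference, and only FASTA header lines are rewritten.

```python
import re

# Matches a FASTA header: ">" + contig name (no whitespace) + optional description.
_HEADER = re.compile(r"^>(\S+)(.*)$", re.DOTALL)


def uppercase_contig_names(lines, prefix="Un_"):
    """Yield FASTA lines, uppercasing contig names that start with `prefix`.

    `prefix` is an assumption taken from names such as Un_JAAHUQ010000522v1;
    sequence lines and all other headers pass through unchanged.
    """
    for line in lines:
        match = _HEADER.match(line)
        if match and match.group(1).startswith(prefix):
            # Uppercase only the name; keep the description and line ending.
            yield ">" + match.group(1).upper() + match.group(2)
        else:
            yield line


def rewrite_fasta(src_path, dst_path, prefix="Un_"):
    """Write a copy of the FASTA at src_path with renamed unplaced contigs."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(uppercase_contig_names(src, prefix))
```

Because the contig names change, any existing index for the reference (for example a .fai file) would need to be regenerated from the rewritten FASTA before re-running the conversion.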

My questions are:
For contigs whose names contain alphabetic characters (such as unplaced contigs), is it required to use all-uppercase names in the reference genome, regardless of how they are written in the manifest file?
Why is case sensitivity an issue in this context?
