Description
I'm using array-analysis-cli v1.0.1 (https://support.illumina.com/array/array_software/ima-array-analysis-cli.html) to convert Illumina GTC files to VCF, via the command array-analysis-cli genotype gtc-to-vcf.
According to the documentation, the contig naming format must match between the manifest file and the reference genome. In our case, we ensured consistency by using the same naming convention for unplaced contigs (e.g., Un_JAAHUQ010000522v1) in both the manifest file and the reference genome.
However, we encountered an error stating that the unplaced contig could not be found in the reference genome.
Failed genome mapping for loci: BICF2G630517802. Error message: Reference is missing entry for chromosome: UN_JAAHUQ010001617V1
Upon closer inspection of the log file, we noticed that the unplaced contig names were displayed in uppercase (e.g., UN_JAAHUQ010000522V1). As a test, we converted all unplaced contig names in the reference genome to uppercase and re-ran the tool. This time, the VCF was successfully generated without errors.
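For anyone who wants to reproduce the workaround, here is a minimal sketch of how the renaming could be scripted. It assumes the reference is a plain-text FASTA file and that the unplaced contigs all start with the prefix Un_, as in our data. The function names and file paths are hypothetical and are not part of array-analysis-cli.

```python
import re

# A FASTA header line is ">" followed by the contig name (no whitespace),
# optionally followed by a free-text description.
HEADER = re.compile(r"^>(\S+)(.*)$", re.DOTALL)


def uppercase_unplaced_header(line: str, prefix: str = "Un_") -> str:
    """Uppercase the contig name of a FASTA header if it starts with `prefix`.

    Sequence lines and headers of other contigs are returned unchanged.
    The "Un_" default is an assumption taken from our contig names.
    """
    match = HEADER.match(line)
    if match is None:
        return line  # sequence line (or empty header): leave untouched
    name, rest = match.groups()
    if name.startswith(prefix):
        name = name.upper()
    return f">{name}{rest}"


def rewrite_fasta(src_path: str, dst_path: str, prefix: str = "Un_") -> None:
    """Copy a FASTA file line by line, uppercasing matching contig names."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(uppercase_unplaced_header(line, prefix))
```

For example, rewrite_fasta("reference.fa", "reference.upper.fa") would write a copy in which only the Un_ headers change; the sequence itself (including any lowercase soft-masked bases) is left as it is.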
My question is:
For contigs whose names contain mixed-case letters (like unplaced contigs), is it required to use all-uppercase names in the reference genome, regardless of how they are written in the manifest file?
Why is this case sensitivity an issue in this context?