sauter edited this page Sep 27, 2014 · 47 revisions

Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING)

MIRING Elements

MIRING Element 1: MIRING Annotation

Each MIRING message should be identified with a (preferably universal) unique MIRING identifier (ID).

This MIRING ID links information included in the message with additional information for that specimen (e.g., specimen ID) outside of the message.

MIRING Element 2: Reference Context

  1. The reference allele sequence database release version used for allele calling of each gene must be defined. e.g., IPD-KIR Database release version 2.5.0

  2. Reference sequences applied in the genotyping must be explicitly defined in each genotyping report.

Identify any reference genome assembly (or a specific alternate assembly) sequence used with a specific Genome Reference Consortium (GRC) release version. Explicitly identify alternative reference (AltRef) alignments used. e.g., GRCh37.p13 (GRC human genome build 37 patch 13)/c6

Identify the reference allele sequences used with a reference allele sequence database (IMGT/HLA or IPD-KIR Database) release version and accession number. e.g., IMGT/HLA Database release version 3.17.0/HLA00001

Each reference sequence should be assigned a unique numerical identifier, ranging from 0 to n (# of reference sequences).

As applied in the MIRING message, each reference resource used should be categorized as one of: (A) public and curated, (B) public and uncurated, (C) other (i.e., private), or (D) unreferenced (no reference is available).
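The reference bookkeeping described in Element 2 can be sketched as a small data structure. This is only an illustration, not a MIRING-defined schema: the class names, field names, and the `register` helper are hypothetical, and the categories assigned to the two example references are assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from enum import Enum


class ReferenceCategory(Enum):
    # The four categories listed in Element 2.
    PUBLIC_CURATED = "A"
    PUBLIC_UNCURATED = "B"
    OTHER = "C"          # i.e., private
    UNREFERENCED = "D"   # no reference is available


@dataclass(frozen=True)
class ReferenceSequence:
    # Hypothetical field names; MIRING does not prescribe them.
    ref_id: int          # unique numerical identifier, assigned from 0 upward
    source: str          # database or assembly the sequence comes from
    release: str         # release version of that source
    accession: str       # accession number (or region) within the release
    category: ReferenceCategory


def register(references, source, release, accession, category):
    """Append a reference and give it the next unused numerical ID."""
    ref = ReferenceSequence(len(references), source, release, accession, category)
    references.append(ref)
    return ref


references = []
# Category choices below are assumptions for illustration only.
register(references, "IMGT/HLA", "3.17.0", "HLA00001", ReferenceCategory.PUBLIC_CURATED)
register(references, "GRC", "GRCh37.p13", "c6", ReferenceCategory.PUBLIC_CURATED)

for ref in references:
    print(ref.ref_id, f"{ref.release}/{ref.accession}", ref.category.value)
```

Assigning IDs from the list length keeps them unique and contiguous, which is what the consensus sequence descriptor in Element 4 relies on when it points back to a reference by number.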

MIRING Element 3: Full Genotype

The genotype is the collection of all ambiguous alleles that are derived from the consensus sequence. All ambiguous alleles and ambiguous genotypes should be explicitly defined in the genotype.

The set of alleles used to generate the genotype is identified as Element 2.1.

This is not a “best guess” for a two-allele genotype call.

Use Genotype List (GL) String format (or comparable format), and provide a Uniform Resource Identifier (URI) when available. e.g., KIR3DL2*008/KIR3DL2*038+KIR3DL2*00701|KIR3DL2*027+KIR3DL2*016 & http://gl.immunogenomics.org/1.0/genotype-list/z
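To make the structure of a GL String concrete, the following minimal sketch splits a single-locus string on three of its delimiters: "|" (alternative genotypes), "+" (gene copies within a genotype), and "/" (ambiguous alleles for one copy). The function name `parse_gl_string` is hypothetical, the example string writes allele names in full locus*allele form, and the "^" (multi-locus) and "~" (phased haplotype) delimiters are deliberately not handled.

```python
def parse_gl_string(gl):
    """Split a single-locus GL String into nested lists.

    Result shape: genotypes -> gene copies -> ambiguous alleles.
    Only the "|", "+" and "/" delimiters are interpreted here.
    """
    return [
        [gene_copy.split("/") for gene_copy in genotype.split("+")]
        for genotype in gl.split("|")
    ]


gl = "KIR3DL2*008/KIR3DL2*038+KIR3DL2*00701|KIR3DL2*027+KIR3DL2*016"
genotypes = parse_gl_string(gl)

for number, genotype in enumerate(genotypes):
    print("genotype", number)
    for gene_copy in genotype:
        print("  copy:", " or ".join(gene_copy))
```

Splitting from the outermost delimiter inward follows the GL String precedence, so the example resolves to two possible genotypes, the first of which has an allele ambiguity on one gene copy.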

Denote novel alleles in the genotype string by including a reference to the EMBL accession number in element 6.

MIRING Element 4: Consensus Sequence

The consensus sequence is generated from the primary read data by the analysis software, and serves as the basis for the genotype.

Format consensus sequences to identify any phase and/or ploidy information that has been generated by the NGS platform (as in the example below).

Describe consensus sequence blocks using the equivalent of FASTA format, with one sequence block for each reference sequence applied. This requires dividing phased sequence that is aligned to multiple references into multiple blocks.  

0|0|1|1|0|0|0
CAGGAGCAGAGGGGTCAGGGCGAAGTCCCAGGGCCCCAGGCGTGGCTCTCAGGGTCTCAGGCCCCGAAGG
CGGTGTATGGATTGGGGAGTCCCAGCCTTGGGGATTCCCCAACTCCGCAGTTTCTTTTCTCCCTCTCCCA
ACCTACGTAGGGTCCTTCATCCTGGATACTCACGACGCGGACCCAGTTCTCACTCCCATTGGGTGTCGGG
TTTCCAGAGAAGCCAATCAGTGTCGTCGCGGTCGCTGTTCTAAAGTCCGCACGCACCCACCGGGACTCAG
ATTCTCCCCAGACGCCGAGG

A format for the FASTA descriptor line follows. The equivalent of a FASTA descriptor line should be defined with seven elements.

A. Sequence Block ID: Number each sequence block from 0 to n; these block IDs must increase in 5’ to 3’ order over all blocks.

B. Reference ID: This is the numerical reference sequence ID from Element 2.2; category D references identify the absence of a reference sequence.

C. Reference Coordinate: Identify the position in the reference sequence (indexed from 0) corresponding to the first position in the consensus block.

D. Phasing Group: Identify all consensus blocks in phase using the lowest Sequence Block ID with common phase.

E. Ploidy: Identify the ploidy number (1 to n) for each consensus block.

F. Reference Sequence Match: Identify whether the consensus block sequence is an exact match to the applied reference (1) or not (0); in the case of 0, a VCF file would be expected (Element 5) unless the reference category of the applied reference ID is D.

G. Sequence Continuity: Indicate whether the consensus sequence is immediately adjacent (no sequence gaps) to the preceding consensus block in the same phasing group (1); if there is a sequence gap or there is no phase, this is 0.
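A descriptor line of this kind could be read as follows. This is a sketch under stated assumptions: it assumes the seven elements appear in the order listed above and are separated by "|", as in the example 0|0|1|1|0|0|0. The names `BlockDescriptor`, `parse_descriptor`, and `expects_vcf` are hypothetical.

```python
from typing import NamedTuple


class BlockDescriptor(NamedTuple):
    # Field order follows the seven descriptor elements (assumed, not mandated).
    block_id: int
    reference_id: int
    reference_coordinate: int   # 0-indexed position in the reference
    phasing_group: int
    ploidy: int
    reference_match: int        # 1 = exact match to reference, 0 = not
    continuity: int             # 1 = adjacent to preceding block in phase


def parse_descriptor(line):
    """Parse a pipe-delimited descriptor line into its seven integer fields."""
    fields = line.lstrip(">").strip().split("|")
    if len(fields) != 7:
        raise ValueError(f"expected 7 descriptor elements, got {len(fields)}")
    return BlockDescriptor(*(int(field) for field in fields))


def expects_vcf(descriptor, reference_category):
    """A VCF is expected for a non-matching block unless the reference is category D."""
    return descriptor.reference_match == 0 and reference_category != "D"


descriptor = parse_descriptor("0|0|1|1|0|0|0")
print(descriptor)
print("VCF expected:", expects_vcf(descriptor, "A"))
```

Keeping the fields as a named tuple makes the positional format self-documenting while still comparing equal to a plain tuple of the seven values.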

MIRING Element 5: Novel Polymorphisms

Novel polymorphisms in consensus sequences (nucleotide polymorphisms not included in the reference allele sequence database) must be explicitly noted. Characterize novel substitutions (non-synonymous substitutions, indels, stop-codons, etc.) in novel sequences. Define novel polymorphisms relative to the IMGT/HLA or IPD-KIR reference sequence (defined in the consensus sequence ID block) for each locus, consistent with Variant Call Format (VCF). Include the consensus sequence block ID defined in Element 4 in the VCF metadata. Include an EMBL accession number in the VCF metadata.

FASTA formatted reference sequence (in IMGT/HLA Database):

>HLA:HLA00001 A*01:01:01:01 1098 bp
ATGGCCGTCATGACGCCCCGAACCATCCTCCTGCTACTCTCGGGGGCCCTGGCCCTGACC

VCF file denoting novel polymorphic positions relative to the reference:

#CHROM POS ID REF ALT QUAL FILTER
3.17.0/HLA00001 12 000001 G A 29 PASS
3.17.0/HLA00001 23 000002 C A 29 PASS
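As a consistency check on records like these, the REF base of each VCF line can be compared with the reference sequence at the stated position. The sketch below uses only the 60 bases of HLA00001 shown above and the two example records; the function name `check_ref_bases` is hypothetical, and fields are split on whitespace for readability, whereas real VCF files are tab-delimited.

```python
# First 60 bases of the reference shown above (the full entry is 1098 bp).
reference = "ATGGCCGTCATGACGCCCCGAACCATCCTCCTGCTACTCTCGGGGGCCCTGGCCCTGACC"

vcf_lines = [
    "#CHROM POS ID REF ALT QUAL FILTER",
    "3.17.0/HLA00001 12 000001 G A 29 PASS",
    "3.17.0/HLA00001 23 000002 C A 29 PASS",
]


def check_ref_bases(reference, vcf_lines):
    """Return (variant ID, matches) pairs comparing REF with the reference."""
    results = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header and metadata lines
        chrom, pos, var_id, ref, alt, qual, flt = line.split()[:7]
        start = int(pos) - 1  # VCF POS is 1-based; Python strings are 0-based
        observed = reference[start:start + len(ref)]
        results.append((var_id, observed == ref))
    return results


for var_id, matches in check_ref_bases(reference, vcf_lines):
    print(var_id, "REF matches reference" if matches else "REF mismatch")
```

Both example records agree with the reference when POS is read as 1-based, as in VCF. Note that this differs from the Reference Coordinate in the consensus descriptor, which is indexed from 0.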

MIRING Element 6: Read Metadata

MIRING Element 7: Primary Data

MIRING Element 8: Platform Documentation (Include Sequence Regions Targeted in Platform Documentation)

Download the MIRING report and appendices from the NGS Data Consortium site.
