
Gene Feature Enumeration

sjmack edited this page Feb 21, 2015 · 2 revisions

Gene Feature Enumeration (GFE)

A proposal has been made for a system of enumerated gene features (untranslated regions [UTRs], exons and introns) as an extension of the HLA allele nomenclature (http://biorxiv.org/content/early/2015/02/15/015222).

We expanded and refined the elements of the original GFE proposal, as summarized in GFE_update_02202015.pdf.

DaSH II Revisions

  • Change GFE notation for partial sequences from a decimal enumeration (e.g., 8.443) to a separate enumeration of partial sequences denoted with p, for 'partial' (e.g., p1, p2, p3). A partial sequence is defined as a sequence that is not full-length for a given feature due to a limitation of the typing methodology (e.g., different primer locations). Since a partial sequence can potentially match multiple full-length feature sequences, it may not be valid to identify a given partial sequence as a short version of a particular full-length feature.
  • Treat unavailable/untyped/untested sequence for a feature as a partial sequence, and denote it as p0. Essentially, an unavailable sequence is a potential match to all full-length feature sequences.
  • Treat indels as sequence variants and enumerate them as full sequences; these sequences are not full-length for a given feature due to biological variation, not a limitation of the typing methodology.
  • Similarly treat deleted features as legitimate sequence variants and enumerate them as full sequences.
  • Treat duplications of sequence features (e.g., two intron 1 (i1) and exon 2 (e2) sequences) in a single gene as nucleotide variants of the second duplicated feature; see GFE_update_02202015.pdf. If i1 and e2 are duplicated (e.g., 5'UTR~e1~i1~e2~i1~e2~3'UTR), treat the second i1~e2 as part of the sequence of the first e2. This maintains the field structure for each gene.
  • Change the delimiter from colons (:) to semicolons (;) to further distinguish GFE notation from allele names.
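
The partial-sequence and delimiter revisions above can be illustrated with a minimal Python sketch. This is not a reference implementation: it assumes that a GFE field string is simply a semicolon-delimited run of fields, each either an integer (a full sequence) or p followed by an integer (a partial sequence, with p0 meaning untyped). The function names, the field pattern, and the toy nucleotide sequences are all hypothetical and are not taken from the proposal.

```python
import re

# Assumed field shape (not from the proposal): digits for a full sequence,
# "p" + digits for a partial sequence.
_FIELD = re.compile(r"(p?)(\d+)")


def classify_field(field):
    """Return (kind, number) for one field: 'full', 'partial', or 'untyped'."""
    match = _FIELD.fullmatch(field.strip())
    if match is None:
        raise ValueError(f"unrecognized GFE field: {field!r}")
    prefix, digits = match.groups()
    number = int(digits)
    if prefix == "p":
        # p0 is reserved for unavailable/untyped/untested sequence.
        return ("untyped" if number == 0 else "partial", number)
    return ("full", number)


def parse_fields(notation):
    """Split a semicolon-delimited field string and classify each field."""
    return [classify_field(f) for f in notation.split(";")]


def candidate_matches(partial_seq, full_seqs):
    """List the full-length sequence numbers that contain the partial sequence."""
    return [n for n, seq in sorted(full_seqs.items()) if partial_seq in seq]


# Toy full-length sequences for one feature, invented for illustration only.
toy_full = {1: "ACGTACGT", 2: "ACGTTCGT", 3: "GGGTACGA"}

print(parse_fields("1;p2;p0;7"))
print(candidate_matches("GTAC", toy_full))  # a partial sequence can match several
print(candidate_matches("", toy_full))      # no sequence (p0) matches all of them
```

In this sketch the partial sequence matches more than one full-length sequence, which is why it gets its own p enumeration rather than being labeled as a short version of one particular full-length feature. The empty sequence matches every entry, mirroring the treatment of untyped sequence as p0.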

