forked from ave-dcd/dcd_mapping
Open
Labels
app: mapper (Task implementation touches the mapper)
type: enhancement (New feature or request)
Description
Investigators sometimes report wild-type variation as a series of synonymous variants, one per residue (p.Asp1=, p.Met2=, …, p.Glu=), and sometimes as the single description p.=, which indicates no change from the underlying reference across the whole sequence.
Currently, the mapping job ignores p.= descriptions (strings with len == 3) but handles the individual synonymous descriptors without issue. Ideally, we’d support both.
As far as I can tell, there are two ways to represent the p.= syntax in VRS:
- The first would be to convert the p.= string into a string that represents the variation at each position, like p.[AA1=; AA2=; …; AAn=]. This has the benefit of being able to go through the vrs-python HGVS translator we are using in the mapping job, but it generates a CisPhasedBlock VRS representation that probably isn’t best practice.
- The second is to use the ReferenceLengthExpression VRS object to describe the sequence state as having a length equal to the length of the SequenceReference and no repeatSubunits. This would be a much more compact object, but to my knowledge we’d have to craft it by hand because the translator can’t handle the p.= string. I realize that supporting this syntax in the translator would be difficult, even if it were given as NP_XXXX:p.=, because the sequence length and the start/stop values for the SequenceLocation object would have to be inferred from the reference.
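To make the two options concrete, here is a rough sketch of each in plain Python. It is not the mapper's implementation: both function names are hypothetical, and the dictionary keys in the second function follow my reading of the VRS 2.0 schema (Allele, SequenceLocation, SequenceReference, ReferenceLengthExpression), so they should be checked against the VRS version the mapper targets. The accession in the usage note is a placeholder.

```python
# Three-letter codes for the 20 standard amino acids.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def expand_whole_protein_identity(ref_seq: str) -> str:
    """Option 1 (hypothetical helper): rewrite p.= as one synonymous
    change per residue, using 1-based HGVS positions."""
    parts = [f"{AA3[aa]}{pos}=" for pos, aa in enumerate(ref_seq, start=1)]
    return "p.[" + ";".join(parts) + "]"


def reference_identity_allele(refget_accession: str, ref_seq: str) -> dict:
    """Option 2 (hypothetical helper): hand-build a dict shaped like a VRS
    Allele whose state is a ReferenceLengthExpression spanning the whole
    reference. Key names are assumptions based on VRS 2.0."""
    length = len(ref_seq)
    return {
        "type": "Allele",
        "location": {
            "type": "SequenceLocation",
            "sequenceReference": {
                "type": "SequenceReference",
                "refgetAccession": refget_accession,
            },
            # VRS locations use inter-residue coordinates, so the full
            # sequence runs from 0 to its length.
            "start": 0,
            "end": length,
        },
        "state": {
            "type": "ReferenceLengthExpression",
            # Same length as the reference span, i.e. no change.
            "length": length,
        },
    }
```

For a three-residue reference "MDE", the first function returns "p.[Met1=;Asp2=;Glu3=]", while `reference_identity_allele("SQ.placeholder", "MDE")` yields a single object whose location and state both cover residues 0 to 3. The second form stays the same size regardless of protein length, which is the compactness argument made above.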