forked from ave-dcd/dcd_mapping
Labels: app: mapper (Task implementation touches the mapper), type: enhancement (New feature or request)
Description
The current mapping logic uses either the user-provided UniProt ID or the target sequence name to infer the HGNC symbol of the mapped region. We use this HGNC symbol to narrow our search query when looking for transcripts that overlap our alignment result. This makes transcript selection fragile, because this information is not always provided, may be inaccurate, or may be in an unsupported format. As a result, mapping fails unexpectedly, even in situations we should be able to support.
We should no longer fail to select a transcript when this information is not provided. The following algorithm should be used for transcript selection, which retains many portions of the existing logic:
- Align the target sequence with BLAT
- Fetch transcripts which overlap the aligned region (notably, without an HGNC symbol filter)
- Perform transcript selection within each distinct gene. This leaves us with either (a) one transcript, when no genes overlap within the aligned region, or (b) one transcript per gene, when multiple genes overlap the aligned region
- If more than one candidate transcript remains, compare each candidate's similarity to the provided target sequence. This should usually produce a clear winner, and we should then choose the most similar transcript.
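A minimal Python sketch of the selection steps above, starting after BLAT alignment and the overlap query (neither is shown). All names here (`Transcript`, `select_within_gene`, `similarity`, `select_transcript`) are hypothetical and are not the dcd_mapping API. The per-gene rule (prefer MANE Select, then the longest sequence) is an assumed stand-in for whatever the existing logic does, and `difflib.SequenceMatcher` is a crude substitute for a real sequence similarity measure.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for a transcript overlapping the aligned region."""
    accession: str
    gene: str                  # gene the transcript belongs to (not a query filter)
    sequence: str              # sequence compared against the target
    mane_select: bool = False  # assumed per-gene tie-breaker


def select_within_gene(candidates: List[Transcript]) -> Transcript:
    """Pick one transcript for a single gene.

    Assumption: prefer MANE Select, then the longest sequence. The real
    per-gene rules are whatever the existing mapper already applies.
    """
    return max(candidates, key=lambda t: (t.mane_select, len(t.sequence)))


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] via difflib (not a biological aligner)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def select_transcript(
    target_sequence: str, overlapping: List[Transcript]
) -> Optional[Transcript]:
    """Choose a single transcript without relying on an HGNC symbol."""
    if not overlapping:
        return None
    # Group the overlap results by the gene each transcript belongs to.
    by_gene: Dict[str, List[Transcript]] = {}
    for tx in overlapping:
        by_gene.setdefault(tx.gene, []).append(tx)
    # Case (a) or (b): one transcript per gene.
    per_gene = [select_within_gene(txs) for txs in by_gene.values()]
    if len(per_gene) == 1:
        return per_gene[0]
    # Several genes overlap: keep the candidate most similar to the target.
    return max(per_gene, key=lambda t: similarity(target_sequence, t.sequence))


if __name__ == "__main__":
    # Made-up accessions and gene names, for illustration only.
    txs = [
        Transcript("NM_A.1", "GENE1", "ATGGCCAAA", mane_select=True),
        Transcript("NM_A.2", "GENE1", "ATGGCCAAATTT"),
        Transcript("NM_B.1", "GENE2", "TTTTTTTTT", mane_select=True),
    ]
    print(select_transcript("ATGGCCAAA", txs).accession)  # NM_A.1
```

Because grouping uses the gene attached to each overlapping transcript, no HGNC symbol is needed to build the query; the symbol only becomes a property of the selected result.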