Description
Describe the bug
This isn't necessarily a bug, but I didn't know what else to call it. I've found a couple of instances where variant coordinates are passed to AGCT on the Positive strand but come back from liftover on the Negative strand.
In a VRS variant context, this means that a variant on the Positive strand can wind up with a lifted-over start coordinate that appears to come after its lifted-over end if the conversion gives Negative-stranded coordinates.
So, this ticket is to investigate why Positive-strand input coordinates sometimes convert to the Negative strand:
- a) Do we understand what's going on?
- b) Are we confident about how to process the return data?
Steps to reproduce
Here's an example of a GRCh37 VRS variant where this occurs (HGVS = NC_000001.10:g.148015668G>A):
{
  "digest": "JZgypG2oMSghTMw9ZRAm9O7ImMlFzknE",
  "id": "ga4gh:VA.JZgypG2oMSghTMw9ZRAm9O7ImMlFzknE",
  "location": {
    "digest": "SCrMRDs0Pc5ib8F-wxASekyU9BdIuDZ-",
    "end": 148015668,
    "id": "ga4gh:SL.SCrMRDs0Pc5ib8F-wxASekyU9BdIuDZ-",
    "sequenceReference": {
      "refgetAccession": "SQ.S_KjnFVz-FE7M0W6yoaUDgYxLPc1jyWU",
      "type": "SequenceReference"
    },
    "start": 148015667,
    "type": "SequenceLocation"
  },
  "state": {
    "sequence": "A",
    "type": "LiteralSequenceExpression"
  },
  "type": "Allele"
}
Using AGCT to convert the start and end coordinates gives us this:
from agct import Converter, Genome, Strand

# Lift the VRS start and end coordinates from GRCh37 (hg19) to GRCh38 (hg38)
converter = Converter(Genome.HG19, Genome.HG38)
lifted_over_start = converter.convert_coordinate(
    chrom="chr1",
    pos=148015667,
    strand=Strand.POSITIVE,
)
lifted_over_end = converter.convert_coordinate(
    chrom="chr1",
    pos=148015668,
    strand=Strand.POSITIVE,
)
print("lifted_over_start: ", lifted_over_start)
print("lifted_over_end: ", lifted_over_end)
--------- output ---------
lifted_over_start: [('chr1', 120824103, <Strand.NEGATIVE: '-'>)]
lifted_over_end: [('chr1', 120824102, <Strand.NEGATIVE: '-'>)]
Expected behavior
I expect my lifted-over start coordinate to come before my lifted-over end coordinate. In terms more specific to AGCT, I expect that if my coordinate is input on the Positive strand, it should output on the Positive strand as well (and the same goes for Negative strands).
However, it is possible that this behavior is intentional and correct. If so, these reasons should be documented.
Current behavior
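The output strand does not always match the input strand. In the example above, Positive-strand input returns Negative-strand coordinates, with the lifted-over start greater than the lifted-over end.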
Acceptance Criteria
If this behavior is a bug:
- Input strand DOES correlate to output strand 100% of the time.
If this behavior is not a bug:
- We understand why input strand doesn't always correlate to output strand.
- We know how to handle output that returns on the opposing strand.
- The above is documented somewhere for future reference.
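If the behavior turns out to be intentional (for example, because the region is inverted between the two assemblies), one possible way to handle opposing-strand output is sketched below. This is only a sketch under several assumptions that this ticket has not confirmed: that AGCT positions are 0-based, that returned positions are expressed in forward-strand coordinates of the target assembly, and that a Negative result means the aligned block is reversed. Under those assumptions, lifting the VRS end coordinate directly lifts the base just after the interval, which would explain why it lands one position before the lifted start. The helper name normalize_lifted_interval and the local Strand stand-in are hypothetical and are not part of AGCT.

```python
from enum import Enum


class Strand(Enum):
    """Local stand-in for agct's Strand enum so the sketch runs without agct."""

    POSITIVE = "+"
    NEGATIVE = "-"


def normalize_lifted_interval(lifted_first, lifted_last):
    """Build an inter-residue (start, end) interval from two lifted-over bases.

    lifted_first and lifted_last are (chrom, pos, strand) tuples obtained by
    lifting the FIRST base (VRS start) and the LAST base (VRS end - 1) of the
    source interval. Assumption: positions are 0-based and forward-stranded.

    Returns (chrom, start, end, needs_reverse_complement).
    """
    chrom_a, pos_a, strand_a = lifted_first
    chrom_b, pos_b, strand_b = lifted_last
    if chrom_a != chrom_b or strand_a != strand_b:
        # The interval was split across blocks; this sketch does not handle that.
        raise ValueError("endpoints lifted to different chromosomes or strands")

    # On a reversed block the last base lands before the first one, so order
    # the two bases and convert back to a half-open, inter-residue interval.
    start = min(pos_a, pos_b)
    end = max(pos_a, pos_b) + 1

    # Assumption: a Negative result means the allele sequence would also need
    # to be reverse-complemented to stay on the target's forward strand.
    return chrom_a, start, end, strand_a is Strand.NEGATIVE


# The single-base example from this ticket: VRS start=148015667, end=148015668,
# so the first and last base are the same position and lift to the same place.
lifted_base = ("chr1", 120824103, Strand.NEGATIVE)
result = normalize_lifted_interval(lifted_base, lifted_base)
print(result)
```

With this approach the caller would lift start and end - 1 rather than start and end, so the lifted-over start always precedes the lifted-over end regardless of the returned strand. Whether that matches what AGCT intends is exactly what this ticket needs to establish.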
Possible reason(s)
No response
Suggested fix
No response
Branch, commit, and/or version
branch: main
Screenshots
No response
Environment details
MacOS
Additional details
This ticket was born out of this Slack conversation in the context of how AnyVar is using AGCT for its liftover conversions.
Contribution
None