Liftover handling when input is on the Positive strand and output is on the Negative strand #56

@jennifer-bowser

Description

Describe the bug

This isn't necessarily a bug, but I didn't know what else to call it. I've found a couple of instances where a variant's coordinates are passed to AGCT on the Positive strand, but the lifted-over coordinates come back on the Negative strand.

In a VRS variant context, this means that a variant on the Positive strand can end up with a lifted-over start coordinate that is greater than its lifted-over end coordinate when the conversion returns Negative-strand coordinates.

So, this ticket is to investigate why Positive-strand input coordinates sometimes convert to the Negative strand:

  • a) Do we understand what's going on?
  • b) Are we confident about how to process the return data?

Steps to reproduce

Here's an example of a GRCh37 VRS variant where this occurs (HGVS = NC_000001.10:g.148015668G>A):

{
  "digest": "JZgypG2oMSghTMw9ZRAm9O7ImMlFzknE",
  "id": "ga4gh:VA.JZgypG2oMSghTMw9ZRAm9O7ImMlFzknE",
  "location": {
    "digest": "SCrMRDs0Pc5ib8F-wxASekyU9BdIuDZ-",
    "end": 148015668,
    "id": "ga4gh:SL.SCrMRDs0Pc5ib8F-wxASekyU9BdIuDZ-",
    "sequenceReference": {
      "refgetAccession": "SQ.S_KjnFVz-FE7M0W6yoaUDgYxLPc1jyWU",
      "type": "SequenceReference"
    },
    "start": 148015667,
    "type": "SequenceLocation"
  },
  "state": {
    "sequence": "A",
    "type": "LiteralSequenceExpression"
  },
  "type": "Allele"
}

Using AGCT to convert the start and end coordinates gives us this:

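from agct import Converter, Genome, Strand  # import path assumed

converter = Converter(Genome.HG19, Genome.HG38)

# Lift the VRS start and end over independently, both on the Positive strand
lifted_over_start = converter.convert_coordinate(
    chrom="chr1",
    pos=148015667,
    strand=Strand.POSITIVE,
)
lifted_over_end = converter.convert_coordinate(
    chrom="chr1",
    pos=148015668,
    strand=Strand.POSITIVE,
)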

print("lifted_over_start: ", lifted_over_start)
print("lifted_over_end:   ", lifted_over_end)

--------- output ---------
lifted_over_start: [('chr1', 120824103, <Strand.NEGATIVE: '-'>)]
lifted_over_end:   [('chr1', 120824102, <Strand.NEGATIVE: '-'>)]
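
One possible reading of this output, offered as an assumption rather than confirmed AGCT behavior: the returned positions are forward-strand coordinates on the target assembly, and the strand flag indicates that the aligned block is inverted, so ascending source positions map to descending target positions. Under that assumption, and assuming positions are 0-based residue indices with half-open VRS-style intervals, a minimal sketch for lifting an interval is to lift its first and last residues and rebuild the interval from their minimum and maximum. `lift_interval` and the `lift` callable are hypothetical names for illustration, not part of AGCT.

```python
from typing import Callable, Hashable, List, Optional, Tuple

# One liftover hit, shaped like the tuples printed above: (chrom, position, strand).
Hit = Tuple[str, int, Hashable]
# Hypothetical adapter: takes (chrom, pos) and returns a list of hits.
LiftFn = Callable[[str, int], List[Hit]]


def lift_interval(
    lift: LiftFn, chrom: str, start: int, end: int
) -> Optional[Tuple[str, int, int, Hashable]]:
    """Lift the half-open interval [start, end); return None if the result is ambiguous.

    Assumes 0-based residue positions. Zero-length intervals (start == end)
    are not handled by this sketch.
    """
    if end <= start:
        raise ValueError("expected a non-empty interval with start < end")
    first_hits = lift(chrom, start)    # first residue in the interval
    last_hits = lift(chrom, end - 1)   # last residue in the interval
    if len(first_hits) != 1 or len(last_hits) != 1:
        return None  # unmapped or multiply mapped
    (chrom_a, pos_a, strand_a), (chrom_b, pos_b, strand_b) = first_hits[0], last_hits[0]
    if chrom_a != chrom_b or strand_a != strand_b:
        return None  # the two residues landed in different blocks
    if abs(pos_b - pos_a) != end - 1 - start:
        return None  # length changed, so the interval spans a gap
    # Rebuild a forward-ordered half-open interval regardless of strand.
    return chrom_a, min(pos_a, pos_b), max(pos_a, pos_b) + 1, strand_a
```

With the converter from the example, `lift` could be `lambda chrom, pos: converter.convert_coordinate(chrom=chrom, pos=pos, strand=Strand.POSITIVE)`. If the assumptions above hold, the variant in this ticket would come out with start 120824103 and end 120824104 on chr1, which is not the same as simply swapping the two lifted values.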

Expected behavior

I expect my lifted-over start coordinate to come before my lifted-over end coordinate. In terms more specific to AGCT, I expect that if my coordinate is input on the Positive strand, it should output on the Positive strand as well (and the same goes for Negative strands).

However, it is possible that this behavior is intentional and correct. If so, these reasons should be documented.

Current behavior

The output strand does not always match the input strand.

Acceptance Criteria

If this behavior is a bug:

  • Input strand DOES correlate to output strand 100% of the time.

If this behavior is not a bug:

  • We understand why input strand doesn't always correlate to output strand.
  • We know how to handle output that returns on the opposing strand.
  • The above is documented somewhere for future reference.
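
For the "how to handle" criterion above, one further point may be worth investigating: if a Negative-strand result means the target reference is inverted relative to the source, the allele's state sequence might also need to be reverse-complemented so that it is still expressed on the target's forward strand. Whether AGCT or AnyVar should do this is part of the open question here; the sketch below only illustrates the operation, and `reverse_complement` and `lifted_state` are hypothetical helper names.

```python
# Translation table for DNA bases; N is left as N. Other IUPAC codes are not handled.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def lifted_state(sequence: str, strand_flipped: bool) -> str:
    """Return the state sequence to store after liftover.

    strand_flipped is True when the liftover result is on the opposite
    strand from the input (an assumption about what that flag means).
    """
    return reverse_complement(sequence) if strand_flipped else sequence
```

Under this assumption, the `"A"` state in the example allele would become `"T"` after a strand-flipping liftover.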

Possible reason(s)

No response

Suggested fix

No response

Branch, commit, and/or version

branch: main

Screenshots

No response

Environment details

MacOS

Additional details

This ticket was born out of this Slack conversation in the context of how AnyVar is using AGCT for its liftover conversions.

Contribution

None

Metadata

Assignees

No one assigned

Labels

bug: Something isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
