Validate liftover results against pyliftover #51

### Feature description

We wrote `agct` to replace [pyliftover](https://github.com/konstantint/pyliftover/tree/master), but we are at the mercy of an external library (the St. Jude `chainfile` crate), so it'd be good to confirm that we're producing accurate results. One sanity check would be to run `agct` and `pyliftover` over the whole genome and see whether they agree with each other.

In particular, the result from `pyliftover` on negative strand input seems to differ from `agct` by 1 position. I don't remember whether this is intentional, but it might be something that we need to look at again.
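To keep the suspected off-by-one visible rather than lumping it in with other disagreements, each pair of hits could be sorted into a category. This is only a sketch: `classify_hit` is a hypothetical helper, not part of either library, and it assumes both hits start with `(chrom, pos, strand)` with the strand already unwrapped to a plain `"+"`/`"-"` string.

```python
def classify_hit(py_hit: tuple, agct_hit: tuple) -> str:
    """Compare one pyliftover hit with one agct hit (hypothetical helper).

    Assumes both hits start with (chrom, pos, strand) and that strand is a
    plain "+" or "-" string, so agct's strand enum must be unwrapped with
    ``.value`` before calling this.
    """
    py_chrom, py_pos, py_strand = py_hit[:3]  # drops pyliftover's trailing score
    agct_chrom, agct_pos, agct_strand = agct_hit[:3]
    if py_chrom != agct_chrom or py_strand != agct_strand:
        return "mismatch"
    if py_pos == agct_pos:
        return "match"
    # Track the suspected negative-strand discrepancy as its own category
    if py_strand == "-" and abs(py_pos - agct_pos) == 1:
        return "negative_strand_off_by_one"
    return "mismatch"
```

Counting how often `negative_strand_off_by_one` shows up, and in which direction, should make it easier to decide whether the difference is systematic.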

### Use case

We want accurate mappings.

### Acceptance Criteria

A script in the `analysis/` folder that runs a genome-wide, position-by-position liftover and compares results between the two libraries, plus some way to log results and produce summary statistics. I don't think this should run in CI, per se, but it'd be good to have it on hand and run it occasionally, e.g. after major code changes.
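For the logging and summary part, something along these lines might be enough. It is a rough sketch, not a settled design: `log_and_summarize`, the CSV columns, and the `"match"` outcome label are all made up for illustration.

```python
import csv
from collections import Counter
from typing import Iterable, TextIO


def log_and_summarize(records: Iterable[tuple[str, int, str]], out: TextIO) -> Counter:
    """Write one CSV row per disagreeing position and tally every outcome.

    ``records`` yields (chrom, pos, outcome) tuples; the outcome labels are
    whatever the comparison step produces ("match" is assumed to mean agreement).
    """
    writer = csv.writer(out)
    writer.writerow(["chrom", "pos", "outcome"])
    counts: Counter = Counter()
    for chrom, pos, outcome in records:
        counts[outcome] += 1
        if outcome != "match":  # only disagreements are worth a log line
            writer.writerow([chrom, pos, outcome])
    return counts
```

Logging only the disagreements keeps the output small if the two libraries mostly agree, while the returned `Counter` still gives per-outcome totals for the summary.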

### Proposed solution

I scraped this together very quickly, so it would need some polish:

```python
"""Compare results from pyliftover and agct"""
from pyliftover import LiftOver

from agct import Converter, Genome

pylo_conv = LiftOver("hg38", "hg19")
agct_conv = Converter(Genome.HG38, Genome.HG19)

# chr1 through chr22, plus the sex chromosomes
chrom_names = [f"chr{name}" for name in [*range(1, 23), "X", "Y"]]

def results_are_equal(py_result: list[list], agct_result: list[tuple]) -> bool:
    """Return True if both libraries report the same chrom, pos, and strand for every hit."""
    if len(py_result) != len(agct_result):
        return False
    for py, agct in zip(py_result, agct_result):
        # compare chrom, pos, and strand; ignore chainfile score value in pyliftover result
        if (py[0], py[1], py[2]) != (agct[0], agct[1], agct[2].value):
            return False
    return True

# arbitrarily large value that should exceed the length of all chromosomes on all reference sequences
# it'd be nice to write in some expected lengths per chromosome
MAX_POS = 300_000_000

for chrom in chrom_names:
    for pos in range(MAX_POS):
        # pyliftover returns None if the chromosome is missing from the chain file; treat that as no hits
        py_result = pylo_conv.convert_coordinate(chrom, pos) or []
        agct_result = agct_conv.convert_coordinate(chrom, pos, "+")
        if py_result and py_result[0][2] == "-":
            continue  # negative-strand results all seem to be off by 1, so skip them for now
        if not results_are_equal(py_result, agct_result):
            print((chrom, pos, py_result, agct_result))
```
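On the expected-lengths comment in the script: one option that avoids hard-coding a table might be to read the sizes out of the chain file itself. The sketch below assumes the UCSC chain header layout (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`) and that the "from" assembly is the target side of a liftover chain file, which would be worth double-checking. `source_chrom_sizes` is a hypothetical helper.

```python
from typing import Iterable


def source_chrom_sizes(chain_lines: Iterable[str]) -> dict[str, int]:
    """Collect chromosome sizes for the source assembly from chain header lines.

    Assumes each header line looks like:
    chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    Alignment block lines and blank lines are skipped.
    """
    sizes: dict[str, int] = {}
    for line in chain_lines:
        if line.startswith("chain "):
            fields = line.split()
            sizes[fields[2]] = int(fields[3])  # tName -> tSize
    return sizes
```

The lines could come from the chain file opened in text mode (e.g. via `gzip.open(path, "rt")` for a `.gz` file), and the inner loop could then use `range(sizes.get(chrom, MAX_POS))` in place of `range(MAX_POS)`.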

### Alternatives considered

_No response_

### Implementation details

_No response_

### Potential Impact

_No response_

### Additional context

_No response_

### Contribution

None
