Flag for BLAT Caching in Development #54

@bencap

Description

Production should always run a fresh alignment, but caching BLAT results during development shortens the turnaround when testing mapping runs. So far, this has been done by temporarily replacing the function _get_blat_output with the following definition:

def _get_blat_output(metadata: ScoresetMetadata, silent: bool) -> dict[str, QueryResult]:
    out_file = f"{metadata.urn}_blat.psl"

    if Path(out_file).is_file():
        output = parse_blat(out_file, "blat-psl")
    else:
        with tempfile.NamedTemporaryFile() as tmp_file:
            query_file = _build_query_file(metadata, Path(tmp_file.name))
            target_sequence_type = _get_target_sequence_type(metadata)
            if target_sequence_type == TargetSequenceType.PROTEIN:
                target_args = "-q=prot -t=dnax"
            elif target_sequence_type == TargetSequenceType.DNA:
                target_args = ""
            else:
                # TODO implement support for mixed types, not hard to do - just split blat into two files and run command with each set of arguments.
                msg = "Mapping for score sets with a mix of nucleotide and protein target sequences is not currently supported."
                raise NotImplementedError(msg)
            process_result = _run_blat(target_args, query_file, "/dev/stdout", silent)
            # out_file = _write_blat_output_tempfile(process_result)
            with open(out_file, "wb") as fh:
                fh.write(process_result.stdout.split(b"Loaded")[0])

            try:
                output = parse_blat(out_file, "blat-psl")

            # TODO reevaluate this code block - are there cases in mavedb where target sequence type is incorrectly supplied?
            except ValueError:
                target_args = "-q=dnax -t=dnax"
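                process_result = _run_blat(
                    target_args, query_file, "/dev/stdout", silent
                )
                # Overwrite the cache so a later run does not re-read the unparseable first attempt.
                with open(out_file, "wb") as fh:
                    fh.write(process_result.stdout.split(b"Loaded")[0])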
                try:
                    output = parse_blat(out_file, "blat-psl")
                except ValueError as e:
                    msg = f"Unable to run successful BLAT on {metadata.urn}"
                    raise AlignmentError(msg) from e

    return output

and reverting the change prior to committing new features.

We should add an environment variable that toggles caching of BLAT output files, so that this behavior can be switched on in development without editing the code.
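As a starting point for discussion, here is a minimal sketch of how such a flag could be read. The variable names `BLAT_CACHE_ENABLED` and `BLAT_CACHE_DIR` and both helper functions are hypothetical; nothing in the project defines them yet. Caching stays off unless the flag is explicitly set, so production behavior would not change by default.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical environment variable names, chosen only for illustration.
CACHE_FLAG_VAR = "BLAT_CACHE_ENABLED"
CACHE_DIR_VAR = "BLAT_CACHE_DIR"

_TRUTHY = {"1", "true", "yes", "on"}


def blat_cache_enabled() -> bool:
    """Return True only when the flag is explicitly set to a truthy value."""
    return os.environ.get(CACHE_FLAG_VAR, "").strip().lower() in _TRUTHY


def blat_cache_path(urn: str) -> Optional[Path]:
    """Return the PSL cache path for a score set URN, or None when caching is off."""
    if not blat_cache_enabled():
        return None
    # Default to the current directory, matching the file name used in the workaround above.
    cache_dir = Path(os.environ.get(CACHE_DIR_VAR, "."))
    return cache_dir / f"{urn}_blat.psl"
```

With helpers like these, _get_blat_output could call `blat_cache_path(metadata.urn)` once and only read from or write to the cache file when the result is not `None`, falling back to the existing temp-file path otherwise.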
