forked from ave-dcd/dcd_mapping
Open
Labels
app: mapper (Task implementation touches the mapper)
type: enhancement (New feature or request)
type: maintenance (Maintaining this project)
Description
Although we should always do a new alignment in production, it can be useful to cache BLAT results while developing to reduce the turnaround on testing mapping runs. So far, this has been done by temporarily replacing the function _get_blat_output with the following definition:
    def _get_blat_output(metadata: ScoresetMetadata, silent: bool) -> dict[str, QueryResult]:
        out_file = f"{metadata.urn}_blat.psl"
        if Path(out_file).is_file():
            output = parse_blat(out_file, "blat-psl")
        else:
            with tempfile.NamedTemporaryFile() as tmp_file:
                query_file = _build_query_file(metadata, Path(tmp_file.name))
                target_sequence_type = _get_target_sequence_type(metadata)
                if target_sequence_type == TargetSequenceType.PROTEIN:
                    target_args = "-q=prot -t=dnax"
                elif target_sequence_type == TargetSequenceType.DNA:
                    target_args = ""
                else:
                    # TODO implement support for mixed types, not hard to do - just split blat into two files and run command with each set of arguments.
                    msg = "Mapping for score sets with a mix of nucleotide and protein target sequences is not currently supported."
                    raise NotImplementedError(msg)
                process_result = _run_blat(target_args, query_file, "/dev/stdout", silent)
                # out_file = _write_blat_output_tempfile(process_result)
                with open(out_file, "wb") as fh:
                    fh.write(process_result.stdout.split(b"Loaded")[0])
                try:
                    output = parse_blat(out_file, "blat-psl")
                # TODO reevaluate this code block - are there cases in mavedb where target sequence type is incorrectly supplied?
                except ValueError:
                    target_args = "-q=dnax -t=dnax"
                    process_result = _run_blat(
                        target_args, query_file, "/dev/stdout", silent
                    )
                    out_file = _write_blat_output_tempfile(process_result)
                    try:
                        output = parse_blat(out_file, "blat-psl")
                    except ValueError as e:
                        msg = f"Unable to run successful BLAT on {metadata.urn}"
                        raise AlignmentError(msg) from e
        return output
and reverting the change prior to committing new features.
We should add an environment variable that toggles caching of BLAT output files, so that this behavior can be switched on in development without editing the code and stays off by default in production.
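One possible shape for that toggle is sketched below. The variable names `BLAT_CACHE_ENABLED` and `BLAT_CACHE_DIR` and both helper functions are hypothetical; nothing in the project defines them yet. The sketch only decides whether caching is on and where the PSL file would live, using the same `{urn}_blat.psl` naming as the snippet above.

```python
from __future__ import annotations

import os
from pathlib import Path

# Hypothetical variable names; the project does not define these yet.
CACHE_FLAG_VAR = "BLAT_CACHE_ENABLED"
CACHE_DIR_VAR = "BLAT_CACHE_DIR"

_TRUTHY = {"1", "true", "yes", "on"}


def blat_cache_enabled() -> bool:
    """Return True only when the flag variable is set to a truthy value."""
    return os.environ.get(CACHE_FLAG_VAR, "").strip().lower() in _TRUTHY


def blat_cache_path(urn: str) -> Path | None:
    """Return the PSL cache path for a score set, or None when caching is off."""
    if not blat_cache_enabled():
        return None
    # Default to the working directory, matching the manual workaround.
    cache_dir = Path(os.environ.get(CACHE_DIR_VAR, "."))
    return cache_dir / f"{urn}_blat.psl"
```

With something like this, `_get_blat_output` could call `blat_cache_path(metadata.urn)`, parse the file if the path is not `None` and already exists, and otherwise run BLAT as usual, writing the PSL output to that path only when caching is enabled. Leaving the variable unset keeps the production behavior of always running a new alignment.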