Remove UR:file:// and UR:ftp:// from ref search path, plus REF_PATH to EBI #1881
Merged
daviesrob merged 2 commits into samtools:develop on May 15, 2025
e76d1ae to
9087ed8
Compare
As this may make failing to get a reference more likely, it would be worth improving the error message. It should at least give the name of the offending reference, and maybe also link to the documentation we plan to add on how to get references.
…o EBI.

While use of the EBI refget server was originally encouraged by the CRAM inventors, it became a self-imposed DDOS and it is now unreliable due to rate limiting by the EBI. This removes EBI as a fallback when REF_PATH has not been set.

In doing this we discovered that we could still retrieve references (ironically also from EBI due to the test being a 1000genomes CRAM) via the SQ UR: tag supporting remote URIs. This behaviour is explicitly listed as not being supported in the samtools manpage and we believe it was an accidental ability added when switching from fopen to bgzf_open for reading the UR reference file.

Note this check must be in cram_populate_ref and not load_ref_portion or bgzf_open_ref, as the user still has the ability to explicitly request an external reference, e.g. via "samtools view -T URI".

open_path_mfile() now takes an extra 'int *local' argument which is set to non-zero if the file found in REF_PATH is local. Non-local files will be cached to REF_CACHE if it is set, but REF_CACHE no longer has a default value, as it did when the EBI refget server was the default REF_PATH. This means it should operate much as before, except for the lack of EBI defaults.
edc5ac2 to
178d1ba
Compare
Edits from review
daviesrob added a commit to samtools/samtools that referenced this pull request on May 19, 2025
Updates to reflect the changes in the HTSlib UR: and REF_PATH pull request samtools/htslib#1881. Also moved the section listing the reference discovery ordering to be adjacent to the environment variable section so all relevant text is together. --------- Co-authored-by: Rob Davies <rmd@sanger.ac.uk>
While use of the EBI refget server was originally encouraged by the CRAM inventors, it became a self-imposed DDOS and it is now unreliable due to explicit rate-limiting by the EBI. This removes EBI as a fallback when REF_PATH has not been set.
In doing this we discovered that we could still retrieve references (ironically also from EBI due to the test being a 1000genomes CRAM) via the SQ UR: tag supporting remote URIs. This behaviour is explicitly listed as not being supported in the samtools manpage and we believe it was an accidental ability added when switching from `fopen` to `bgzf_open` for reading the UR reference file.

Note this check must be in `cram_populate_ref` and not `load_ref_portion` or `bgzf_open_ref`, as the user still has the ability to explicitly request an external reference, e.g. via "samtools view -T URI".

`open_path_mfile()` now takes an extra 'int *local' argument which is set to non-zero if the file found in REF_PATH is local. Non-local files will be cached to REF_CACHE if it is set, but REF_CACHE no longer has a default value, as it did when the EBI refget server was the default REF_PATH. This means it should operate much as before, except for the lack of EBI defaults.
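For readers who want a concrete picture of the two decisions described above, here is a minimal sketch in C. It is not the code from this pull request: `has_uri_scheme` and `should_cache_ref` are hypothetical helper names, and the actual checks in `cram_populate_ref` and `open_path_mfile()` may be written differently. The sketch assumes that any UR: value beginning with a `scheme://` prefix (including `file://` and `ftp://`, as in the PR title) is skipped during automatic reference discovery, and that an empty REF_CACHE is treated the same as an unset one.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of HTSlib.
 * Returns 1 if s begins with a URI scheme followed by "://"
 * (for example "ftp://" or "file://"), and 0 otherwise. */
int has_uri_scheme(const char *s)
{
    size_t i = 1;

    /* A scheme must start with a letter. */
    if (s == NULL || !isalpha((unsigned char)s[0]))
        return 0;

    /* Remaining scheme characters: letters, digits, '+', '-', '.'. */
    while (isalnum((unsigned char)s[i]) ||
           s[i] == '+' || s[i] == '-' || s[i] == '.')
        i++;

    /* Short-circuit evaluation keeps every read inside the string. */
    return s[i] == ':' && s[i + 1] == '/' && s[i + 2] == '/';
}

/* Hypothetical helper, not part of HTSlib.
 * local:     non-zero if the file found via REF_PATH is a local file
 *            (the meaning of the new 'int *local' out-argument).
 * ref_cache: the value of REF_CACHE, or NULL if it is unset.
 * Returns 1 only for non-local files when a cache location is given.
 * Treating an empty string as "unset" is an assumption of this sketch. */
int should_cache_ref(int local, const char *ref_cache)
{
    return !local && ref_cache != NULL && ref_cache[0] != '\0';
}
```

Under these assumptions, a UR: value such as `/data/ref.fa` would still be eligible for automatic discovery, whereas `ftp://host/ref.fa` would not. A reference passed explicitly with `samtools view -T URI` never goes through this check, which is why the description places it in `cram_populate_ref`.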