Adding a registry to have the hashes of datasets (restructured for aws s3) #1076
base: main
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1076 +/- ##
==========================================
- Coverage 66.47% 66.46% -0.01%
==========================================
Files 45 44 -1
Lines 7015 7124 +109
Branches 1184 1199 +15
==========================================
+ Hits 4663 4735 +72
- Misses 1890 1919 +29
- Partials 462 470 +8
…e its relative. It's fine if set it globally
for more information, see https://pre-commit.ci
…r each new script
…/squidpy into add-dataset-hashes
# Image datasets
"visium_fluo_image_crop",
"visium_hne_image_crop",
"visium_hne_image",
Nit: I don't fully understand the comments here. Everything named visium_x is, for example, "10x Genomics". Either remove the comments or use a more semantically useful split.
I tried to split it based on the old files. Do you have any suggestions?
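One possible way to make the split carry meaning, rather than relying on comments, is to give each group its own `Literal` alias and validate names against it. This is only a sketch: `ImageDatasets` and `is_image_dataset` are hypothetical names that do not appear in the PR, and the three members are just the ones visible in the diff above.

```python
from typing import Literal, get_args

# Hypothetical alias name; the members are the three names visible in the diff.
ImageDatasets = Literal[
    "visium_fluo_image_crop",
    "visium_hne_image_crop",
    "visium_hne_image",
]


def is_image_dataset(name: str) -> bool:
    """Return True if `name` is one of the declared image datasets."""
    # get_args() returns the literal values as a tuple of strings.
    return name in get_args(ImageDatasets)
```

With one alias per dataset kind, the grouping is checked by the type checker and can be queried at runtime, so the section comments become optional.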
# =============================================================================

# 10x Genomics Visium datasets (adata_with_image type)
VisiumDatasets = Literal[
Where is this list coming from? Are we loading all of them?
It is from the old API on the main branch (under _10x.py). I didn't want to remove it because it was public.
src/squidpy/datasets/_datasets.py
Outdated
"""
Download Visium `datasets <https://support.10xgenomics.com/spatial-gene-expression/datasets>`_ from *10x Genomics*.
Uses the unified downloader which supports S3 with fallback to 10x Genomics servers.
I'm not sure we should have that fallback. The reason for S3 was control over the data's existence, so this feels overengineered. I don't expect AWS to fail much.
I agree, we could then get rid of that agent-spoofing
Basically, I wanted to save the original links. But I can get rid of them.
self,
entry: DatasetEntry,
path: Path | str | None = None,
) -> Any: # Returns SpatialData
Why are you returning Any but saying SpatialData?
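A common way to annotate the real return type without importing a heavy dependency at runtime is a `TYPE_CHECKING` guard plus a string annotation. The sketch below assumes `SpatialData` is importable from the top-level `spatialdata` package; `load_entry` is a hypothetical stand-in for the method in the diff, not its actual implementation.

```python
from pathlib import Path
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Only evaluated by type checkers, so spatialdata is not imported at runtime.
    from spatialdata import SpatialData


def load_entry(path: Union[Path, str, None] = None) -> "SpatialData":
    """Hypothetical loader; the body is a placeholder for the real logic."""
    raise NotImplementedError
```

The quoted annotation is never evaluated at runtime, so the module still imports when `spatialdata` is not installed, while type checkers see the precise return type instead of `Any`.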
from typing import TYPE_CHECKING, Any

import pooch
from scanpy import logging as logg
Not part of the scanpy public API, please remove
OK, but why is it not marked with an underscore then? Our notebooks also use it.
We're also moving Squidpy to use the spatialdata logger in the other functions, feel free to use that one instead
Again a continuation of #1072, to add hashes and use the uploaded datasets.
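To illustrate what a hash registry buys, here is a minimal standard-library sketch of checking a downloaded file against an expected SHA-256 digest. The diff imports pooch, which provides this kind of check itself; the code below is not the PR's implementation, and `REGISTRY`, `sha256_of`, `verify`, and the file name `example.txt` are all hypothetical.

```python
import hashlib
from pathlib import Path

# Hypothetical registry: file name -> expected SHA-256 hex digest.
REGISTRY = {
    "example.txt": hashlib.sha256(b"hello").hexdigest(),
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, registry: dict = REGISTRY) -> bool:
    """Return True only if the file is registered and its digest matches."""
    expected = registry.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

A mismatch means the file was corrupted or changed upstream, which is the failure the registry is meant to surface early.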
changes made
visium_hne_image = _make_image_loader("visium_hne_image")
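The line above suggests a factory that binds a dataset name to a loader function. A minimal sketch of that pattern follows; the real `_make_image_loader` is not shown in this thread, so the body, the `_load_image` helper, and the returned dictionary are assumptions made purely for illustration.

```python
from typing import Callable, Optional


def _load_image(name: str, path: Optional[str] = None) -> dict:
    """Hypothetical stand-in for the real download-and-read step."""
    return {"name": name, "path": path}


def _make_image_loader(name: str) -> Callable[..., dict]:
    """Return a loader function bound to a single dataset name."""

    def loader(path: Optional[str] = None) -> dict:
        return _load_image(name, path)

    # Give the generated function a useful name and docstring.
    loader.__name__ = name
    loader.__doc__ = f"Load the `{name}` image dataset."
    return loader


visium_hne_image = _make_image_loader("visium_hne_image")
```

Generating loaders this way keeps one public function per dataset without repeating the download logic in each of them.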