Conversation

@selmanozleyen
Member

Hi, this is a continuation of #1069.

selmanozleyen self-assigned this Dec 2, 2025
@codecov

codecov bot commented Dec 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.92%. Comparing base (91dd574) to head (76caa2b).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1072      +/-   ##
==========================================
+ Coverage   65.90%   65.92%   +0.02%     
==========================================
  Files          45       46       +1     
  Lines        6772     6776       +4     
  Branches     1138     1138              
==========================================
+ Hits         4463     4467       +4     
  Misses       1875     1875              
  Partials      434      434              
Files with missing lines                  Coverage Δ
src/squidpy/datasets/_10x_datasets.py     97.91% <100.00%> (+0.04%) ⬆️
src/squidpy/datasets/_hash_registry.py    100.00% <100.00%> (ø)
src/squidpy/datasets/_utils.py            75.83% <100.00%> (ø)

@timtreis
Member

timtreis commented Dec 2, 2025

I think this will potentially be flaky as long as we download directly from the 10x page. We should include a pivot to our own AWS in this PR.

cc @Zethson @grst

selmanozleyen linked an issue Dec 2, 2025 that may be closed by this pull request
@selmanozleyen
Member Author

OK, to confirm: is this something I can do myself, or do we need an account etc. for it?

@timtreis
Member

timtreis commented Dec 2, 2025

We do have an account, but I'm (a) not sure how to upload datasets there and (b) also not sure how to properly download from there. This is one of the many things we should homogenise across packages. But Lukas and Gregor would know, so I'm waiting for them to chime in.

@Zethson
Member

Zethson commented Dec 2, 2025

It's pretty simple. We have an AWS account, and I can share the credentials for our datasets IAM user with you on Zulip. Then you create a folder on S3 for squidpy and upload the datasets there. There's a field in the UI for each dataset where you get a raw URL that is publicly exposed, so you can just copy it here.

If you need more guidance just let us know.

Edit: Selman got credentials.
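
For reference, a minimal sketch of that upload step using boto3; the bucket name and key prefix below are hypothetical placeholders, not the actual scverse bucket:

```python
# Minimal sketch of the S3 upload flow described above.
# "scverse-datasets" and the "squidpy/" prefix are hypothetical
# placeholders; use whatever the shared account actually provides.
import boto3

s3 = boto3.client("s3")  # picks up the shared IAM credentials

bucket = "scverse-datasets"
key = "squidpy/visium_hne_adata.h5ad"

# Upload one dataset file; public readability depends on the bucket policy.
s3.upload_file("visium_hne_adata.h5ad", bucket, key)

# The publicly exposed raw URL then has this general shape:
print(f"https://{bucket}.s3.amazonaws.com/{key}")
```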

@selmanozleyen
Member Author

OK, to confirm, these are the files I am trying to download at the moment:

Figshare datasets (15 files):
  - four_i.h5ad
  - imc.h5ad
  - seqfish.h5ad
  - visium_hne_adata.h5ad
  - visium_fluo_adata.h5ad
  - visium_hne_adata_crop.h5ad
  - visium_fluo_adata_crop.h5ad
  - sc_mouse_cortex.h5ad
  - mibitof.h5ad
  - merfish.h5ad
  - slideseqv2.h5ad
  - visium_fluo_image_crop.tiff
  - visium_hne_image_crop.tiff
  - visium_hne_image.tiff
  - visium_hne_sdata.zip

10x Genomics datasets (35 samples):
  v1.1.0: 14 samples × 3 files = 42 files
  v1.2.0: 13 samples × 3 files = 39 files
  v1.3.0: 8 samples × 3 files = 24 files

I am also going to need to restructure the dataset classes and related code.
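
A rough sketch of how that mirroring could look; the source URLs, bucket, and key prefix here are illustrative stand-ins, not values from this PR:

```python
# Hypothetical sketch of mirroring the listed files to S3.
# Source URLs and the bucket/prefix are placeholders only.
import boto3
import requests

s3 = boto3.client("s3")
bucket = "scverse-datasets"  # hypothetical, as in the sketch above

# (source URL, target key) pairs; the real URLs come from figshare / 10x.
files = [
    ("https://example.org/four_i.h5ad", "squidpy/four_i.h5ad"),
    ("https://example.org/imc.h5ad", "squidpy/imc.h5ad"),
]

for url, key in files:
    # Stream the download so large .h5ad/.tiff files never sit fully in memory.
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        s3.upload_fileobj(resp.raw, bucket, key)
    print(f"mirrored {url} -> s3://{bucket}/{key}")
```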

@selmanozleyen
Member Author

I will continue this in #1076. I decided that starting from scratch was more reasonable here, since I refactored many things for the datasets.


Development

Successfully merging this pull request may close these issues.

Put datasets on S3
