Conversation

@selmanozleyen
Member

Hi, this is a continuation of #1069.

selmanozleyen self-assigned this Dec 2, 2025
@codecov

codecov bot commented Dec 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.92%. Comparing base (91dd574) to head (76caa2b).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1072      +/-   ##
==========================================
+ Coverage   65.90%   65.92%   +0.02%     
==========================================
  Files          45       46       +1     
  Lines        6772     6776       +4     
  Branches     1138     1138              
==========================================
+ Hits         4463     4467       +4     
  Misses       1875     1875              
  Partials      434      434              
Files with missing lines                  Coverage Δ
src/squidpy/datasets/_10x_datasets.py     97.91% <100.00%> (+0.04%) ⬆️
src/squidpy/datasets/_hash_registry.py    100.00% <100.00%> (ø)
src/squidpy/datasets/_utils.py            75.83% <100.00%> (ø)

@timtreis
Member

timtreis commented Dec 2, 2025

I think this will potentially be flaky as long as we download directly from the 10x page. We should include a pivot to our own AWS in this PR.

cc @Zethson @grst

selmanozleyen linked an issue Dec 2, 2025 that may be closed by this pull request
@selmanozleyen
Member Author

OK, to confirm: is this something I can do myself, or do we need an account etc. for it?

@timtreis
Member

timtreis commented Dec 2, 2025

We do have an account, but I'm (a) not sure how to upload datasets there and (b) also not sure how to properly download from there. This is one of the many things we should homogenise across packages. But Lukas and Gregor would know, so I'm waiting for them to chime in.

@Zethson
Member

Zethson commented Dec 2, 2025

It's pretty simple. We have an AWS account, and I can share the credentials for our datasets IAM user with you on Zulip. Then you create a folder on S3 for squidpy and upload the datasets there. There's a field in the UI for each dataset where you get a raw URL that is publicly exposed, so you can just copy it here.

If you need more guidance just let us know.

Edit: Selman got credentials.
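
For reference, a minimal sketch of that upload step using boto3; the bucket name and key prefix below are hypothetical placeholders, not the actual scverse bucket:

```python
# Minimal sketch of the S3 upload flow described above.
# "scverse-datasets" and the "squidpy/" prefix are hypothetical
# placeholders; use whatever the shared account actually provides.
import boto3

s3 = boto3.client("s3")  # picks up the shared IAM credentials

bucket = "scverse-datasets"
key = "squidpy/visium_hne_adata.h5ad"

# Upload one dataset file; public readability depends on the bucket policy.
s3.upload_file("visium_hne_adata.h5ad", bucket, key)

# The publicly exposed raw URL then has this general shape:
print(f"https://{bucket}.s3.amazonaws.com/{key}")
```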

@selmanozleyen
Member Author

OK, to confirm, these are the files I am trying to download at the moment:

Figshare datasets (15 files):
  - four_i.h5ad
  - imc.h5ad
  - seqfish.h5ad
  - visium_hne_adata.h5ad
  - visium_fluo_adata.h5ad
  - visium_hne_adata_crop.h5ad
  - visium_fluo_adata_crop.h5ad
  - sc_mouse_cortex.h5ad
  - mibitof.h5ad
  - merfish.h5ad
  - slideseqv2.h5ad
  - visium_fluo_image_crop.tiff
  - visium_hne_image_crop.tiff
  - visium_hne_image.tiff
  - visium_hne_sdata.zip

10x Genomics datasets (35 samples):
  v1.1.0: 14 samples × 3 files = 42 files
  v1.2.0: 13 samples × 3 files = 39 files
  v1.3.0: 8 samples × 3 files = 24 files

I am also going to need to restructure the dataset classes and related code.
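
A rough sketch of how that mirroring could look; the source URLs, bucket, and key prefix here are illustrative stand-ins, not values from this PR:

```python
# Hypothetical sketch of mirroring the listed files to S3.
# Source URLs and the bucket/prefix are placeholders only.
import boto3
import requests

s3 = boto3.client("s3")
bucket = "scverse-datasets"  # hypothetical, as in the sketch above

# (source URL, target key) pairs; the real URLs come from figshare / 10x.
files = [
    ("https://example.org/four_i.h5ad", "squidpy/four_i.h5ad"),
    ("https://example.org/imc.h5ad", "squidpy/imc.h5ad"),
]

for url, key in files:
    # Stream the download so large .h5ad/.tiff files never sit fully in memory.
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        s3.upload_fileobj(resp.raw, bucket, key)
    print(f"mirrored {url} -> s3://{bucket}/{key}")
```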

@selmanozleyen
Member Author

I will continue this in #1076. I decided that starting from scratch was more reasonable here, since I refactored many things for the datasets.


Development

Successfully merging this pull request may close these issues.

Put datasets on S3
