Description
A little while ago I started on this in PR #967, which introduces a function that allows users to subset their entire SpatialData objects by certain criteria. The larger goal is to emulate this Scanpy notebook and to make Squidpy basically the biologist-friendly interface to SpatialData. Subsetting your object will be the first step in that journey.
For this, there are several considerations:
- a given SpatialData object can contain 0-n AnnData objects
- these AnnData objects can annotate 0-n other objects, e.g. segmentation masks, shapes (like for Visium), ROIs or even points
- a given subsetting step on the AnnData object needs to find all instances that are annotated by these soon-to-be-gone observations in all other elements and deal with them accordingly:
  - segmentation masks -> set to 0 (background)
    - potentially: remove transcript locations falling into these segmentation masks
  - shapes -> remove
  - points -> remove
  - etc.
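
To make the cascade described in the list concrete, here is a minimal sketch in plain Python. It is not the implementation from the PR and does not use the SpatialData API: nested lists stand in for a label raster, lists of dicts stand in for shapes and points, and the names `mask_labels`, `drop_instances`, `subset_cascade` and the `instance_id` key are all hypothetical.

```python
def mask_labels(labels, removed_ids):
    """Set every pixel that belongs to a removed instance to 0 (background)."""
    removed = set(removed_ids)
    return [[0 if px in removed else px for px in row] for row in labels]


def drop_instances(records, removed_ids, key="instance_id"):
    """Drop records (shapes or points) annotated by a removed instance."""
    removed = set(removed_ids)
    return [r for r in records if r[key] not in removed]


def subset_cascade(obs_ids, keep_ids, labels, shapes, points):
    """Propagate a subset of the table's observations to all annotated elements."""
    removed = set(obs_ids) - set(keep_ids)
    return {
        "obs": [i for i in obs_ids if i not in removed],
        "labels": mask_labels(labels, removed),
        "shapes": drop_instances(shapes, removed),
        "points": drop_instances(points, removed),
    }


# Toy data: three segmented instances; instance_id 0 means "not assigned".
labels = [
    [0, 1, 1],
    [2, 2, 0],
    [3, 0, 3],
]
shapes = [
    {"instance_id": 1, "radius": 2.0},
    {"instance_id": 2, "radius": 1.5},
]
points = [
    {"instance_id": 2, "gene": "A"},
    {"instance_id": 0, "gene": "B"},
    {"instance_id": 3, "gene": "C"},
]

# Keep observations 1 and 3, so instance 2 disappears everywhere.
result = subset_cascade([1, 2, 3], [1, 3], labels, shapes, points)
```

In this sketch, points that are not assigned to any instance are kept; whether transcripts falling into removed masks should be dropped at all is still one of the open questions.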
However, there are additional constraints and open questions that are important for the implementation.
- Segmentation masks can, for example, be stored as DataTrees with different scales - is it faster to subset the original resolution and then regenerate the tree, or to subset all scales individually?
- How do we handle inplace True vs False? Returning a copy can easily mean doubling a 500 GB object.
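
The two multiscale strategies from the first question can be compared on a toy example. The sketch below assumes the pyramid is built by plain strided subsampling, which is an assumption and not necessarily how SpatialData builds its scales; the helpers `downsample`, `build_pyramid` and `mask` are hypothetical. Under that assumption both strategies give identical results, so the choice would come down to speed. With other downsampling schemes the two may differ.

```python
def downsample(labels, factor=2):
    """Strided subsampling: keep every `factor`-th pixel in both dimensions."""
    return [row[::factor] for row in labels[::factor]]


def build_pyramid(labels, n_levels):
    """Full resolution first, then successively downsampled levels."""
    levels = [labels]
    for _ in range(n_levels - 1):
        levels.append(downsample(levels[-1]))
    return levels


def mask(labels, removed):
    """Set pixels of removed instances to 0 (background)."""
    return [[0 if px in removed else px for px in row] for row in labels]


full_res = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 0, 4],
    [3, 3, 4, 4],
]
removed = {2, 3}

# Option A: subset the original resolution, then regenerate the tree.
option_a = build_pyramid(mask(full_res, removed), n_levels=3)

# Option B: build the tree once, then subset every scale individually.
option_b = [mask(level, removed) for level in build_pyramid(full_res, n_levels=3)]

same_result = option_a == option_b
```

This only checks equivalence on a toy raster; which option is faster on real, chunked data would have to be benchmarked.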
Other edge cases might only really show up once we get there.
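
For the inplace question raised in the list above, one possible shape of the API is sketched here. A plain dict stands in for the SpatialData object, and the function name `subset` and its signature are hypothetical. The sketch follows the common convention of returning None when modifying in place and returning a new object otherwise.

```python
import copy


def subset(sdata, keep_ids, inplace=False):
    """Keep only the observations in `keep_ids`.

    With inplace=True the input is modified and None is returned;
    otherwise a modified deep copy is returned and the input is untouched.
    """
    # The deep copy is exactly the step that doubles memory for large objects.
    target = sdata if inplace else copy.deepcopy(sdata)
    keep = set(keep_ids)
    target["obs"] = [i for i in target["obs"] if i in keep]
    return None if inplace else target


sdata = {"obs": [1, 2, 3]}

copied = subset(sdata, [1, 3])            # new object, original unchanged
obs_after_copy = list(sdata["obs"])

returned = subset(sdata, [1, 3], inplace=True)  # modifies sdata, returns None
```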
Generally, the goal should be to identify relevant subfunctions and push these upstream to SpatialData. Some might already exist there and just need to be found (realistically by asking @LucaMarconato, there are quite a few functions only he really knows about); others might need to be written and pushed upstream. Ideally, Squidpy then chains these functions together into something with good UX.