Hi,
I'd like to start this conversation first and then open more specific issues for the points you agree with. I have some suggestions for modifying and generalizing the internals of SpatialData annotations.
One row can only link to one spatial element
Currently, a row in a table can annotate at most one spatial element. For example, if sdata['table'][i] annotates sdata['shape'][i], then sdata['table'][i] can't also annotate sdata['label'][i].
Take this test code I wrote, for example (#946):
sdata = concatenate(
    {
        "labels": blobs_annotating_element("blobs_labels"),
        "shapes": blobs_annotating_element("blobs_circles"),
        "points": blobs_annotating_element("blobs_points"),
        "multiscale_labels": blobs_annotating_element("blobs_multiscale_labels"),
    },
    concatenate_tables=True,
)
third_elems = sdata.tables["table"].obs["instance_id"] == 3
subset_sdata = subset_sdata_by_table_mask(sdata, "table", third_elems)
# here elements with instance_id 3 are more than one in the table
# just to be able to annotate a cell in another region I had to duplicate the count information etc.
My conclusion
Because we store each row-to-row mapping in the table itself, we have to "explode" the table and therefore duplicate the count information.
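To make the duplication concrete, here is a minimal sketch in plain Python with made-up data. The count vectors, the three instance ids and all variable names are assumptions for illustration, not SpatialData API. It compares how many count vectors are stored when every (element, instance) pair needs its own row with how many are needed when rows are stored once and the links are described per element.

```python
# Toy data (hypothetical): one count vector per annotated instance.
counts = {1: [5, 0, 2], 2: [1, 3, 0], 3: [0, 0, 7]}
elements = ["blobs_labels", "blobs_circles", "blobs_points", "blobs_multiscale_labels"]

# Row-level linking: each row names the single element it annotates, so
# annotating the same instance in four elements needs four copies of its counts.
exploded_rows = [
    {"region": element, "instance_id": instance_id, "counts": list(vector)}
    for element in elements
    for instance_id, vector in counts.items()
]

# Element-level linking: counts are stored once per instance, and one small
# descriptor per element says which table column identifies the instance.
rows = [{"instance_id": i, "counts": v} for i, v in counts.items()]
link_descriptors = [{"src_key": "instance_id", "dst_element_name": e} for e in elements]

print(len(exploded_rows), "count vectors in the exploded table")
print(len(rows), "count vectors plus", len(link_descriptors), "link descriptors otherwise")
```

With three instances and four elements, the exploded form holds 12 count vectors, while the element-level form holds 3 count vectors and 4 small descriptors.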
One row can only link to one item of a spatial element
A one-to-many relationship is something I think we'd actually like to have for points. We already have this implicitly for labels, and we can support it by generalizing the current annotation scheme.
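As a rough illustration of what a one-to-many link means for points, the sketch below groups point records by a key column that names the table row they belong to. The data and the column name `contained_in_shape_id` are assumptions for illustration; this is plain Python, not SpatialData code.

```python
from collections import defaultdict

# Hypothetical point records; each carries the id of the instance it belongs to.
points = [
    {"x": 0.1, "y": 0.4, "contained_in_shape_id": 3},
    {"x": 0.2, "y": 0.5, "contained_in_shape_id": 3},
    {"x": 0.9, "y": 0.8, "contained_in_shape_id": 7},
]
table_instance_ids = [3, 7, 11]  # one table row per instance

# Build instance_id -> indices of the points linked to it.
points_by_instance = defaultdict(list)
for index, point in enumerate(points):
    points_by_instance[point["contained_in_shape_id"]].append(index)

# One table row can now resolve to zero, one, or many points.
linked = {i: points_by_instance.get(i, []) for i in table_instance_ids}
print(linked)
```

Here a single table row per instance is enough: the row with id 3 resolves to two points, and the row with id 11 resolves to none, without any row being duplicated.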
My suggestion to solve both issues
Ultimately we want a mapping {src_key: {dst_element_name: (dst_access, dst_kind, link_kind, dst_instance_key)}}.
- dst_access is the access method of the dst element, for example "value" or "key". Currently for labels we use "value", since there are no columns in a raster image, and for shapes and points we use "key", since we have a column in the table.
- dst_kind is the kind of the dst element, for example "label", "shape", "point".
- link_kind is the kind of the link, for example "one-to-one", "one-to-many".
- dst_instance_key is the key of the dst element if dst_access is "key".
Currently dst_kind serves no purpose as we define the kind of linking we want but I added it for future flexibility.
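One possible way to make the four-field tuple explicit is a small typed record with a validation step. This is only a sketch: `LinkSpec`, `validate_link` and the two allowed-value sets are hypothetical names, and the rule that a "value" link must not carry a `dst_instance_key` is an assumption inferred from the description above, not something SpatialData enforces today.

```python
from typing import NamedTuple, Optional, Tuple


class LinkSpec(NamedTuple):
    """Hypothetical record for one destination of a table key."""
    dst_access: str                              # "value" or "key"
    dst_kind: str                                # e.g. "label", "shape", "point"
    link_kind: str                               # "one-to-one" or "one-to-many"
    dst_instance_key: Optional[Tuple[str, ...]]  # only used when dst_access == "key"


ALLOWED_ACCESS = {"value", "key"}
ALLOWED_LINK_KINDS = {"one-to-one", "one-to-many"}


def validate_link(spec: LinkSpec) -> LinkSpec:
    """Check the field combinations described in the proposal (assumed rules)."""
    if spec.dst_access not in ALLOWED_ACCESS:
        raise ValueError(f"unknown dst_access: {spec.dst_access!r}")
    if spec.link_kind not in ALLOWED_LINK_KINDS:
        raise ValueError(f"unknown link_kind: {spec.link_kind!r}")
    if spec.dst_access == "key" and not spec.dst_instance_key:
        raise ValueError("dst_instance_key is required when dst_access is 'key'")
    if spec.dst_access == "value" and spec.dst_instance_key is not None:
        raise ValueError("dst_instance_key must be None when dst_access is 'value'")
    return spec


labels_link = validate_link(LinkSpec("value", "label", "one-to-one", None))
points_link = validate_link(
    LinkSpec("key", "point", "one-to-many", ("contained_in_shape_id",))
)
```

Because `LinkSpec` is a `NamedTuple`, a plain 4-tuple written by the user can be converted with `LinkSpec(*some_tuple)`, so the tuple-based interface would not need to change.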
The user interface might look like this:
mapping = {
    "instance_id": {
        "blobs_labels": ("value", "label", "one-to-one", None),
        "blobs_circles": ("key", "shape", "one-to-one", ("shape_id",)),
        "parts_of_a_cell": ("key", "shape", "one-to-many", ("shape_id",)),
        "blobs_points": ("key", "point", "one-to-many", ("contained_in_shape_id",)),
    },
}
add_links(sdata, "table", mapping)
Stored in exploded normalized form, for example in sdata.tables["table"].uns["row_mappings"]:
| src_instance_key | dst_elem_name | dst_instance_key | dst_access | dst_kind | link_kind |
| --- | --- | --- | --- | --- | --- |
| "instance_id" | "blobs_labels" | ... | "value" | "label" | "one-to-one" |
| "instance_id" | "blobs_circles" | ... | "key" | "shape" | "one-to-one" |
| "instance_id" | "parts_of_a_cell" | ... | "key" | "shape" | "one-to-many" |
| "instance_id" | "blobs_points" | ... | "key" | "point" | "one-to-many" |
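The conversion from the nested mapping to these normalized rows can be sketched in a few lines of plain Python. `explode_mapping` and `COLUMNS` are hypothetical names used only for illustration; how the proposed `add_links` would actually store the result is left open here.

```python
from typing import Dict, List, Optional, Tuple

# (dst_access, dst_kind, link_kind, dst_instance_key), as in the proposal.
Link = Tuple[str, str, str, Optional[Tuple[str, ...]]]

COLUMNS = [
    "src_instance_key", "dst_elem_name", "dst_instance_key",
    "dst_access", "dst_kind", "link_kind",
]


def explode_mapping(mapping: Dict[str, Dict[str, Link]]) -> List[dict]:
    """Flatten {src_key: {dst_element_name: link}} into one row per link."""
    rows = []
    for src_key, destinations in mapping.items():
        for dst_elem_name, link in destinations.items():
            dst_access, dst_kind, link_kind, dst_instance_key = link
            rows.append({
                "src_instance_key": src_key,
                "dst_elem_name": dst_elem_name,
                "dst_instance_key": dst_instance_key,
                "dst_access": dst_access,
                "dst_kind": dst_kind,
                "link_kind": link_kind,
            })
    return rows


mapping = {
    "instance_id": {
        "blobs_labels": ("value", "label", "one-to-one", None),
        "blobs_circles": ("key", "shape", "one-to-one", ("shape_id",)),
        "parts_of_a_cell": ("key", "shape", "one-to-many", ("shape_id",)),
        "blobs_points": ("key", "point", "one-to-many", ("contained_in_shape_id",)),
    },
}
row_mappings = explode_mapping(mapping)
for row in row_mappings:
    print([row[column] for column in COLUMNS])
```

The result has one row per (source key, destination element) pair, so its size depends on the number of linked elements rather than on the number of table rows.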
I think we can manage these changes in a backwards-compatible way, and this will open up a lot of possibilities for future extensions.
Bonus points: we would also have an easier time achieving #293 (comment), since the mapping description is much smaller than adding a column to .obs.
