-
Notifications
You must be signed in to change notification settings - Fork 98
MVP for morpohology module #866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 17 commits
Commits
Show all changes
25 commits
Select commit
Hold shift + click to select a range
b99fa5f
MVP for internal calling of extra props
timtreis bbecec3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 6e3d652
Added option to externally feed in functions
timtreis 7b21fda
merge conflict resolved
timtreis 753ec3a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] c9a3cc9
Merge branch 'main' into feature/add_morphology_toolbox
timtreis 34f745f
add rough functionality to write regionprops into sdata["table"].obsm…
npeschke 07606e2
DataFrame now written to obsm instead of numpy array
npeschke 0f0ea7c
add granularity measurement from afermg/cp_measurements
npeschke b33895b
add border_occupied_factor
npeschke ebc08d8
add sanity checks and make multiple coordinate systems work
npeschke 50d58d2
fix assertion to accommodate new interface
npeschke 328ba16
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] f7b5270
add possibility for granularity to return multiple values per channel…
npeschke 5c1d70f
Merge remote-tracking branch 'origin/feature/add_morphology_toolbox' …
npeschke 112d1a9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 0ba44d2
Merge branch 'main' into feature/add_morphology_toolbox
npeschke 8047dde
add "all" option for regionprops through using methods=None
npeschke 39f9db8
add granularity and border_occupied_factor to _all_regionprops_names
npeschke 02d21c8
refactor quantify_morphology
npeschke 59d8137
add zernike
npeschke bc035d2
status update
npeschke cde0526
handle dict returns from calculate_* features
npeschke 9a9511f
add _sdata_image_features_helper
npeschke 78c0f67
Merge branch 'main' into feature/add_morphology_toolbox
timtreis File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,20 +1,128 @@ | ||
from __future__ import annotations | ||
|
||
from collections.abc import Mapping, Sequence | ||
import typing | ||
import warnings | ||
from collections.abc import Generator, Mapping, Sequence | ||
from types import MappingProxyType | ||
from typing import TYPE_CHECKING, Any | ||
from typing import TYPE_CHECKING, Any, Callable, TypeVar, Union | ||
|
||
import numpy as np | ||
import numpy.typing as npt | ||
import pandas as pd | ||
import spatialdata as sd | ||
import xarray as xr | ||
from anndata import AnnData | ||
from scanpy import logging as logg | ||
from skimage.measure import perimeter, regionprops | ||
from spatialdata import SpatialData | ||
|
||
from squidpy._constants._constants import ImageFeature | ||
from squidpy._docs import d, inject_docs | ||
from squidpy._utils import Signal, SigQueue, _get_n_cores, parallelize | ||
from squidpy.gr._utils import _save_data | ||
from squidpy.im import _measurements | ||
from squidpy.im._container import ImageContainer | ||
|
||
__all__ = ["calculate_image_features"] | ||
__all__ = ["calculate_image_features", "quantify_morphology"] | ||
|
||
IntegerNDArrayType = TypeVar("IntegerNDArrayType", bound=npt.NDArray[np.integer[Any]]) | ||
FloatNDArrayType = TypeVar("FloatNDArrayType", bound=npt.NDArray[np.floating[Any]]) | ||
RegionPropsCallableType = Callable[ | ||
[IntegerNDArrayType, FloatNDArrayType], Union[Union[int, float, list[Union[int, float]]]] | ||
] | ||
|
||
RegionPropsImageCallableType = Callable[[IntegerNDArrayType, FloatNDArrayType], dict[str, Union[int, float]]] | ||
|
||
|
||
def circularity(regionmask: IntegerNDArrayType) -> float: | ||
""" | ||
Calculate the circularity of the region. | ||
|
||
:param regionmask: Region properties object | ||
:return: circularity of the region | ||
""" | ||
perim = perimeter(regionmask) | ||
if perim == 0: | ||
return 0 | ||
area = np.sum(regionmask) | ||
return float((4 * np.pi * area) / (perim**2)) | ||
|
||
|
||
def _get_region_props( | ||
label_element: xr.DataArray, | ||
image_element: xr.DataArray, | ||
props: list[str] | None = None, | ||
extra_methods: list[RegionPropsCallableType] | None = None, | ||
) -> pd.DataFrame: | ||
if not extra_methods: | ||
extra_methods = [] | ||
if props is None: | ||
# if we didn't get any properties, we'll do the bare minimum | ||
props = ["label"] | ||
|
||
np_rgb_image = image_element.values.transpose(1, 2, 0) # (c, y, x) -> (y, x, c) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. like, stuff like this doesn't work with real data. Everything needs to be either lazy, or looped/parallelized |
||
|
||
# Add custom extra methods here | ||
rp_extra_methods = [ | ||
circularity, | ||
_measurements.granularity, # <--- Add additional measurements here that handle individual regions | ||
] + extra_methods | ||
|
||
image_extra_methods = [ | ||
_measurements.border_occupied_factor # <--- Add additional measurements here that calculate on the entire label image | ||
] # type: list[RegionPropsImageCallableType] | ||
image_extra_methods = {method.__name__: method for method in image_extra_methods} | ||
|
||
# can't use regionprops_table because it only returns int | ||
regions = regionprops( | ||
label_image=label_element.values, | ||
intensity_image=np_rgb_image, | ||
extra_properties=rp_extra_methods, | ||
) | ||
# dynamically extract specified properties and create a df | ||
extracted_props = {prop: [] for prop in props + [e.__name__ for e in extra_methods]} # type: dict[str, list[int | float]] | ||
for prop in props + [e.__name__ for e in extra_methods]: | ||
if prop in image_extra_methods: | ||
im_extra_result = image_extra_methods[prop](label_element.values, np_rgb_image) | ||
for region in regions: | ||
extracted_props[prop].append(im_extra_result.get(region.label)) | ||
continue | ||
|
||
for region in regions: | ||
try: | ||
extracted_props[prop].append(getattr(region, prop)) | ||
except AttributeError as e: | ||
raise ValueError(f"Property '{prop}' is not available in the region properties.") from e | ||
|
||
return pd.DataFrame(extracted_props) | ||
|
||
|
||
def _subset_image_using_label( | ||
label_element: xr.DataArray, image_element: xr.DataArray | ||
) -> Generator[tuple[int, xr.DataArray], None, None]: | ||
""" | ||
A generator that extracts subsets of the RGB image based on the bitmap. | ||
|
||
:param label_element: xarray.DataArray with cell identifiers | ||
:param image_element: xarray.DataArray with RGB image data | ||
:yield: Subsets of the RGB image corresponding to each cell in the bitmap | ||
""" | ||
unique_cells = np.unique(label_element.values) | ||
|
||
for cell_id in unique_cells: | ||
if cell_id == 0: | ||
# Assuming 0 is the background or non-cell area, skip it | ||
continue | ||
|
||
cell_mask = xr.DataArray( | ||
label_element.values == cell_id, | ||
dims=label_element.dims, | ||
coords=label_element.coords, | ||
) | ||
|
||
subset = image_element.where(cell_mask, drop=True) | ||
|
||
yield cell_id, subset | ||
|
||
|
||
@d.dedent | ||
|
@@ -94,7 +202,15 @@ def calculate_image_features( | |
n_jobs=n_jobs, | ||
backend=backend, | ||
show_progress_bar=show_progress_bar, | ||
)(adata, img, layer=layer, library_id=library_id, features=features, features_kwargs=features_kwargs, **kwargs) | ||
)( | ||
adata, | ||
img, | ||
layer=layer, | ||
library_id=library_id, | ||
features=features, | ||
features_kwargs=features_kwargs, | ||
**kwargs, | ||
) | ||
|
||
if copy: | ||
logg.info("Finish", time=start) | ||
|
@@ -103,6 +219,142 @@ def calculate_image_features( | |
_save_data(adata, attr="obsm", key=key_added, data=res, time=start) | ||
|
||
|
||
def quantify_morphology( | ||
sdata: SpatialData, | ||
label: str | None = None, | ||
image: str | None = None, | ||
methods: list[str | Callable] | None = None, | ||
split_by_channels: bool = False, | ||
**kwargs: Any, | ||
) -> pd.DataFrame | None: | ||
if label is None and image is None: | ||
raise ValueError("Either `label` or `image` must be provided.") | ||
|
||
if image is not None and label is None: | ||
raise ValueError("If `image` is provided, a `label` with matching segmentation borders must be provided.") | ||
|
||
if methods is None: | ||
# default case but without mutable argument as default value | ||
methods = ["label", "area", "eccentricity", "perimeter", "sphericity"] | ||
elif isinstance(methods, (str, Callable)): | ||
methods = [methods] | ||
|
||
if not isinstance(methods, list): | ||
raise ValueError("Argument `methods` must be a list of strings.") | ||
|
||
if not all(isinstance(method, (str, Callable)) for method in methods): | ||
raise ValueError("All elements in `methods` must be strings or callables.") | ||
|
||
if "label" not in methods: | ||
methods = ["label"].extend(methods) | ||
|
||
extra_methods = [] | ||
for method in methods: | ||
if callable(method): | ||
extra_methods.append(method) | ||
methods.remove(method) | ||
|
||
for element in [label, image]: | ||
if element is not None and element not in sdata: | ||
raise KeyError(f"Key `{element}` not found in `sdata`.") | ||
|
||
table_key = _get_table_key(sdata, label, kwargs) | ||
|
||
region_key = sdata[table_key].uns["spatialdata_attrs"]["region_key"] | ||
if not np.any(sdata[table_key].obs[region_key] == label): | ||
raise ValueError(f"Label '{label}' not found in region key ({region_key}) column of sdata table '{table_key}'") | ||
|
||
instance_key = sdata[table_key].uns["spatialdata_attrs"]["instance_key"] | ||
|
||
label_transform = sdata[label].transform | ||
image_transform = sdata[image].transform | ||
|
||
for transform in [label_transform, image_transform]: | ||
if len(transform) != 1: | ||
raise ValueError("More than one coordinate system detected") | ||
|
||
coord_sys_label = next(iter(label_transform)) | ||
coord_sys_image = next(iter(image_transform)) | ||
|
||
if coord_sys_label != coord_sys_image: | ||
raise ValueError(f"Coordinate system do not match! label: {coord_sys_label}, image: {coord_sys_image}") | ||
|
||
# from here on we should be certain that we have a label | ||
label_element = sdata.transform_element_to_coordinate_system(label, coord_sys_label) | ||
image_element = sdata.transform_element_to_coordinate_system(image, coord_sys_image) if image is not None else None | ||
|
||
region_props = _get_region_props( | ||
label_element, | ||
image_element, | ||
props=methods, | ||
extra_methods=extra_methods, | ||
) | ||
|
||
if split_by_channels: | ||
channels = image_element.c.values | ||
for col in region_props.columns: | ||
# did the method return a list of values? | ||
if isinstance(region_props[col].values[0], (list, tuple)): | ||
# are all lists of the length of the channel list? | ||
if all(len(val) == len(channels) for val in region_props[col].values): | ||
for i, channel in enumerate(channels): | ||
region_props[f"{col}_ch{channel}"] = [val[i] for val in region_props[col].values] | ||
region_props.drop(columns=[col], inplace=True) | ||
if isinstance(region_props[col].values[0], np.ndarray): | ||
shape = region_props[col].values[0].shape | ||
|
||
if not all(val.shape == shape for val in region_props[col].values): | ||
raise ValueError( | ||
f"The results of the morphology method {col} have different shapes, " f"this cannot be handled" | ||
) | ||
if not shape[1] == len(channels): | ||
raise ValueError( | ||
f"Number of channels {len(channels)} do not match " | ||
f"the shape of the returned numpy arrays {shape} of the morphology method {col}. " | ||
f"It is expected that shape[1] should be equal to number of channels." | ||
) | ||
|
||
for channel_idx, channel in enumerate(channels): | ||
for prop_idx in range(shape[0]): | ||
region_props[f"{col}_ch{channel_idx}_{prop_idx}"] = [ | ||
val[prop_idx, channel_idx] for val in region_props[col].values | ||
] | ||
region_props.drop(columns=[col], inplace=True) | ||
|
||
region_props.rename(columns={"label": instance_key}, inplace=True) | ||
region_props[region_key] = label | ||
region_props.set_index([region_key, instance_key], inplace=True) | ||
|
||
results = sdata[table_key].obs[[region_key, instance_key]] | ||
results = results.join(region_props, how="left", on=[region_key, instance_key]) | ||
|
||
# region_props = region_props.set_index("label", drop=True) | ||
# region_props.index.name = None | ||
# region_props.index = region_props.index.astype(str) | ||
|
||
sdata[table_key].obsm["morphology"] = results | ||
|
||
return region_props | ||
|
||
|
||
def _get_table_key(sdata: sd.SpatialData, label: str, kwargs: dict[str, typing.Any]) -> str: | ||
table_key = kwargs.get("table_key", None) | ||
if table_key is None: | ||
tables = sd.get_element_annotators(sdata, label) | ||
if len(tables) > 1: | ||
raise ValueError( | ||
f"Multiple tables detected in `sdata` for {label}, " | ||
f"please specify a specific table with the `table_key` parameter" | ||
) | ||
if len(tables) == 0: | ||
raise ValueError( | ||
f"No tables automatically detected in `sdata` for {label}, " | ||
f"please specify a specific table with the `table_key` parameter" | ||
) | ||
table_key = next(iter(tables)) | ||
return table_key | ||
|
||
|
||
def _calculate_image_features_helper( | ||
obs_ids: Sequence[str], | ||
adata: AnnData, | ||
|
@@ -116,7 +368,12 @@ def _calculate_image_features_helper( | |
) -> pd.DataFrame: | ||
features_list = [] | ||
for crop in img.generate_spot_crops( | ||
adata, obs_names=obs_ids, library_id=library_id, return_obs=False, as_array=False, **kwargs | ||
adata, | ||
obs_names=obs_ids, | ||
library_id=library_id, | ||
return_obs=False, | ||
as_array=False, | ||
**kwargs, | ||
): | ||
if TYPE_CHECKING: | ||
assert isinstance(crop, ImageContainer) | ||
|
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @npeschke @timtreis was just taking a look at this, was wondering why the duplication of the function? It seems that
quantify_morphology
does something very similar tocalculate_image_features
. Do you plan to discontinue the latter? wouldn't it be better to adapt the latter to use spatialdata + adapt? thank youThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh not sure yet what the optimal solution will be. There is clear redundancy and a strong overlap, but when I wrote the original MVP, it didn't clearly fit with the current
calculate_image_features
. On the one hand, some of these features don't need an image but only the label, but then also the structure of the function was going to be quite different.Even if we merge them into the calculate_image_features function, the parameters of that function would have to change. So yeah, not fully sure yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally wanted to have sth that takes an arbitrary callable with a region_props like footprint so we could also inject f.e. some HugginfFace featuriser or sth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, that's a good point. The
calculate_image_features
does support arbitrary functions, see https://squidpy.readthedocs.io/en/stable/api/squidpy.im.calculate_image_features.html and https://squidpy.readthedocs.io/en/stable/api/squidpy.im.calculate_image_features.html and I understand that the current implementation doesn't operate on masks (or not only on mask), but maybe it would be ok to just change the input to "image" to not be just an Image but also a Label?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the parallelization/out of core functionality I think it's also important. The current implementation relies on joblib parallel to iterate over spots, and understand it's not optimal (e.g. how to do it with raster labels and raster image?). But maybe a combination of that and dask, e.g. https://examples.dask.org/applications/image-processing.html could be useful. Basically I think we should strive to implement scalability in time/memory.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at this, not maintained but maybe some ideas could be reused
https://github.com/jrussell25/dask-regionprops/blob/main/dask_regionprops/regionprops.py
EDIT: looking more deeply, not sure anymore 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this ties into the larger topic of moving things to the GPU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for context, the current implementation takes 20 seconds on my machine to calculate all available features of the MIBI-TOF dataset. So parallelization might not need to be that urgent.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be if we add more, potentially more expensive to compute, features and analyse datasets like Xenium with 100k+ cells ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @npeschke , thanks for sharing the time. If the mibi-tof dataset you are referring to is the one from squidpy, that is a toy dataset that does not really recapitulate real data complexity and size. I'd be curious to see the performance on a e.g. xenium dataset.