feat(examples): add segmentation & feature extraction examples #33
alxndrkalinin merged 36 commits into main
Add notebook reproducing CellProfiler 3D monolayer segmentation (BBBC034v1, Thirstrup et al. 2018) using cubic. Includes nuclei and cell segmentation pipelines with AP evaluation against CellProfiler reference labels. Data auto-downloaded via pooch from GitHub release v0.7.0a1 assets. Results: 25 nuclei (mAP 0.433), 24 cells (mAP 0.282). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
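The commits report AP against the CellProfiler reference labels without spelling out the metric. A minimal sketch, assuming the common instance-segmentation convention AP = TP / (TP + FP + FN) at each IoU threshold (used, for example, by Cellpose's evaluation metrics), one-to-one matching via SciPy's `linear_sum_assignment`, and mAP as the mean over thresholds 0.5 to 0.95. `iou_matrix` and `average_precision` are hypothetical names, and labels are assumed to be numbered 1..N with 0 as background.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(gt, pred):
    """IoU between every GT object (rows) and predicted object (columns).

    Assumes labels are numbered 1..N with 0 as background.
    """
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    overlap = np.zeros((gt.max() + 1, pred.max() + 1), dtype=np.int64)
    np.add.at(overlap, (gt, pred), 1)  # voxel count for each (gt, pred) label pair
    area_gt = overlap.sum(axis=1, keepdims=True)
    area_pred = overlap.sum(axis=0, keepdims=True)
    union = area_gt + area_pred - overlap
    iou = overlap / np.maximum(union, 1)
    return iou[1:, 1:]  # drop the background row and column


def average_precision(gt, pred, thresholds=np.linspace(0.5, 0.95, 10)):
    """AP = TP / (TP + FP + FN) at each IoU threshold."""
    iou = iou_matrix(gt, pred)
    n_gt, n_pred = iou.shape
    aps = []
    for t in thresholds:
        tp = 0
        if n_gt and n_pred:
            # Primarily maximize the number of matches >= t; the small IoU term breaks ties.
            cost = -(iou >= t).astype(float) - iou / (2 * min(n_gt, n_pred))
            rows, cols = linear_sum_assignment(cost)
            tp = int(np.sum(iou[rows, cols] >= t))
        denom = n_gt + n_pred - tp  # equals TP + FP + FN
        aps.append(tp / denom if denom else 1.0)
    return np.array(aps)
```

mAP is then `average_precision(gt, pred).mean()`. Other definitions (e.g. precision-recall area under a score ranking) give different numbers, so this sketch is not guaranteed to reproduce the values quoted in the commits.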
…tation
Cross-referenced CellProfiler source code to identify and fix 4 discrepancies that improve segmentation quality:
1. Median filter: use a cubic window (scipy median_filter size=5) instead of a spherical ball(5) footprint; matches CP's MedianFilter module.
2. Monolayer closing: use plane-by-plane disk(17) instead of 3D ball(17); matches CP's Closing module, which applies 2D structuring elements per slice.
3. Multi-Otsu nbins: pass nbins=128 to match CellProfiler's default.
4. Seed creation: erode downsized nuclei (ball(5) at 0.5x) with vanished-object protection; matches CP's ErodeObjects module.

Results improved from baseline:
- Nuclei: 29 objects, mAP 0.468 (was 0.433)
- Cells: 25 objects, mAP 0.490 (was 0.282)

Discrepancies evaluated but not applied (hurt metrics):
- Cell watershed -EDT landscape (cells mAP -0.023)
- Cube footprint for nuclei watershed (nuc mAP -0.098)
- Nearest-neighbor downscale (hurt combined metrics)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
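Seed erosion with vanished-object protection can be sketched per label with `scipy.ndimage`: erode each object independently and, if erosion would delete it entirely, keep it rather than losing the seed. Keeping the uneroded object is an assumption of this sketch; CellProfiler's ErodeObjects and cubic may use a different fallback (for example a single voxel). `erode_labels_keep_all` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage as ndi


def erode_labels_keep_all(labels, structure):
    """Erode each labeled object separately without letting any object vanish.

    Fallback (an assumption of this sketch): an object that erosion would
    remove entirely is kept in its original, uneroded form.
    """
    out = np.zeros_like(labels)
    for lab in np.unique(labels):
        if lab == 0:
            continue  # skip background
        obj = labels == lab
        eroded = ndi.binary_erosion(obj, structure=structure)
        out[eroded if eroded.any() else obj] = lab
    return out


labels = np.zeros((9, 9, 9), dtype=np.int32)
labels[1:8, 1:8, 1:8] = 1  # 7x7x7 cube: shrinks to 5x5x5
labels[0, 0, 8] = 2        # single voxel: would vanish, so it is kept
eroded = erode_labels_keep_all(labels, ndi.generate_binary_structure(3, 1))
```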
…lProfiler Switch cell watershed from membrane intensity landscape to negated distance transform of binary cell mask, matching CellProfiler's shape-based declumping. This fix was re-evaluated on top of the 4 previous CP-alignment changes and now improves cells mAP (+0.016) without affecting nuclei. Results: nuclei mAP 0.468, cells mAP 0.506 (was 0.490). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
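Shape-based declumping on the negated distance transform can be sketched with SciPy and scikit-image: voxels deep inside the mask become the lowest basins, so flooding from the seeds tends to split touching objects at their narrowest neck instead of following membrane intensity. This is an illustrative stand-in, not cubic's `segment_watershed`.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed


def shape_based_watershed(mask, markers):
    """Split a binary mask into objects by flooding -EDT from the given markers."""
    distance = ndi.distance_transform_edt(mask)  # distance to the nearest background voxel
    return watershed(-distance, markers=markers, mask=mask)


# Two overlapping disks with one seed at each center.
yy, xx = np.mgrid[0:40, 0:60]
mask = ((yy - 20) ** 2 + (xx - 20) ** 2 < 144) | ((yy - 20) ** 2 + (xx - 38) ** 2 < 144)
markers = np.zeros(mask.shape, dtype=np.int32)
markers[20, 20], markers[20, 38] = 1, 2
labels = shape_based_watershed(mask, markers)
```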
…ummary Replace single AP plot with side-by-side comparison showing old (cubic_paper) vs new AP curves for both nuclei and cells. Update summary table with concrete numbers for CellProfiler, old pipeline, and new pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rsegmentation
Add binary_dilation(seeds, ball(1)) before labeling in nuclei watershed, matching CellProfiler's Watershed module, which dilates seeds to merge nearby peaks before assigning labels. This reduces nuclei oversegmentation (29 -> 27, closer to CP's 25) while significantly improving mAP:
- Nuclei: 27 objects, mAP 0.561 (was 29 objects, mAP 0.468)
- Cells: 23 objects, mAP 0.558 (was 25 objects, mAP 0.506)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
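The effect of dilating seeds before labeling can be shown with SciPy alone. `seeds_to_markers` is a hypothetical helper; the 6-connected cross from `generate_binary_structure(3, 1)` is the same 7-voxel footprint that `skimage.morphology.ball(1)` produces. Two peaks two voxels apart are separate markers without dilation and a single marker with it, which is how nearby peaks stop oversegmenting one nucleus.

```python
import numpy as np
from scipy import ndimage as ndi


def seeds_to_markers(coords, shape, dilate=True):
    """Turn peak coordinates into labeled watershed markers.

    With dilate=True, each seed is grown by a 6-connected cross, so peaks
    whose dilated footprints touch or overlap fuse into one marker.
    """
    seeds = np.zeros(shape, dtype=bool)
    seeds[tuple(np.asarray(coords).T)] = True
    if dilate:
        cross = ndi.generate_binary_structure(len(shape), 1)
        seeds = ndi.binary_dilation(seeds, structure=cross)
    markers, n_markers = ndi.label(seeds)
    return markers, n_markers


peaks = [(5, 5, 4), (5, 5, 6)]  # two peaks two voxels apart along the last axis
_, n_raw = seeds_to_markers(peaks, (11, 11, 11), dilate=False)
_, n_dilated = seeds_to_markers(peaks, (11, 11, 11), dilate=True)
```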
…ching CP
CellProfiler's MedianFilter passes mode="constant" (zero-padding) to
scipy.ndimage.median_filter, while we were using the default "reflect".
Zero-padding darkens border pixels, making them fall below the Otsu
threshold and producing a cleaner binary mask with fewer edge artifacts.
Results: nuclei 26 (was 27), mAP 0.698 (was 0.561).
cells 22, mAP 0.579 (was 0.558).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
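The border effect of zero-padding is easy to see on a uniform volume with `scipy.ndimage.median_filter`: with mode="constant" (cval=0), most of a corner voxel's 5x5x5 window lies outside the volume, so its median drops to 0, while mode="reflect" leaves the volume unchanged.

```python
import numpy as np
from scipy import ndimage as ndi

volume = np.ones((10, 10, 10), dtype=np.float32)

reflect = ndi.median_filter(volume, size=5, mode="reflect")
constant = ndi.median_filter(volume, size=5, mode="constant", cval=0.0)

# Corner voxel: only 3x3x3 = 27 of the 125 window voxels lie inside the volume.
print(reflect[0, 0, 0], constant[0, 0, 0])  # 1.0 0.0
# Interior voxel: the whole window is inside, so both modes agree.
print(reflect[5, 5, 5], constant[5, 5, 5])  # 1.0 1.0
```

Edge voxels (45 of 125 window voxels inside) also drop to 0, while face voxels (75 of 125 inside) keep their value, so the darkening is concentrated along the volume's edges and corners.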
…s per object The old ListedColormap approach mapped labels 1-26 to the first ~10% of tab20's range, causing most labels to share the same 2-3 colors. Replace with a label_cmap() function that maps each label to a distinct tab20 color using modular arithmetic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
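The modular color assignment can be sketched in NumPy: label k gets palette entry (k - 1) mod len(palette), and background stays black. `labels_to_rgb` is a hypothetical helper that returns an RGB image rather than the colormap that `label_cmap()` builds; with matplotlib, `matplotlib.colormaps["tab20"].colors` can serve as the 20-color palette.

```python
import numpy as np


def labels_to_rgb(labels, palette):
    """Color each label with palette[(label - 1) % len(palette)]; background (0) is black."""
    palette = np.asarray(palette, dtype=float)[:, :3]  # drop alpha if present
    rgb = np.zeros(labels.shape + (3,), dtype=float)
    fg = labels > 0
    rgb[fg] = palette[(labels[fg] - 1) % len(palette)]
    return rgb


palette = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
labels = np.array([[0, 1, 2],
                   [3, 4, 0]])
rgb = labels_to_rgb(labels, palette)  # label 4 wraps around to label 1's color
```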
…mbrane mask CellProfiler's RemoveHoles(size=20) interprets 20 as diameter, converting to sphere volume: pi * (4/3) * 10^3 = 4189 voxels. We were using area_threshold=20 (only 20 voxels), leaving membrane fragments inside cell interiors unfilled. This produced a noisier cell mask with more internal gaps, leading to fragmented watershed results. Results: cells mAP 0.588 (was 0.579), nuclei unchanged at 0.698/26. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
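The conversion is the volume of a sphere whose diameter is the RemoveHoles size: with d = 20, r = 10 and V = (4/3)πr³ ≈ 4188.8, which rounds to 4189 voxels. A small helper (hypothetical name) makes the arithmetic explicit; rounding to the nearest integer is an assumption that reproduces the 4189 quoted above.

```python
import math


def diameter_to_sphere_volume(diameter):
    """Voxel count of a sphere with the given diameter, rounded to the nearest integer."""
    radius = diameter / 2
    return round(4 / 3 * math.pi * radius**3)


print(diameter_to_sphere_volume(20))  # 4189
```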
…isualization rescale_xy only downscales XY (Z unchanged), so the downscaled binary and watershed images should use z_mid (not z_mid//2) for the mid-slice. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
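Why the mid-slice index stays z_mid: an XY-only rescale uses a zoom factor of 1 along Z. A sketch with `scipy.ndimage.zoom` standing in for cubic's `rescale_xy`; the helper name and its `xy_only` flag (analogous to the `downscale_xy_only` option) are illustrative.

```python
import numpy as np
from scipy import ndimage as ndi


def rescale_volume(volume, factor, xy_only=True, order=1):
    """Rescale a (Z, Y, X) volume; with xy_only=True the Z axis is left untouched."""
    zoom = (1.0, factor, factor) if xy_only else (factor,) * 3
    return ndi.zoom(volume, zoom, order=order)


volume = np.random.default_rng(0).random((8, 16, 16))
small = rescale_volume(volume, 0.5)
print(volume.shape, small.shape)  # (8, 16, 16) (8, 8, 8)

z_mid = volume.shape[0] // 2
mid_slice = small[z_mid]  # same Z index as in the full-resolution volume
```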
…eferences
- Remove old/new AP comparison, replace with single clean AP curve plot
- Merge nuclei and cells slice comparisons into one 4-row figure
- Remove cubic_paper/old pipeline references from summary table
- Simplify summary to compare only cubic vs CellProfiler
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused ListedColormap import
- Replace direct skimage imports (peak_local_max, watershed, resize) with cubic.skimage equivalents (feature, segmentation, transform)
- Remove duplicate import of resize in cell 8
- All skimage functions now imported through cubic's device-agnostic proxy
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After watershed + upscaling + cleanup, some nuclei have small internal holes (up to 340 voxels) visible as gaps in the XY mid-slice. Add per-nucleus remove_small_holes(area_threshold=500) to fill these. Nuclei mAP: 0.709 (was 0.698), AP@IoU=1.0: 0.378 (was 0.308). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
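Per-object hole filling can be sketched with `scipy.ndimage` alone: for each label, find enclosed background components with `binary_fill_holes`, then fill only those at or below a size limit. This approximates, but is not, the skimage `remove_small_holes` call used in the commit; the helper name is hypothetical, and holes are only filled where no other label is present.

```python
import numpy as np
from scipy import ndimage as ndi


def fill_small_holes_per_label(labels, max_hole_size):
    """Fill enclosed holes of at most max_hole_size voxels inside each object."""
    out = labels.copy()
    for lab in np.unique(labels):
        if lab == 0:
            continue
        obj = labels == lab
        holes = ndi.binary_fill_holes(obj) & ~obj  # enclosed voxels not in the object
        hole_ids, n_holes = ndi.label(holes)
        if n_holes == 0:
            continue
        sizes = np.bincount(hole_ids.ravel())[1:]  # voxel count of each hole
        small = np.isin(hole_ids, np.flatnonzero(sizes <= max_hole_size) + 1)
        out[small & (labels == 0)] = lab  # never overwrite another object
    return out


labels = np.zeros((7, 7, 7), dtype=np.int32)
labels[1:6, 1:6, 1:6] = 1
labels[3, 3, 3] = 0  # one-voxel internal hole
filled = fill_small_holes_per_label(labels, max_hole_size=500)
```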
Remove standalone AP plot cell and add AP curves as 4th column in the combined nuclei/cells comparison figure. Each AP curve sits next to its corresponding XY/XZ mask comparisons. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ppers
- Replace scipy.ndimage.median_filter with cubic.scipy.ndimage.median_filter
- Replace scipy.ndimage.distance_transform_edt with cubic.image_utils.distance_transform_edt
- No direct scipy imports remain; all go through cubic's device-agnostic proxies
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… calls
Let cubic's device-agnostic proxies handle CPU/GPU routing instead of manually converting arrays.
Removed:
- asnumpy + to_device around ndimage.median_filter (proxy handles it)
- asnumpy(nuc_binary) before distance_transform_edt/peak_local_max
- to_device after segmentation.watershed (proxy returns on correct device)
- asnumpy(nuclei) before transform.resize (proxy handles it)
- asnumpy(cell_mask/seeds) before watershed (proxy handles it)
- asnumpy in hole-fill loop (boolean indexing works on both devices)
Kept:
- asnumpy(coords) for np.zeros seed creation (needs CPU fancy indexing)
- asnumpy in planewise closing loop (np.zeros_like needs CPU)
- asnumpy for matplotlib visualization
- asnumpy(old_labels) for Python enumerate loop
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
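The idea of a device-agnostic proxy can be illustrated with a small dispatcher that picks the CuPy or SciPy implementation from the input array's type, so callers never convert arrays themselves. This is a hypothetical sketch, not cubic's actual proxy mechanism; it treats CuPy (with `cupyx.scipy.ndimage`) as optional and falls back to NumPy/SciPy when it is absent.

```python
import numpy as np
from scipy import ndimage as scipy_ndimage

try:
    import cupy as cp
    from cupyx.scipy import ndimage as cupy_ndimage
except ImportError:  # CPU-only environment
    cp = None
    cupy_ndimage = None


def _ndimage_for(x):
    """Return the ndimage module that matches the array's device."""
    if cp is not None and isinstance(x, cp.ndarray):
        return cupy_ndimage
    return scipy_ndimage


def median_filter(x, **kwargs):
    """Median filter that runs on the GPU for CuPy arrays and on the CPU otherwise."""
    return _ndimage_for(x).median_filter(x, **kwargs)


image = np.arange(27, dtype=np.float32).reshape(3, 3, 3)
result = median_filter(image, size=3)  # NumPy in, NumPy out; no manual conversion
```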
…portability Reframe the notebook intro to highlight cubic's device-agnostic API as the main point, with the CellProfiler reproduction as the demonstration use case. Add links to the CellProfiler tutorial website and GitHub repository. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ootprints
- watershed is not in cucim: explicitly asnumpy inputs before calling segmentation.watershed, then to_device the result back
- morphology.ball() returns CPU arrays: use to_same_device for footprints passed to cucim morphology operations (binary_dilation, peak_local_max)
- Add get_device import for preserving device across CPU fallbacks
Verified: runs on both CPU and GPU with identical results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
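The two patterns in this commit, moving a CPU-built footprint to the image's device and running a CPU-only function on host copies before moving the result back, can be sketched as follows. `match_device` and `call_on_host` are hypothetical stand-ins written with CuPy as an optional dependency; they are not cubic's `to_same_device`/`to_device` implementations.

```python
import numpy as np

try:
    import cupy as cp
except ImportError:  # CPU-only environment
    cp = None


def _on_gpu(x):
    return cp is not None and isinstance(x, cp.ndarray)


def match_device(x, ref):
    """Return x on the same device as ref (e.g. a CPU-built footprint for a GPU image)."""
    if _on_gpu(ref):
        return cp.asarray(x)
    return cp.asnumpy(x) if _on_gpu(x) else np.asarray(x)


def call_on_host(func, ref, *args, **kwargs):
    """Run a CPU-only func on host copies of its inputs, then move the result to ref's device."""
    host_args = [cp.asnumpy(a) if _on_gpu(a) else a for a in args]
    host_kwargs = {k: cp.asnumpy(v) if _on_gpu(v) else v for k, v in kwargs.items()}
    return match_device(func(*host_args, **host_kwargs), ref)


image = np.zeros((4, 4))
footprint = match_device(np.ones((3, 3), dtype=bool), image)
total = call_on_host(np.add, image, image, np.ones((4, 4)))
```

For a CPU-only watershed this would read `call_on_host(watershed, image, landscape, markers, mask=mask)`, with the result landing on the same device as `image`.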
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rence The tutorial data (60x256x256, 3 channels: membrane/mito/DNA) does not match BBBC034 (1024x1024x52, 4 channels: CellMask/GFP/DNA/brightfield). The data is from the Allen Institute for Cell Science, provided with the CellProfiler 3D monolayer tutorial. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sorry @alxndrkalinin, your pull request is larger than the review limit of 150000 diff characters
Pull request overview
Adds documentation entries for a new example notebook that reproduces CellProfiler’s 3D monolayer segmentation tutorial using cubic, including dataset notes and a link from the main examples table.
Changes:
- Documented the 3D monolayer dataset files and provenance in examples/data/README.md
- Added the new 3D monolayer segmentation notebook to the main README.md examples table
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| examples/data/README.md | Adds a new dataset section describing the 3D monolayer TIFF files and their source/download location |
| README.md | Adds the new segmentation notebook link to the examples table |
…comment
- Move perf_counter and get_device imports to cell 1 with other imports
- Add comment explaining why planewise closing requires CPU roundtrip
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmark with proper warmup: nuclei 0.2s GPU vs 5.4s CPU (30x speedup), cells ~3.2s on both (watershed/closing CPU-bound), total 3.3s vs 9.0s (2.7x). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
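A timing helper in the spirit of "proper warmup": run the workload before timing so one-off costs (such as GPU kernel compilation or memory-pool allocation) are excluded, and synchronize before reading the clock, since GPU calls can return before the work finishes. A minimal sketch; `benchmark` is a hypothetical name, and for CuPy the `sync` callable could be `cupy.cuda.Device().synchronize`.

```python
from time import perf_counter


def benchmark(func, *args, warmup=1, repeats=3, sync=None, **kwargs):
    """Return the best wall-clock time (seconds) of func(*args, **kwargs) after warmup runs."""
    for _ in range(warmup):
        func(*args, **kwargs)
    if sync is not None:
        sync()  # drain queued GPU work from the warmup runs
    times = []
    for _ in range(repeats):
        start = perf_counter()
        func(*args, **kwargs)
        if sync is not None:
            sync()  # wait for asynchronous work before stopping the clock
        times.append(perf_counter() - start)
    return min(times)


calls = []
best = benchmark(lambda: calls.append(1), warmup=2, repeats=3)
```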
Watershed is not in cuCIM; planewise closing runs on CPU by design to match CellProfiler's per-slice 2D behavior, not because of a fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep mono_ds on GPU and call morphology.closing per Z-slice with a GPU disk footprint via to_same_device, instead of asnumpy → CPU loop → to_device. Cell segmentation: 1.86s (was 3.49s on GPU, ~1.9x faster). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
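Planewise closing (a 2D disk applied to each Z-slice independently, as CellProfiler's Closing does for 3D images) can be sketched on the CPU with `scipy.ndimage.grey_closing`. The `disk` and `planewise_closing` helpers are illustrative; the commit's GPU version instead keeps the volume on the GPU and passes a device-matched disk footprint to cubic's `morphology.closing`.

```python
import numpy as np
from scipy import ndimage as ndi


def disk(radius):
    """2D boolean disk footprint of the given radius."""
    yy, xx = np.mgrid[-radius : radius + 1, -radius : radius + 1]
    return xx**2 + yy**2 <= radius**2


def planewise_closing(volume, radius):
    """Grayscale closing of each Z-slice with a 2D disk; slices never interact."""
    footprint = disk(radius)
    out = np.empty_like(volume)
    for z in range(volume.shape[0]):
        out[z] = ndi.grey_closing(volume[z], footprint=footprint)
    return out


volume = np.zeros((3, 21, 21), dtype=np.float32)
volume[1, 5:16, 5:16] = 1.0
volume[1, 10, 10] = 0.0  # one-pixel hole in the middle slice
closed = planewise_closing(volume, radius=2)
```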
Add dilate_seeds option that dilates seed points with ball(1) before labeling, matching CellProfiler's watershed seed dilation behavior. Also fix GPU compatibility: use to_same_device for footprints and asnumpy for watershed inputs (not in cucim). Simplifies notebook nuclei watershed from 12 lines of inline code to: segment_watershed(nuc_binary, ball_size=10, dilate_seeds=True) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… rescale_xy
- Add filter_mode parameter for boundary handling (default "nearest");
when filter_shape="square", uses scipy.ndimage.median_filter with the
specified mode instead of skimage.filters.median
- Fix: use rescale_xy instead of transform.rescale to only downscale XY
(preserving Z dimension for 3D images)
- Notebook nuclei step 2 simplified to:
downscale_and_filter(dna_norm, filter_size=5, filter_mode="constant")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
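The filter_shape/filter_mode dispatch, together with the keyword-only parameters and ValueError validation from the later review-fix commit, can be sketched as below. `median_by_shape` is hypothetical: it uses `scipy.ndimage.median_filter` for both shapes (a cubic window via `size`, a spherical footprint via `footprint`) rather than the `skimage.filters.median` call the commit describes for non-square shapes, and the ball radius of `size // 2` is an assumption chosen so both shapes span the same extent.

```python
import numpy as np
from scipy import ndimage as ndi

_VALID_SHAPES = ("square", "ball")


def median_by_shape(image, size=5, *, filter_shape="square", filter_mode="nearest"):
    """Median filter a 3D image with a cubic window or a spherical footprint.

    filter_shape and filter_mode are keyword-only, so existing positional
    calls such as median_by_shape(img, 5) keep working.
    """
    if filter_shape not in _VALID_SHAPES:
        # Unlike assert, a ValueError is not stripped when Python runs with -O.
        raise ValueError(f"filter_shape must be one of {_VALID_SHAPES}, got {filter_shape!r}")
    if filter_shape == "square":
        return ndi.median_filter(image, size=size, mode=filter_mode)
    r = size // 2  # ball radius so both shapes span `size` voxels (an assumption)
    zz, yy, xx = np.mgrid[-r : r + 1, -r : r + 1, -r : r + 1]
    ball = zz**2 + yy**2 + xx**2 <= r**2
    return ndi.median_filter(image, footprint=ball, mode=filter_mode)


img = np.random.default_rng(0).random((6, 6, 6))
square = median_by_shape(img, 3, filter_mode="constant")
spherical = median_by_shape(img, 3, filter_shape="ball")
```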
Allow choosing between XY-only downscaling (default, preserves Z) and uniform downscaling of all dimensions (downscale_xy_only=False). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… pipeline When markers and mask are both provided, segment_watershed now computes the negated EDT of the mask as the watershed landscape (shape-based partitioning), matching CellProfiler's declumping behavior. Notebook cell pipeline simplified from 6 lines of inline EDT+watershed to: segment_watershed(cell_mask, markers=seeds, mask=cell_mask) Also removed unused cubic.skimage.segmentation import from notebook. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ze param Replace manual per-label hole-fill loop with the existing max_hole_size parameter of cleanup_segmentation: cleanup_segmentation(nuclei, min_obj_size=50, max_hole_size=500) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cleanup_segmentation calls label() which splits disconnected pieces of the same cell into separate objects (22 -> 29 cells, mAP 0.588 -> 0.465). The manual relabeling preserves watershed label identity across disconnected 3D pieces. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
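Why re-running connected-component labeling changes the count: one watershed label can cover several disconnected 3D pieces, and `label()` gives each piece its own ID. Compacting IDs through a lookup table keeps one ID per watershed label instead. A minimal NumPy sketch (hypothetical helper name); `skimage.segmentation.relabel_sequential` offers a similar identity-preserving relabel.

```python
import numpy as np
from scipy import ndimage as ndi


def compact_labels(labels):
    """Renumber labels to 1..N, keeping every voxel of a label under one ID."""
    ids = np.unique(labels)
    ids = ids[ids != 0]
    lut = np.zeros(int(labels.max()) + 1, dtype=np.int32)
    lut[ids] = np.arange(1, len(ids) + 1)
    return lut[labels]


# Label 5 has two disconnected pieces; label 9 has one.
labels = np.array([[5, 0, 5],
                   [0, 0, 0],
                   [9, 9, 0]])
_, n_components = ndi.label(labels > 0)  # 3: the two pieces of label 5 are split
compact = compact_labels(labels)         # label 5 -> 1 (both pieces), label 9 -> 2
```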
…rkflow
- Add examples/notebooks/feature_extraction_3d.ipynb: GPU-accelerated regionprops_table on cells3d (18 objects, 136 features)
- Add CI workflow that auto-converts notebooks to scripts in examples/scripts/generated on push to main
- Add [examples] optional dependency group (jupyter, pandas, pooch)
- Fix cubic.feature.voxel: convert spacing list to tuple for cucim compat
- Remove redundant examples/scripts/regionprops_example.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace pandas DataFrame usage in feature_extraction_3d notebook with plain numpy dict + formatted print. Remove pandas from [examples] extra. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
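A NumPy-only sketch of a "plain dict of columns" feature output, including list-to-tuple spacing normalization: voxel counts come from `np.bincount` and centroids from `scipy.ndimage.center_of_mass`, both scaled by the voxel spacing. The helper and its column names (modeled on regionprops_table's "centroid-0" style) are illustrative, not cubic's feature API.

```python
import numpy as np
from scipy import ndimage as ndi


def basic_features(labels, spacing=(1.0, 1.0, 1.0)):
    """Per-object volume and centroid in physical units, as a dict of NumPy columns."""
    spacing = tuple(float(s) for s in spacing)  # accept lists; some backends require a tuple
    ids = np.unique(labels)
    ids = ids[ids != 0]
    counts = np.bincount(labels.ravel())[ids]  # voxels per object
    centroids = np.array(ndi.center_of_mass(np.ones(labels.shape), labels, ids))
    centroids = centroids.reshape(len(ids), labels.ndim)  # also handles zero objects
    features = {"label": ids, "volume": counts * np.prod(spacing)}
    for axis in range(labels.ndim):
        features[f"centroid-{axis}"] = centroids[:, axis] * spacing[axis]
    return features


labels = np.zeros((4, 4, 4), dtype=np.int64)
labels[0:2, 0:2, 0:2] = 1
labels[3, 3, 3] = 2
feats = basic_features(labels, spacing=[2, 1, 1])  # list spacing is accepted
for name, column in feats.items():
    print(name, column)
```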
Pull request overview
Copilot reviewed 7 out of 9 changed files in this pull request and generated 9 comments.
Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
🤖 Generated with Claude Code
Code review fixes:
- Fix regionprops() missing spacing tuple conversion for cucim compat
- Remove unnecessary asnumpy(distance).shape; .shape works on both devices
Copilot feedback fixes:
- Make downscale_xy_only/filter_mode keyword-only in downscale_and_filter to preserve positional arg compatibility
- Make mask/dilate_seeds keyword-only in segment_watershed; keep ball_size as 3rd positional arg for backward compat
- Replace assert with ValueError for filter_shape validation
- Fix numpy scalar formatting in feature_extraction_3d notebook (np.issubdtype check instead of isinstance(x, float))
- Use uv sync + uv run in notebooks CI workflow (matches lint-format.yml)
- Remove dangling [tool.uv.sources] iohub entry from pyproject.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
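Why the isinstance check misses values: NumPy scalars such as `np.float32` are not subclasses of Python's `float` (only `np.float64` is), so `isinstance(x, float)` is False for them. Checking the dtype with `np.issubdtype` covers every floating type. A small formatting helper (hypothetical name) for a printed feature table:

```python
import numpy as np


def format_value(x, precision=3):
    """Format floats (Python or any NumPy floating type) with fixed precision."""
    if np.issubdtype(np.asarray(x).dtype, np.floating):
        return f"{float(x):.{precision}f}"
    return str(x)


print(isinstance(np.float32(1.5), float))  # False
print(format_value(np.float32(1.5)), format_value(np.int64(3)))  # 1.500 3
```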
cleanup_segmentation was casting label() output to uint8, which silently wraps labels above 255 modulo 256 (256 becomes background 0, 257 collides with label 1). Now returns the native int32/int64 dtype from label(). Callers that need a specific dtype already cast explicitly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
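Integer casts in NumPy wrap around, so a uint8 cast does more than zero out large labels: label 256 turns into background and label 257 silently merges with label 1. A small sketch picks the smallest unsigned dtype that holds the largest label via `np.min_scalar_type`; uint16 holds labels up to 65535. `safe_label_dtype` is a hypothetical helper.

```python
import numpy as np

labels = np.array([1, 255, 256, 257, 300])
print(labels.astype(np.uint8).tolist())  # [1, 255, 0, 1, 44]


def safe_label_dtype(labels):
    """Smallest unsigned integer dtype that can hold every label without wrapping."""
    return np.min_scalar_type(int(labels.max()))


print(safe_label_dtype(labels))  # uint16
```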
844a117 to 1db8308
…leanup_segmentation cleanup_segmentation now returns uint16 natively. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
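- Segmentation notebook reproducing CellProfiler's 3D monolayer tutorial with cubic's device-agnostic API
- Feature extraction notebook using GPU-accelerated regionprops_table
- Extended segment_utils.py with CellProfiler-aligned parameters (dilate_seeds, mask, filter_mode, downscale_xy_only)
- Fixed cleanup_segmentation uint8 truncation (was silently corrupting labels >255)
- Fixed spacing handling in regionprops/regionprops_table (list→tuple conversion)

The same code runs on both CPU and GPU without modification.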
Segmentation results
CellProfiler alignment
Pipeline parameters cross-referenced against CellProfiler's source code:
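- Median filter with mode="constant" (CP's MedianFilter)
- Seed dilation with ball(1) (CP's Watershed)
- Planewise disk(17) closing (CP's Closing, 2D per Z-slice)
- Multi-Otsu with nbins=128 (CP default)
- RemoveHoles diameter-to-volume conversion (size=20 -> area_threshold=4189)
- Seed erosion with vanished-object protection (CP's ErodeObjects)

Files changed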
- examples/notebooks/segmentation_3d_monolayer.ipynb
- examples/notebooks/feature_extraction_3d.ipynb
- cubic/segmentation/segment_utils.py: segment_watershed (dilate_seeds, mask), downscale_and_filter (filter_mode, downscale_xy_only), cleanup_segmentation (uint8->uint16)
- cubic/feature/voxel.py
- .github/workflows/notebooks.yml
- pyproject.toml: [examples] extra (jupyter, pooch)
- README.md, examples/data/README.md

Data
5 files uploaded as GitHub release assets on v0.7.0a1 (3 raw channels + 2 CellProfiler reference labels), auto-downloaded via pooch on first run.

Test plan
- ruff check and ruff format --check pass
- Notebooks use only cubic.* wrappers (no direct scipy/skimage imports)

🤖 Generated with Claude Code