This repository showcases part of a pipeline for preprocessing Whole Slide Images (WSIs), such as those in the TCGA and PANDA datasets. It is designed to bridge the gap between massive raw clinical slide data and AI-ready inputs.
- Intelligent Tissue Detection: Implements a robust contour detection algorithm using OpenCV to automatically separate histological tissue from background noise.
- Multi-Resolution Processing: Handles hierarchical TIFF and Zarr structures, allowing for efficient downsampling and high-resolution patch extraction.
- Heuristic-Based Patch Ranking: Features a scoring mechanism (`rank_key`) that prioritizes patches by tissue coverage and edge density, improving the quality of the data passed to downstream models.
- Language: Python
- Core Libraries: OpenCV, NumPy, tifffile, Zarr, Pillow (PIL)
- Domain Focus: Computational Pathology, Cancer Genomics (TCGA)
While my professional UI development work at Amazon is proprietary, this project demonstrates my ability to architect the backend data logic and pruning strategies required for the cBioPortal MCP server. It also shows that I can handle large, high-dimensional biomedical datasets and translate them into optimized formats for visualization.