Accurate instrument segmentation in endoscopic imagery is vital for automated surgical systems, assisting in tool tracking, scene understanding, and minimizing invasiveness. This project explores a custom implementation of DeepLabv3+ with a ResNet50 backbone, enhanced with handcrafted convolution and dilated spatial pyramid pooling layers. It focuses on robust polyp tool segmentation from the Kvasir-Instrument dataset using advanced preprocessing and evaluation strategies.
- Develop an efficient DeepLabv3+ architecture with custom layers.
- Improve segmentation robustness through preprocessing techniques like histogram equalization and dilation.
- Evaluate performance using Dice Coefficient and Jaccard Index.
- Visualize qualitative performance under varying prediction thresholds.
- Does preprocessing improve segmentation quality in medical instrument images?
- How effective is the custom DSPP module for fine-grained instrument boundary detection?
- Can we reach near real-time inference without compromising accuracy?
3. Dataset
- Source: Kvasir-Instrument Dataset
- Inputs: RGB endoscopy images (512x512)
- Masks: Binary instrument masks
- Split: 531 training, 80 validation, 59 test images
- Images: Histogram equalization + Gaussian smoothing (kernel = 4x4)
- Masks: Binary conversion and dilation (kernel = 2x2)
- Libraries: OpenCV, NumPy
- ConvBlock: Modular convolutional layer with optional batch normalization and activation.
- DilatedSpatialPyramidPooling: Captures multi-scale context using dilated convolutions and global pooling.
- Encoder: ResNet50 pretrained on ImageNet.
- ASPP: Custom DSPP module with dilation rates [1, 6, 12, 18].
- Decoder: Concatenation with low-level features + ConvBlocks + UpSampling2D
- Output: 1-channel sigmoid for binary segmentation
- Input Shape: (512, 512, 3)
- Loss: Binary Crossentropy
- Metrics: Dice Loss, Dice Coefficient, Jaccard Index
- Optimizer: Adam
- Epochs: 25
- Batch Size: 8
- Learning Rate: Default Adam
- Callback Strategy: No early stopping, manual validation
| Epoch | Val Dice | Val Jaccard |
|---|---|---|
| 15 | 0.8672 | 0.7685 |
| 21 | 0.9725 | 0.9466 |
| 25 | 0.9731 | 0.9477 |
| Split | Dice | Jaccard | Loss |
|---|---|---|---|
| Train | 0.9781 | 0.9571 | 0.0071 |
| Validation | 0.9761 | 0.9534 | 0.0079 |
| Test | 0.8649 | 0.7761 | 0.1596 |
- Evaluated predictions at two thresholds: 0.3 and 0.99
- Observed robustness in boundary detection across tools of varying shape and brightness
This project demonstrates the utility of combining handcrafted preprocessing with custom deep semantic segmentation models. It highlights:
- The effectiveness of histogram equalization + dilation for medical imagery
- Custom DSPP’s ability to capture fine spatial detail
- Competitive segmentation scores with minimal overfitting
- Apply post-processing (e.g., CRF, morphological refinements)
- Explore multi-class segmentation across different tools
- Deploy model in real-time surgical assistance setups


