Dataset & Preprocessing

Sheshank Singh edited this page Nov 3, 2025 · 1 revision

Datasets Used

The model uses two publicly available chest X-ray datasets from the NIH Tuberculosis CXR Collection:

  1. Shenzhen Hospital CXR Set (China)

    • Collected by Shenzhen No. 3 People’s Hospital.
    • Includes normal and TB-positive X-rays with manually segmented lung masks.
  2. Montgomery County CXR Set (USA)

    • Collected by the Department of Health and Human Services, Montgomery County, Maryland.
    • Contains normal and TB-positive X-rays with expert-labeled masks for the left and right lungs.

Both sets are widely used benchmarks for TB detection and lung segmentation.

Preprocessing Steps

  • Grayscale conversion
  • Resizing to 256×256
  • Normalization of pixel intensity values (0–1 range)
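The three steps above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the function names, the BT.601 grayscale weights, the nearest-neighbour resize, and the assumption of 8-bit input are all choices made for this example. It uses plain NumPy to stay dependency-free; a real pipeline would more likely call an image library's resize routine.

```python
import numpy as np

TARGET_SIZE = 256  # side length taken from the preprocessing list


def to_grayscale(image: np.ndarray) -> np.ndarray:
    """Collapse an H x W x 3 RGB array to H x W; 2-D input passes through."""
    if image.ndim == 2:
        return image.astype(np.float32)
    # ITU-R BT.601 luma weights (an assumption; the page names no formula)
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return image[..., :3].astype(np.float32) @ weights


def resize_nearest(image: np.ndarray, size: int = TARGET_SIZE) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D array to size x size."""
    height, width = image.shape
    rows = np.arange(size) * height // size  # source row for each output row
    cols = np.arange(size) * width // size   # source column for each output column
    return image[rows[:, None], cols]


def normalize(image: np.ndarray, max_value: float = 255.0) -> np.ndarray:
    """Scale intensities into [0, 1], assuming 8-bit input by default."""
    return np.clip(image / max_value, 0.0, 1.0).astype(np.float32)


def preprocess(image: np.ndarray) -> np.ndarray:
    """Grayscale conversion, then resize, then normalization."""
    return normalize(resize_nearest(to_grayscale(image)))
```

Nearest-neighbour sampling also keeps a binary mask binary when the same resize is applied to it, which interpolating methods do not guarantee.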

Augmentation

To improve generalization and robustness, images are augmented with the following Albumentations pipeline:

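import albumentations as A

# Small geometric jitter (shift, scale, rotation) followed by brightness/contrast jitter
train_transform = A.Compose([
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=10, p=0.8),
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
])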