@EricSchrock EricSchrock commented Jan 10, 2026

Summary

In preparation for the PyHealth 2.0 release, this PR ties up a few loose ends in ChestXray14Dataset.

Changes

  • Update ChestXray14Dataset to download the labels and metadata CSV file directly from the NIH Box folder. When I originally implemented this dataset in "ChestX-ray14 Dataset and Classification Tasks" (#392), I couldn't figure out how to download that file from Box automatically, so I mirrored it to my personal Google Drive. This PR removes the dependency on that mirror.
  • Tidy up the ChestXray14Dataset API docs.
  • Convert the ChestXray14Dataset examples from Jupyter notebooks to Python scripts and move them to examples/cxr/ to avoid merge conflicts with "Add/cxr comprehensive tutorial and benchmarks for PyHealth paper" (#773).
  • Update the ChestXray14Dataset unit tests to use the test-resources/core/chestxray14 directory instead of creating and deleting a temporary test directory. This matches the pattern in the unit tests for other datasets. Test images are still generated at test time to avoid adding large files to the repo.
  • Fix a PIL deprecation warning (the mode parameter of Image.fromarray() is deprecated).
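
For reviewers who want a picture of the download change, here is a minimal sketch of a "fetch only if missing" helper built on the standard library. It is not the code in this PR: `fetch_if_missing`, `LABELS_URL`, and the `labels.csv` file name are hypothetical, and the URL is a placeholder because the actual NIH Box link is not given in this description.

```python
import urllib.request
from pathlib import Path

# Placeholder only (assumption): substitute the real direct-download link.
LABELS_URL = "https://example.com/labels.csv"


def fetch_if_missing(url: str, dest: Path) -> Path:
    """Download `url` to `dest` unless `dest` already exists; return `dest`."""
    if dest.exists():
        # Reuse the cached copy so repeated dataset loads do not hit the network.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted transfer
    # never leaves a truncated file at the final path.
    tmp = dest.with_suffix(dest.suffix + ".part")
    urllib.request.urlretrieve(url, str(tmp))
    tmp.replace(dest)
    return dest
```

A caller would use it as `fetch_if_missing(LABELS_URL, root / "labels.csv")`, where `root` is the dataset directory (also a hypothetical name).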

Testing

  • All unit tests pass
  • Both examples run without errors
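
The test-time image generation and the PIL fix listed under Changes can be illustrated together. This is a sketch under assumptions, not the test code in this PR: `make_fake_xray` and its parameters are hypothetical, and it assumes NumPy and Pillow are installed.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def make_fake_xray(path: Path, size: int = 32, seed: int = 0) -> None:
    """Write a small random grayscale image to `path` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    pixels = rng.integers(0, 256, size=(size, size), dtype=np.uint8)
    # No `mode` argument: Pillow infers 8-bit grayscale ("L") from a
    # 2-D uint8 array, which avoids the deprecation warning.
    Image.fromarray(pixels).save(path)
```

Generating images this way keeps binary files out of the repo history; a test class could create them during setup and delete them during teardown.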

… like the other unit tests (still generate fake images to avoid adding large files to the repo history)
…e directly from the NIH box share instead of from a mirror in a personal Google Drive
…e image processor is needed for the chestxray14 dataset