Pneumonia-identification-from-X-Ray-images

Authors: Ana Solbas Casajús and Natalia García Sánchez

Final code project for the Big Data Engineering course of the Master's in Computational Biology (UPM). Its purpose is to train several Spark-based image classification models that predict pneumonia in patients from a chest X-ray image dataset. The repository also checks whether the candidate model is scalable, using a high-level Python interface based on BigDL-DLlib.

Example of pneumonia, retrieved from Kermany et al., 2018

The code in this project was originally designed to run in Google Colab. To run it with a plain Python interpreter instead, you can convert the Jupyter Notebook into a Python file with the nbconvert package. The dependencies needed to run `Pneumonia_Identification_Big_Data_Final_Project.ipynb` are listed in `requirements.txt` and can be installed as follows:

pip install -r requirements.txt
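For the notebook-to-script conversion mentioned above, a dependency-free alternative to a converter package can be sketched with the standard library alone, since a notebook file is plain JSON. This is only a minimal sketch: the function name `notebook_to_script` and the output file name in the usage note are hypothetical, and the handling of Colab shell escapes (lines starting with `!` or `%`) is an assumption about what a plain Python run needs.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path, script_path):
    """Write the code cells of a Jupyter notebook to a plain Python file."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        # Markdown and raw cells are skipped; only code cells are exported.
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The source is stored either as one string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        lines = []
        for line in text.splitlines():
            # Shell escapes (!pip ...) and magics (%...) are not valid Python,
            # so they are kept only as comments (assumption: they are not
            # needed once the requirements are installed separately).
            if line.lstrip().startswith(("!", "%")):
                line = "# " + line
            lines.append(line)
        chunks.append("\n".join(lines))
    Path(script_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return script_path
```

Calling `notebook_to_script("Pneumonia_Identification_Big_Data_Final_Project.ipynb", "pneumonia_identification.py")` would write the code cells to the second path. A commented-out shell escape that was the only statement inside an indented block would still need manual editing.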

To install the pre-release version of BigDL-DLlib for Spark 3, you can run the following line in Colab:

!pip -qq install bigdl-spark3

Alternatively, install a specific build directly from its wheel:

!pip install https://sourceforge.net/projects/analytics-zoo/files/dllib-py-spark3/bigdl_dllib_spark3-0.14.0b20211107-py3-none-manylinux1_x86_64.whl

Dataset

The code already automates downloading the images into its working directory, but a fraction of the data that can be used for training is still available in the folders of this repository. The image dataset is composed of three folders, `train`, `test` and `val`. Each of them contains two nested folders: `NORMAL`, with images from normal patients, and `PNEUMONIA`, with images from patients with pneumonia.

| Version | Date | License | Dataset Folders | Citation | Source | Acquired from |
|---|---|---|---|---|---|---|
| v.2.0 | 06/01/2018 | CC BY 4.0 | (`test`,`train`,`val`) | Kermany, D. S., Goldbaum, M., Cai, W., Valentim, C. C. S., Liang, H., Baxter, S. L., McKeown, A., Yang, G., Wu, X., Yan, F., Dong, J., Prasadha, M. K., Pei, J., Ting, M. Y. L., Zhu, J., Li, C., Hewett, S., Dong, J., Ziyar, I., … Zhang, K. (2018). Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell, 172(5), 1122-1131.e9. https://doi.org/10.1016/j.cell.2018.02.010 | Mendeley Data | Kaggle |
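The folder layout described above can be checked after download with a short sketch that counts the images per split and class. The split and class folder names come from this README; the function name `count_images` and the accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path

SPLITS = ("train", "test", "val")
CLASSES = ("NORMAL", "PNEUMONIA")
# Assumption: images are stored with one of these extensions.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def count_images(dataset_root):
    """Return a {(split, label): image_count} dict for the dataset folders."""
    root = Path(dataset_root)
    counts = {}
    for split in SPLITS:
        for label in CLASSES:
            folder = root / split / label
            if not folder.is_dir():
                # A missing folder is reported as zero images.
                counts[(split, label)] = 0
                continue
            counts[(split, label)] = sum(
                1
                for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Running `count_images(".")` from the directory that holds `train`, `test` and `val` gives a quick view of how balanced the two classes are in each split.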

Code

The code includes the image-embedding preprocessing stages, as well as the training and evaluation of the ML models in the pipeline.

| Version | Date | Script | Description |
|---|---|---|---|
| v.3 | 6/02/2023 | `Pneumonia_Identification_Big_Data_Final_Project.ipynb` | Pneumonia identification (all preprocessing, model training and evaluation stages) |

The requirements for running the code are listed in `requirements.txt`.
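As an illustration of the evaluation stage, the sketch below computes common binary classification metrics from lists of true and predicted labels. It is not taken from the notebook: the function name `binary_metrics` and the choice of metrics are assumptions, and the notebook may evaluate its models differently (for example with Spark's own evaluators).

```python
def binary_metrics(y_true, y_pred, positive="PNEUMONIA"):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # pneumonia correctly detected
            else:
                fp += 1  # normal image flagged as pneumonia
        else:
            if truth == positive:
                fn += 1  # pneumonia missed
            else:
                tn += 1  # normal image correctly passed
    total = tp + fp + tn + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For a screening task like this one, recall on the `PNEUMONIA` class is usually worth reporting alongside accuracy, since the classes in each split are not necessarily balanced.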
