This project was influenced by a healthcare dataset that I found on kaggle.com, https://www.kaggle.com/datasets/prasad22/healthcare-dataset, and by my time as a Data Engineer at a consultancy company in the UK.
The dataset produced by the code is free to use and has a public license attached. The code can be downloaded and modified by anyone.
-
Whilst the dataset considers females and males for different conditions, it does not capture sex-specific patterns within a condition. Acne is one example: females can suffer from it later in life, during the perimenopausal, menopausal and postmenopausal stages. This oversight suggests a need for further research into other conditions to address gender bias, which is often overlooked in large datasets, especially when constructing synthetic data.
-
The dataset also distributes the ethnicities of the UK population across the defined sample size. However, it fails to account for how certain ethnicities may be more susceptible to specific conditions than others. Expanding its functionality to include this aspect should be considered where possible.
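
One possible way to add this, shown as a minimal sketch rather than the project's actual code, is to scale a baseline set of condition weights by a per-ethnicity multiplier before sampling. Every name here (`BASE_WEIGHTS`, `RISK_MULTIPLIERS`, `condition_weights`, `sample_condition`) is hypothetical, and the group labels and numbers are placeholders, not real prevalence figures; real values would need to come from published research.

```python
import random

# Hypothetical baseline weights per condition (placeholders, not real prevalence).
BASE_WEIGHTS = {"Condition A": 1.0, "Condition B": 1.0, "Condition C": 1.0}

# Hypothetical per-ethnicity multipliers (placeholder values for illustration only).
RISK_MULTIPLIERS = {
    "Group X": {"Condition A": 2.0},
    "Group Y": {"Condition C": 0.5},
}


def condition_weights(ethnicity):
    """Return the baseline weights scaled by any multipliers for this ethnicity."""
    multipliers = RISK_MULTIPLIERS.get(ethnicity, {})
    return {c: w * multipliers.get(c, 1.0) for c, w in BASE_WEIGHTS.items()}


def sample_condition(ethnicity, rng):
    """Draw one condition using the ethnicity-adjusted weights."""
    weights = condition_weights(ethnicity)
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]


rng = random.Random(42)  # seeded so repeated runs give the same draw
print(sample_condition("Group X", rng))
```

An ethnicity with no entry in `RISK_MULTIPLIERS` falls back to the baseline weights, so the table can be filled in gradually as evidence for each condition is gathered.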
-
Medications and long-term treatment plans need to be investigated for each condition, as they are currently allocated arbitrarily.
-
There is a need for more granularity in disease grading, for example, detailing different cancer stages such as Stage 1, 2, 3, and 4.
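
As a rough illustration of what finer grading could look like, the sketch below attaches a stage to each generated cancer record. The function name `assign_stage` and the condition label `"Cancer"` are assumptions, not names taken from the project, and the stages are drawn with equal probability purely as a placeholder; a realistic distribution would need to be researched.

```python
import random

# The four stages mentioned above.
CANCER_STAGES = ["Stage 1", "Stage 2", "Stage 3", "Stage 4"]


def assign_stage(condition, rng):
    """Return a stage for cancer records, or None for conditions without staging."""
    if condition == "Cancer":
        # Equal probability per stage is a placeholder assumption.
        return rng.choice(CANCER_STAGES)
    return None


rng = random.Random(7)  # seeded so repeated runs give the same draw
print(assign_stage("Cancer", rng))
print(assign_stage("Acne", rng))  # no staging defined, so this is None
```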