In this machine learning project, we want to predict the diagnostic group of patients from the characteristics of their intrusive memories. The two diagnostic groups are Post-Traumatic Stress Disorder (PTSD) and Cocaine Use Disorder (CUD). The raw dataset contains 1001 surveys with over 600 features. This is a binary classification problem.
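Since the task is binary (PTSD vs. CUD), a minimal sketch of what a classifier does here may help. The snippet trains a plain logistic regression by batch gradient descent in pure Python. It is not the project's code: the label encoding (PTSD = 0, CUD = 1), the function names and the toy feature vectors are hypothetical, and the actual pipeline may well rely on a library implementation instead.

```python
import math

# Hypothetical label encoding; the real column values in final_data.csv may differ.
LABELS = {"PTSD": 0, "CUD": 1}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Return 1 (CUD) when the predicted probability reaches the threshold, else 0 (PTSD)."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in X]


if __name__ == "__main__":
    # Made-up toy vectors for illustration only, not survey data.
    X = [[0.0, 1.0], [0.2, 0.9], [1.0, 0.1], [0.9, 0.0]]
    y = [LABELS[g] for g in ("PTSD", "PTSD", "CUD", "CUD")]
    w, b = fit_logistic(X, y)
    print(predict(w, b, X))
```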
The project was carried out under the supervision of Dr. Lina Dietker at the Experimentelle Psychopathologie und Psychotherapie laboratory at the University of Zurich.
The project consists of the following files:
- data_cleaning.py: generates the cleaned dataset final_data.csv.
- helper.py: contains helper functions.
- methods.py: contains the tuning and performance assessment of each method except the multilayer perceptron.
- neural_networks.py: contains the tuning and performance assessment of the multilayer perceptron.
- ablation.py: generates predictions using logistic regression on both the dataset without outliers and the dataset with feature augmentation, to assess performance in each setting.
- plots.py: generates data visualization plots.
- ethics.py: generates plots of age, gender, origins and years of education.
- run.py: generates predictions using the best model.
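The README does not specify how ablation.py defines outliers or which features the augmentation adds. As a hedged illustration of the two ablation settings, the sketch below assumes the common rule of 1.5 times the interquartile range (IQR) for outliers and degree-2 products as augmented features; both choices and all function names are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR of one column."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(X, k=1.5):
    """Keep only rows whose every feature lies within its column's IQR fences."""
    bounds = [iqr_bounds(list(col), k) for col in zip(*X)]
    return [row for row in X if all(lo <= v <= hi for v, (lo, hi) in zip(row, bounds))]


def augment_quadratic(row):
    """Append all degree-2 products x_i * x_j (i <= j) to the original features."""
    extras = [row[i] * row[j] for i, j in combinations_with_replacement(range(len(row)), 2)]
    return list(row) + extras
```

In a real evaluation the fences would be computed on the training split only, so that no information from the test split leaks into the outlier filter.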
The folders plots and ethics contain data visualization .png files for the report.
- Clone the GitHub repository.
- Download EMemory_data.csv from this site and store it in a folder called data.
- Create a folder called Datasets.
- Run the data_cleaning.py file to generate final_data.csv.
- Run the run.py file to generate our predictions with the best model.
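The setup steps can also be scripted. This minimal sketch assumes that the data and Datasets folders live in the repository root and that the scripts are run from there; the function names are hypothetical, and the download itself stays manual because it goes through an expiring link.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: folders and scripts named in the README sit in the repository root.
FOLDERS = ("data", "Datasets")
REQUIRED_INPUTS = (Path("data") / "EMemory_data.csv",)
SCRIPTS = ("data_cleaning.py", "run.py")  # order given in the README


def prepare_folders(root: Path) -> list[Path]:
    """Create the data and Datasets folders if they do not exist yet."""
    folders = [root / name for name in FOLDERS]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


def missing_inputs(root: Path) -> list[str]:
    """List required input files (relative to root) that are not in place."""
    return [p.as_posix() for p in REQUIRED_INPUTS if not (root / p).is_file()]


def run_pipeline(root: Path) -> None:
    """Run the scripts in order with the current interpreter; stop on the first failure."""
    for script in SCRIPTS:
        subprocess.run([sys.executable, script], cwd=root, check=True)


def main(root: Path) -> int:
    """Prepare folders, check inputs, then run the pipeline; return an exit code."""
    prepare_folders(root)
    missing = missing_inputs(root)
    if missing:
        print("Missing input file(s): " + ", ".join(missing), file=sys.stderr)
        return 1
    run_pipeline(root)
    return 0
```

For example, calling main(Path.cwd()) from the repository root returns 1 and names the missing file until EMemory_data.csv has been downloaded into data.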
If an error such as "function ... not defined" occurs, run the helper.py file and try again.
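In Python, such an error usually means the helper functions have not been defined in the current namespace (for instance in an IDE console that runs each file in a shared session). A hedged sketch of the same fix from Python, assuming helper.py sits in the working directory: runpy.run_path executes the file and returns its top-level names, which are then copied into the session. The name load_helpers is hypothetical.

```python
import runpy
from pathlib import Path


def load_helpers(path="helper.py", namespace=None):
    """Execute a helper file and copy its public top-level names into `namespace`.

    Roughly equivalent to running the file in an interactive console, except that
    code under `if __name__ == "__main__":` in that file is not executed.
    """
    if namespace is None:
        # Default: the globals of the module defining this function (in an
        # interactive session, that is the session namespace itself).
        namespace = globals()
    module_globals = runpy.run_path(str(Path(path)))
    public = {name: obj for name, obj in module_globals.items() if not name.startswith("_")}
    namespace.update(public)
    return sorted(public)
```

Explicitly importing the needed functions from helper at the top of the calling script avoids the issue altogether.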
Note that the Data_Patients.csv file is confidential and cannot be shared. The file ethics.py cannot be run without it; however, the plots it generates are available in the folder plots/ethics.
Please note that the link to download the EMemory_data.csv file will expire on February 3, 2024.
The required libraries are listed in the requirements.txt file.
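The libraries are normally installed with pip install -r requirements.txt. To check afterwards which of them are still missing, here is a minimal sketch using importlib.metadata. Its parser is deliberately simplified (it keeps only the leading distribution name of each line and skips comments and pip options such as -r), and the function names are hypothetical.

```python
import re
from importlib import metadata

# Leading distribution name of a requirement line, e.g. "scikit-learn" in "scikit-learn>=1.3".
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(lines) -> list[str]:
    """Extract distribution names from requirements.txt lines (simplified parser)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank line, comment, or pip option such as -r / -e
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names) -> list[str]:
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Usage: with open("requirements.txt") as f: print(missing_packages(requirement_names(f))).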