- Course: ECE-UY 4563 Introduction to Machine Learning
- Institution: New York University
- Semester: Fall 2024
- Instructor: Prof. Sundeep Rangan
This project focuses on predicting cancer (specifically head and neck cancer) patient outcomes through binary survival classification using various machine learning approaches. We implemented and compared the performance of three different models:
- Logistic Regression
- Support Vector Machine (SVM)
- Neural Network
- Hongtai Du (hd2609@nyu.edu)
- Daniel Yang (ty2184@nyu.edu)
The dataset used in this project was obtained from The Surveillance, Epidemiology, and End Results (SEER) Program. Due to data privacy and usage agreements, the sample dataset cannot be uploaded on GitHub. Here are the insgtructions we get data samples:
- Search for SEER database in browser to register for an account and sumbit request access to SEER Data
- Check email for the download link, and install the software
- Click on FIle -> New -> Case Listing Session
- In "Data", select "Incidence - SEER Research Data, 17 Registries, Nov 2023 Sub (2000-2021)". Then move to "Selection".
- Choose following variables: Age recode with <1 year olds, Sex, Year of diagnosis, Race recode (W, B, AI, API), Primary Site, Histologic Type ICD-O-3, Combined Summary Stage (2004+), Surgery of oth reg/dis sites (1998-2002), Chemotherapy recode (yes, no/unk), Radiation recode, Vital status recode (study cutoff used), Survival months, SEER cause-specific death classification, Tumor Size Summary (2016+), EOD Primary Tumor (2018+), CS lymph nodes (2004-2015), CS tumor size (2004-2015)
- Execute file
- Export result matrix to csv or txt file
We conducted a comparative analysis of three machine learning models:
- Logistic Regression - A baseline model for binary classification
- Support Vector Machine - For potentially better handling of non-linear relationships
- Neural Network - To capture complex patterns in the data
The performance metrics and detailed analysis of each model can be found in the project documentation.
Note: This project is part of the academic curriculum at NYU Tandon School of Engineering.