
Commit 836075b

Add Medical_Appointment_No_Shows: ML model
1 parent 7b46b6e commit 836075b

File tree

10 files changed: +114387 −0 lines changed


projects/prediction/Medical_Appointment_No_Shows/data/KaggleV2-May-2016.csv

Lines changed: 110528 additions & 0 deletions
Large diffs are not rendered by default.

projects/prediction/Medical_Appointment_No_Shows/git.ignore

Whitespace-only changes.

projects/prediction/Medical_Appointment_No_Shows/notebooks/main.ipynb

Lines changed: 3750 additions & 0 deletions
Large diffs are not rendered by default.
4 binary image files added (24.1 KB, 74.9 KB, 26.6 KB, 22 KB) — previews not rendered.
projects/prediction/Medical_Appointment_No_Shows/README.md

Lines changed: 102 additions & 0 deletions

# Medical Appointment No-Show Prediction

## Description
This project aims to predict patient no-shows for medical appointments using machine learning. No-shows cause inefficiencies and lost resources in healthcare; predicting them helps optimize scheduling and patient outreach.

We explore several machine learning models, including Logistic Regression, Decision Trees, Random Forest, XGBoost, and Artificial Neural Networks (ANN). We perform data cleaning, feature engineering, model training, hyperparameter tuning, and evaluation.

## Project Structure
```
Medical_Appointment_No_Shows/
├── data/              # Dataset files (KaggleV2-May-2016.csv)
├── notebooks/         # Jupyter notebooks (main.ipynb)
├── src/               # Source code (Python modules)
├── models/            # Saved models
├── pictures/          # Visualizations and plots
├── research/          # Research materials
├── requirements.txt   # Dependencies
└── README.md          # Project documentation
```

## Dataset
**Source:** Kaggle Medical Appointment No-Shows Dataset

The dataset contains patient appointment information, including:
- **Patient demographics:** Age, scholarship status, health conditions (hypertension, diabetes, alcoholism, handicap)
- **Appointment information:** Scheduled day, appointment day, days difference
- **SMS reminders:** Whether the patient received an SMS reminder
- **Target variable:** No-show label (0 = showed up, 1 = no-show)

**Key Challenges:**
- Imbalanced dataset with fewer no-shows than shows
- Time features that need careful processing (the date columns include time components; see the preprocessing sketch below)
- Feature selection and encoding to improve model learning

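The exact preprocessing lives in `notebooks/main.ipynb`. As a hedged illustration of the date handling and target encoding described above, the sketch below assumes the standard Kaggle column names (`ScheduledDay`, `AppointmentDay`, `No-show`); the notebook's actual feature engineering may differ.

```python
import pandas as pd

# Load the raw appointment records.
df = pd.read_csv('data/KaggleV2-May-2016.csv')

# ScheduledDay includes a time-of-day component; normalize both columns to dates
# before computing the waiting time between booking and appointment.
scheduled = pd.to_datetime(df['ScheduledDay']).dt.normalize()
appointment = pd.to_datetime(df['AppointmentDay']).dt.normalize()
df['DaysDifference'] = (appointment - scheduled).dt.days

# Encode the target: 1 = no-show, 0 = showed up.
df['NoShow'] = (df['No-show'] == 'Yes').astype(int)

# The classes are imbalanced: no-shows are the minority.
print(df['NoShow'].value_counts(normalize=True))
```
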
## Installation
```bash
# Clone the repository
git clone <your-repo-url>
cd Medical_Appointment_No_Shows

# Install dependencies
pip install -r requirements.txt
```

## Usage
1. **Data Exploration:** Open `notebooks/main.ipynb` to explore the dataset
2. **Data Preprocessing:** The notebook includes data cleaning and feature engineering
3. **Model Training:** Train models using the provided code in the notebook
4. **Evaluation:** Review model performance metrics and visualizations

```python
# Example: Load and explore data
import pandas as pd
df = pd.read_csv('data/KaggleV2-May-2016.csv')
df.head()
```

## Model Details
### Algorithms Used:
- **Logistic Regression:** Baseline linear model
- **Decision Tree:** Non-linear decision boundary model
- **Random Forest:** Ensemble of decision trees with tuning
- **Support Vector Machine (SVM):** Kernel-based classifier
- **XGBoost:** Gradient boosting with class balancing

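A minimal sketch of how such a baseline comparison could look, continuing from the preprocessing sketch above. The feature subset is illustrative (column names follow the Kaggle schema); the notebook's actual feature set and settings may differ.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Small illustrative feature subset; the notebook likely engineers more features.
feature_cols = ['Age', 'Scholarship', 'SMS_received', 'DaysDifference']
X, y = df[feature_cols], df['NoShow']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

baselines = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
    'SVM': SVC(),  # kernel SVM; slow on ~110k rows, subsample if needed
}

# Report per-class precision, recall, and F1 for each baseline.
for name, model in baselines.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test), digits=3))
```
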
### Performance:
- **Baseline Accuracy:** ~77% (but poor recall on no-shows)
- **Tuned Random Forest F1 Score:** ~0.44
- **XGBoost Recall:** ~79% (improved no-show detection)

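The exact class-balancing setup for XGBoost is in the notebook; a common way to implement it, sketched here under the assumption that the target encodes no-shows as 1 and reusing the train/test split from the baseline sketch, is to set `scale_pos_weight` to the negative-to-positive class ratio.

```python
from xgboost import XGBClassifier
from sklearn.metrics import recall_score

# Weight the positive (no-show) class by the negative/positive ratio.
ratio = (y_train == 0).sum() / (y_train == 1).sum()

xgb = XGBClassifier(
    n_estimators=300,
    max_depth=5,
    learning_rate=0.1,
    scale_pos_weight=ratio,  # up-weights the minority no-show class
    eval_metric='logloss',
)
xgb.fit(X_train, y_train)
print('No-show recall:', recall_score(y_test, xgb.predict(X_test)))
```

Up-weighting the minority class trades some precision for higher recall on no-shows, which is the behaviour reflected in the recall figure above.
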
### Training Details:
1. Data cleaning and feature engineering
2. Exploratory data analysis with visualization (histograms, bar plots, correlation heatmaps)
3. Baseline model comparisons (Logistic Regression, Decision Tree, Random Forest, SVM)
4. Hyperparameter tuning using GridSearchCV with custom scoring (F1) (see the sketch after this list)
5. Evaluation with classification metrics focusing on F1 score and recall
6. Advanced models: XGBoost with balanced class weights

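A hedged sketch of step 4, tuning a Random Forest with `GridSearchCV` and F1 scoring, reusing the split from the baseline sketch; the grid shown is illustrative, not the notebook's actual search space.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid; the notebook's actual search space may differ.
param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [5, 10, None],
    'min_samples_leaf': [1, 5, 20],
    'class_weight': [None, 'balanced'],
}

# Score candidates by F1 on the positive (no-show) class.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring='f1',
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```
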
## Results
- Baseline models achieved up to ~77% accuracy but poor recall on no-shows
- Hyperparameter tuning improved the F1 score to ~0.44, with evaluation emphasizing recall
- The XGBoost model achieved ~79% recall for no-show detection
- Final tuned models balanced recall and precision for effective no-show detection
- Feature importance analysis revealed key predictors (SMS received, days difference, age)

**Visualizations:** See the `pictures/` folder for feature importance plots, confusion matrices, and other visualizations.

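For reference, one way the feature-importance ranking could be reproduced from the tuned forest in the sketch above; the plots in `pictures/` were generated in the notebook, so this is only a sketch and the output filename is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rank features by the tuned forest's impurity-based importances.
best_rf = search.best_estimator_
importances = pd.Series(best_rf.feature_importances_, index=X_train.columns)

importances.sort_values().plot(kind='barh')
plt.title('Random Forest feature importances')
plt.tight_layout()
plt.savefig('pictures/feature_importances.png')  # hypothetical output filename
```
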
## Contributing
Contributions, suggestions, and feedback welcome!

**How to contribute:**
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/improvement`)
3. Commit your changes (`git commit -am 'Add new feature'`)
4. Push to the branch (`git push origin feature/improvement`)
5. Open a Pull Request

## License
MIT License

---

projects/prediction/Medical_Appointment_No_Shows/requirements.txt

Lines changed: 7 additions & 0 deletions

pandas>=1.5.0
numpy>=1.23.0
matplotlib>=3.6.0
seaborn>=0.12.0
scikit-learn>=1.2.0
xgboost>=1.7.0
jupyter>=1.0.0

0 commit comments
