# Medical Appointment No-Show Prediction

## Description
This project predicts patient no-shows for medical appointments using machine learning. No-shows waste clinical time and resources in healthcare; predicting them helps optimize scheduling and patient outreach.

We compare several machine learning models, including Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), and XGBoost. The workflow covers data cleaning, feature engineering, model training, hyperparameter tuning, and evaluation.
| 7 | + |
## Project Structure
```
Medical_Appointment_No_Shows/
├── data/              # Dataset files (KaggleV2-May-2016.csv)
├── notebooks/         # Jupyter notebooks (main.ipynb)
├── src/               # Source code (Python modules)
├── models/            # Saved models
├── pictures/          # Visualizations and plots
├── research/          # Research materials
├── requirements.txt   # Dependencies
└── README.md          # Project documentation
```
| 20 | + |
## Dataset
**Source:** Kaggle Medical Appointment No-Shows Dataset

The dataset contains patient appointment information, including:
- **Patient demographics:** Age, scholarship status, health conditions (hypertension, diabetes, alcoholism, handicap)
- **Appointment information:** Scheduled day, appointment day, days difference
- **SMS reminders:** Whether the patient received an SMS reminder
- **Target variable:** No-show label (0 = showed up, 1 = no-show)

**Key Challenges:**
- Imbalanced dataset with far fewer no-shows than shows
- Date features that need careful processing (handling the time-of-day component)
- Feature selection and encoding to improve model learning
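The date handling above can be sketched as follows. This is an illustrative mini-sample, not the real data: the real rows come from `data/KaggleV2-May-2016.csv`, which uses these same column names (`ScheduledDay`, `AppointmentDay`, `No-show`).

```python
import pandas as pd

# Hypothetical two-row sample mimicking the Kaggle columns.
df = pd.DataFrame({
    "ScheduledDay":   ["2016-04-29T18:38:08Z", "2016-04-27T15:05:12Z"],
    "AppointmentDay": ["2016-04-29T00:00:00Z", "2016-05-02T00:00:00Z"],
    "No-show":        ["No", "Yes"],
})

# Drop the time-of-day component before differencing: AppointmentDay is
# stored at midnight, so keeping ScheduledDay's clock time would produce
# negative differences for same-day appointments.
scheduled = pd.to_datetime(df["ScheduledDay"]).dt.normalize()
appointment = pd.to_datetime(df["AppointmentDay"]).dt.normalize()
df["days_diff"] = (appointment - scheduled).dt.days

# Encode the target as a binary label: 1 = no-show, 0 = showed up.
df["no_show"] = (df["No-show"] == "Yes").astype(int)
print(df[["days_diff", "no_show"]])
```

Normalizing both timestamps to midnight keeps the same-day case at 0 days instead of a negative value.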
| 34 | + |
## Installation
```bash
# Clone the repository
git clone <your-repo-url>
cd Medical_Appointment_No_Shows

# Install dependencies
pip install -r requirements.txt
```
| 44 | + |
## Usage
1. **Data Exploration:** Open `notebooks/main.ipynb` to explore the dataset
2. **Data Preprocessing:** The notebook includes data cleaning and feature engineering
3. **Model Training:** Train models using the provided code in the notebook
4. **Evaluation:** Review model performance metrics and visualizations

```python
# Example: Load and explore the data
import pandas as pd

df = pd.read_csv('data/KaggleV2-May-2016.csv')
df.head()
```
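A minimal baseline-training sketch follows. It uses synthetic stand-in features rather than the real engineered columns, so it runs without the CSV; the notebook's actual pipeline trains on features such as age, days difference, and SMS status.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for engineered features (Age, days_diff, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
# Imbalanced binary target driven mainly by the second feature.
y = (X[:, 1] + rng.normal(scale=0.5, size=500) > 0.8).astype(int)

# Stratified split preserves the show/no-show ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Report per-class precision/recall/F1, not just accuracy.
print(classification_report(y_test, model.predict(X_test)))
```

The `classification_report` call matters here: on an imbalanced target, accuracy alone hides poor recall on the minority (no-show) class.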
| 57 | + |
## Model Details
### Algorithms Used:
- **Logistic Regression:** Baseline linear model
- **Decision Tree:** Non-linear decision boundary model
- **Random Forest:** Ensemble of decision trees with tuning
- **Support Vector Machine (SVM):** Kernel-based classifier
- **XGBoost:** Gradient boosting with class balancing

### Performance:
- **Baseline Accuracy:** ~77% (but poor recall on no-shows)
- **Tuned Random Forest F1 Score:** ~0.44
- **XGBoost Recall:** ~79% (improved no-show detection)

### Training Details:
1. Data cleaning and feature engineering
2. Exploratory data analysis with visualization (histograms, bar plots, correlation heatmaps)
3. Baseline model comparisons (Logistic Regression, Decision Tree, Random Forest, SVM)
4. Hyperparameter tuning using GridSearchCV with custom scoring (F1)
5. Evaluation with classification metrics focusing on F1 score and recall
6. Advanced models: XGBoost with balanced class weights
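The tuning step can be sketched as below: a `GridSearchCV` over a small Random Forest grid with F1 as the selection metric. The grid values and synthetic data are illustrative, not the notebook's exact search space; for XGBoost the analogous imbalance knob is `scale_pos_weight`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic, imbalanced stand-in for the engineered dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0.7).astype(int)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 6],
    "class_weight": [None, "balanced"],  # counteract the no-show imbalance
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1",  # select on F1 rather than accuracy
    cv=3,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Scoring on `"f1"` makes the search prefer configurations that trade a little accuracy for much better minority-class recall and precision.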
| 78 | + |
## Results
- Baseline models achieved up to ~77% accuracy but poor recall on no-shows
- Hyperparameter tuning improved the F1 score significantly (~0.44), emphasizing recall
- The XGBoost model achieved ~79% recall for no-show detection
- Final tuned models balanced recall and precision for effective no-show detection
- Feature importance analysis revealed key predictors (SMS received, days difference, age)

**Visualizations:** See the `pictures/` folder for feature importance plots, confusion matrices, and other visualizations.
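The feature-importance step can be sketched as follows. The feature names and synthetic data are illustrative stand-ins; the real ranking comes from the fitted model in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data where the target is driven mainly by "days_diff".
rng = np.random.default_rng(2)
features = ["SMS_received", "days_diff", "Age", "Scholarship"]
X = rng.normal(size=(400, 4))
y = (X[:, 1] > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Impurity-based importances, indexed by feature name and ranked.
importances = pd.Series(model.feature_importances_, index=features)
print(importances.sort_values(ascending=False))
```

Impurity-based importances always sum to 1, so they read as relative shares of the model's split quality; `permutation_importance` is a common cross-check when features are correlated.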
| 87 | + |
## Contributing
Contributions, suggestions, and feedback are welcome!

**How to contribute:**
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/improvement`)
3. Commit your changes (`git commit -am 'Add new feature'`)
4. Push to the branch (`git push origin feature/improvement`)
5. Open a Pull Request

## License
MIT License

---
| 102 | + |