Metric                        Score
----------------------------- ----------------------------
Private Leaderboard AUC       0.95524
Public Leaderboard AUC        0.95377
Final Rank                    604 / 4371 (Top ~14%)

This repository contains my structured experimentation pipeline developed for the Kaggle competition:
Playground Series - Season 6, Episode 2: Predicting Heart Disease
The focus was on disciplined cross-validation, model evolution, and competitive generalization performance.
Predict the probability of heart disease using structured clinical features by:
- Exploring multiple model families
- Applying deep learning techniques to tabular data
- Performing hyperparameter optimization
- Building a robust Out-of-Fold (OOF) validation pipeline
- Maintaining leaderboard stability
The goal was not just a high leaderboard score, but also sound validation methodology and reproducibility.
- XGBoost
- LightGBM
- CatBoost
- Dense Neural Networks (TensorFlow / Keras)
- Regularization tuning (Dropout, BatchNorm)
- Depth/width exploration
- Learning rate scheduling
- Missing value handling
- Scaling & normalization
- Encoding strategies
- Distribution adjustments
- Interaction features
- Binning
- Frequency Encoding
- GroupMean Encoding
- Statistical combinations
- Iterative refinement
- Feature selection experiments
- Hyperparameter tuning with Optuna
- Early stopping strategies
- Cross-validation stability monitoring
- Overfitting mitigation
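
The frequency and group-mean encodings listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the column names (`chest_pain`, `target`), the helper names, and the toy data are all hypothetical, and computing the group means out-of-fold is an assumption made here to avoid leaking the target into the encoded feature.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def frequency_encode(train_col: pd.Series, apply_col: pd.Series) -> pd.Series:
    """Map each category to its relative frequency in the training column."""
    freq = train_col.value_counts(normalize=True)
    # Categories never seen in training get a frequency of 0.
    return apply_col.map(freq).fillna(0.0)


def group_mean_encode_oof(df, col, target, n_splits=5, seed=42):
    """Encode `col` by the mean of `target` per category, computed out-of-fold."""
    encoded = pd.Series(np.nan, index=df.index, dtype=float)
    global_mean = df[target].mean()
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(df):
        # Group means come only from rows outside the fold being encoded.
        means = df.iloc[fit_idx].groupby(col)[target].mean()
        encoded.iloc[enc_idx] = df[col].iloc[enc_idx].map(means).to_numpy()
    # Categories missing from a fold's fitting rows fall back to the global mean.
    return encoded.fillna(global_mean)


# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "chest_pain": ["a", "b", "a", "c", "b", "a", "c", "a", "b", "a"],
    "target": [1, 0, 1, 0, 0, 1, 1, 0, 0, 1],
})
toy["chest_pain_freq"] = frequency_encode(toy["chest_pain"], toy["chest_pain"])
toy["chest_pain_gm"] = group_mean_encode_oof(toy, "chest_pain", "target")
print(toy)
```

For test data, the same idea would apply with the frequencies and group means fitted on the full training set and then mapped onto the test column.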
To avoid leaderboard overfitting:
- K-Fold Cross Validation
- Out-of-Fold (OOF) prediction tracking
- Consistent AUC comparison across folds
- Submission pipeline from OOF-trained models
This kept the public and private leaderboard scores closely aligned (0.95377 vs. 0.95524 AUC, a gap of about 0.0015).
Heart-Disease-Prediction/
│
├── submision_outputs/
├── medal-winning-notebooks/
├── heart-disesase-prediction/
├── best_documented_iteration/
├── best_performing_iteration_and_submission/
└── README.md
- Python
- TensorFlow / Keras
- XGBoost
- LightGBM
- CatBoost
- Optuna
- Scikit-learn
- Pandas / NumPy
- Jupyter Notebook
- ✔ Competitive machine learning workflow
- ✔ Multi-model experimentation
- ✔ Deep learning applied to structured tabular data
- ✔ Strong cross-validation discipline
- ✔ Hyperparameter optimization pipelines
- ✔ Clean experiment iteration structure
Yao Yan, Walter Reade, Elizabeth Park. Predicting Heart Disease. https://kaggle.com/competitions/playground-series-s6e2, 2026. Kaggle.
- Model ensembling (stacking / blending)
- Feature importance stability analysis
- Experiment tracking
- Deployment-ready inference pipeline
Adwait Tagalpalewar