This project explores the Titanic dataset to predict passenger survival.
It covers data preprocessing, feature engineering, and a comparison of Logistic Regression and Decision Tree models.
- Encoding: Converted categorical variables (e.g., Sex, Embarked) into numerical values using label encoding and one-hot encoding.
- Outliers: Detected and handled outliers in numerical columns such as Fare and Age.
- Dealing with Nulls: Filled missing Age values with the median and missing Embarked values with the mode.
- Normalization/Standardization: Scaled numerical features (Age, Fare) so they are on a comparable scale.
- Feature Engineering: Created new features like FamilySize (SibSp + Parch + 1) and IsAlone, and extracted titles from Name.
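The preprocessing and feature-engineering steps above can be sketched roughly as follows. This is a minimal illustration assuming the standard Kaggle Titanic column names (Name, Sex, Age, Fare, SibSp, Parch, Embarked); the outlier-capping threshold and encoding choices are assumptions, not necessarily the project's exact settings.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative Titanic preprocessing: nulls, features, outliers, encoding, scaling."""
    df = df.copy()

    # Nulls: median Age, modal Embarked
    df["Age"] = df["Age"].fillna(df["Age"].median())
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

    # Feature engineering: FamilySize, IsAlone, and Title extracted from Name
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
    df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.", expand=False)

    # Outliers: cap Fare at the 99th percentile (one common strategy)
    df["Fare"] = df["Fare"].clip(upper=df["Fare"].quantile(0.99))

    # Encoding: label-encode Sex, one-hot encode Embarked and Title
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df = pd.get_dummies(df, columns=["Embarked", "Title"], drop_first=True)

    # Scaling: standardize Age and Fare (zero mean, unit variance)
    for col in ("Age", "Fare"):
        df[col] = (df[col] - df[col].mean()) / df[col].std()
    return df
```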
- Logistic Regression: Used as a baseline model for binary classification.
- Decision Tree: Trained to capture non-linear patterns and compared with Logistic Regression.
- Metrics: Evaluated using Accuracy, Precision, Recall, and F1-score.
- Comparison: Logistic Regression gave stable results with fewer parameters, while Decision Tree captured more complex relationships but was prone to overfitting.
- Logistic Regression achieved solid performance as a simple baseline.
- Decision Tree improved accuracy on training data but required tuning to generalize well.
- Overall, Logistic Regression was chosen for its balance of performance and interpretability.
The final trained model was exported and deployed on Hugging Face Spaces using Gradio for an interactive demo.
You can try the deployed model directly on Hugging Face:
👉 Titanic Classification – DEPI Mini Project
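A Gradio demo like the one deployed above could be sketched as below. The feature set, model loading, and interface labels here are illustrative assumptions (a real app would load the exported model, e.g. with `joblib.load`, rather than fitting a toy one inline), not the project's actual code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for loading the exported model (e.g. joblib.load("model.pkl"));
# features here are [Pclass, Sex (female=1), Age], chosen for illustration.
model = LogisticRegression().fit(
    np.array([[3, 0, 22.0], [1, 1, 38.0], [3, 1, 26.0], [1, 0, 54.0]]),
    np.array([0, 1, 1, 0]),
)

def predict_survival(pclass: float, sex: str, age: float) -> str:
    """Map raw form inputs to the model's encoding and return a label."""
    x = np.array([[pclass, 1.0 if sex == "female" else 0.0, age]])
    return "Survived" if model.predict(x)[0] == 1 else "Did not survive"

def build_demo():
    import gradio as gr  # imported lazily so the predict logic is testable alone
    return gr.Interface(
        fn=predict_survival,
        inputs=[
            gr.Number(label="Pclass"),
            gr.Radio(["male", "female"], label="Sex"),
            gr.Number(label="Age"),
        ],
        outputs=gr.Text(label="Prediction"),
    )

# On Hugging Face Spaces, build_demo().launch() would serve the interactive app.
```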
Thanks to the whole team for contributing to this project 💻✨