GenesisBlock3301/titanic-ml-pipeline-xgboost


Titanic ML Pipeline with XGBoost

Project Overview

This project demonstrates a complete machine learning workflow on the Titanic dataset from Kaggle. The goal is to predict passenger survival. The workflow covers handling missing values, encoding categorical variables, building Scikit-learn pipelines, evaluating with cross-validation, and training an XGBoost classifier.

The project emphasizes:

  • Data cleaning and missing value imputation
  • Feature engineering and categorical encoding
  • Creating reusable and efficient ML pipelines
  • Model evaluation with cross-validation
  • Avoiding data leakage to ensure model integrity

Dataset

The dataset is sourced from the Titanic - Machine Learning from Disaster competition on Kaggle.
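As a rough sketch of a first look at the data, the snippet below reads the training file with pandas and counts missing values per column. The column names are those of the competition's `train.csv`. The function names, the chosen feature subset, and the file location are assumptions made for illustration.

```python
import pandas as pd

# Assumed feature subset; column names as in the Kaggle train.csv.
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
TARGET = "Survived"


def load_training_data(csv_path):
    """Read the training CSV and split it into features X and target y."""
    df = pd.read_csv(csv_path)
    return df[FEATURES], df[TARGET]


def missing_report(X):
    """Count missing values per column, keeping only columns that have any."""
    counts = X.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Usage, assuming train.csv has been downloaded into the working directory:
# X, y = load_training_data("train.csv")
# print(missing_report(X))
```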


Key Features

  • Handling of missing values (e.g., Age, Embarked)
  • Encoding of categorical variables (e.g., Sex, Embarked)
  • Use of Scikit-learn Pipelines for a clean, repeatable workflow
  • Cross-validation for robust evaluation
  • XGBoost (gradient-boosted decision trees) as the classifier
  • Clear separation of training and validation data to prevent data leakage
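To illustrate the last point, here is a small sketch of leakage-safe imputation. The Age values are made up for the example, and the variable names are hypothetical. The imputer learns its median from the training rows only and then reuses it on the validation rows. A median computed over all rows is shown for contrast.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical Age values; NaN marks a missing entry.
age_train = np.array([[20.0], [30.0], [40.0], [np.nan]])
age_valid = np.array([[80.0], [np.nan]])

# Fit on the training rows only, so the statistic never sees validation data.
imputer = SimpleImputer(strategy="median")
imputer.fit(age_train)
train_median = imputer.statistics_[0]

# Fill the validation gap with the median learned from training.
age_valid_filled = imputer.transform(age_valid)

# For contrast: a median over all rows is shifted by the validation value.
leaky_median = np.nanmedian(np.vstack([age_train, age_valid]))

print("training-only median:", train_median)
print("median over all rows:", leaky_median)
```

Placing the imputer inside a pipeline gives the same behavior automatically during cross-validation, since each fold fits its own copy.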

How to Run

  1. Clone this repository (replace yourusername with the account that hosts the repository):
     git clone https://github.com/yourusername/titanic-ml-pipeline-xgboost.git
     cd titanic-ml-pipeline-xgboost
