Titanic ML Pipeline with XGBoost

Project Overview

This project demonstrates a complete machine learning workflow on the Titanic dataset from Kaggle. The goal is to predict passenger survival by handling missing values, encoding categorical variables, building pipelines, evaluating with cross-validation, and training an XGBoost classifier.

The project emphasizes:

Data cleaning and missing value imputation
Feature engineering and categorical encoding
Creating reusable and efficient ML pipelines
Model evaluation with cross-validation
Avoiding data leakage so that evaluation scores reflect performance on unseen data
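Avoiding leakage mostly comes down to ordering: split the data first, then fit any preprocessing on the training rows only. The block below is a minimal sketch of that idea. It assumes a pandas DataFrame with Kaggle's `Age` and `Survived` columns. The helper name `split_and_impute_age`, the 80/20 split and the median strategy are illustrative choices, not taken from this repository's code.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split


def split_and_impute_age(df, test_size=0.2, random_state=42):
    """Hypothetical helper: hold out validation rows first, then fit the Age imputer on training rows only."""
    X = df.drop(columns=["Survived"])
    y = df["Survived"]
    # Stratify so both splits keep the same survival ratio.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    imputer = SimpleImputer(strategy="median")
    # fit_transform sees only training ages, so the fill value ignores validation rows.
    X_train = X_train.assign(Age=imputer.fit_transform(X_train[["Age"]]).ravel())
    # transform reuses the training median on the validation rows.
    X_val = X_val.assign(Age=imputer.transform(X_val[["Age"]]).ravel())
    return X_train, X_val, y_train, y_val, imputer


# Tiny illustrative frame (not real passenger data); NaN marks a missing age.
nan = float("nan")
demo = pd.DataFrame({
    "Age": [22, nan, 38, 26, nan, 35, 54, 2, 27, 14],
    "Survived": [0, 1] * 5,
})
X_train, X_val, y_train, y_val, imputer = split_and_impute_age(demo)
print(imputer.statistics_)  # median Age computed from the training rows only
```

Fitting the imputer on the full frame before splitting would let validation ages shift the fill value. That is the kind of leakage the list above warns against.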

Dataset

The dataset is sourced from the Titanic - Machine Learning from Disaster competition on Kaggle.
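Here is a minimal loading sketch. It assumes Kaggle's `train.csv` has been saved in the repository's `data/` folder. The exact path `data/train.csv` and the helper name `load_titanic` are hypothetical. The helper separates the `Survived` target from the features and drops `PassengerId`, which only identifies rows.

```python
import io

import pandas as pd

# Hypothetical location: Kaggle's train.csv saved in the repository's data/ folder.
DEFAULT_TRAIN_PATH = "data/train.csv"


def load_titanic(path_or_buffer=DEFAULT_TRAIN_PATH):
    """Read the Kaggle training file and return features X and target y."""
    df = pd.read_csv(path_or_buffer)
    y = df["Survived"]
    # Drop the target and PassengerId, which only identifies rows.
    X = df.drop(columns=["Survived", "PassengerId"])
    return X, y


# Illustrative two-row sample using a subset of the Kaggle columns,
# so the sketch runs without the real file.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age,Fare,Embarked\n"
    "1,0,3,male,22,7.25,S\n"
    "2,1,1,female,38,71.28,C\n"
)
X_demo, y_demo = load_titanic(sample_csv)
print(X_demo.columns.tolist())  # ['Pclass', 'Sex', 'Age', 'Fare', 'Embarked']
```

To load the real file, call `load_titanic()` with no arguments or pass your own path. The inline sample only illustrates the expected column layout.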

Key Features

Handling of missing values (e.g., Age, Embarked)
Encoding of categorical variables (e.g., Sex, Embarked)
Use of scikit-learn Pipelines for a clean, reusable workflow
Cross-validation for robust evaluation
XGBoost classifier for improved predictive power
Clear separation of training and validation data to prevent data leakage
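The features listed above can be combined into a single scikit-learn `Pipeline` and evaluated with stratified k-fold cross-validation. Imputation and one-hot encoding live inside the pipeline, so `cross_val_score` refits them on each training fold and keeps the validation folds out of preprocessing. This is a sketch under several assumptions. The column lists, hyperparameters and the synthetic data generator are hypothetical. If `xgboost` is not installed, the sketch swaps in scikit-learn's `GradientBoostingClassifier` so that it still runs.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

try:
    from xgboost import XGBClassifier
except ImportError:  # Assumption: fall back to GradientBoostingClassifier without xgboost.
    XGBClassifier = None

# Hypothetical column choice from the Kaggle schema; other columns are dropped.
NUMERIC = ["Pclass", "Age", "SibSp", "Parch", "Fare"]
CATEGORICAL = ["Sex", "Embarked"]


def build_model(random_state=42):
    """Imputation, one-hot encoding and the classifier, all inside one Pipeline."""
    preprocess = ColumnTransformer(
        [
            ("num", SimpleImputer(strategy="median"), NUMERIC),
            (
                "cat",
                Pipeline([
                    ("impute", SimpleImputer(strategy="most_frequent")),
                    ("onehot", OneHotEncoder(handle_unknown="ignore")),
                ]),
                CATEGORICAL,
            ),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )
    if XGBClassifier is not None:
        model = XGBClassifier(
            n_estimators=200, max_depth=3, learning_rate=0.1, random_state=random_state
        )
    else:
        model = GradientBoostingClassifier(random_state=random_state)
    return Pipeline([("preprocess", preprocess), ("model", model)])


def cv_accuracy(X, y, n_splits=5, random_state=42):
    """Stratified k-fold accuracy; preprocessing is refit inside every training fold."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return cross_val_score(build_model(random_state), X, y, cv=cv, scoring="accuracy")


def make_synthetic_titanic(n=200, seed=0):
    """Hypothetical stand-in data with Titanic-like columns, for a dry run."""
    rng = np.random.default_rng(seed)
    sex = rng.choice(["male", "female"], size=n)
    pclass = rng.integers(1, 4, size=n)
    age = rng.uniform(1, 70, size=n)
    age[rng.random(n) < 0.2] = np.nan  # roughly 20% missing ages
    embarked = rng.choice(["S", "C", "Q"], size=n).astype(object)
    embarked[rng.random(n) < 0.05] = np.nan  # a few missing ports
    X = pd.DataFrame({
        "Pclass": pclass, "Sex": sex, "Age": age,
        "SibSp": rng.integers(0, 4, size=n), "Parch": rng.integers(0, 3, size=n),
        "Fare": rng.uniform(5, 100, size=n), "Embarked": embarked,
    })
    # Rule-based synthetic label (female or first class), not real survival data.
    y = pd.Series(((sex == "female") | (pclass == 1)).astype(int))
    return X, y


X_syn, y_syn = make_synthetic_titanic()
scores = cv_accuracy(X_syn, y_syn)
print(f"accuracy per fold: {np.round(scores, 3)}, mean: {scores.mean():.3f}")
```

With the Kaggle data, pass the features and target from `train.csv` to `cv_accuracy` instead of the synthetic frame. Columns outside `NUMERIC` and `CATEGORICAL` are dropped by the `ColumnTransformer`.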

How to Run

Clone this repository:

git clone https://github.com/yourusername/titanic-ml-pipeline-xgboost.git
cd titanic-ml-pipeline-xgboost

Repository Structure

data/
notebooks/
src/
.gitignore
LICENSE
README.md
blog.md
requirements.txt
titanic_ml_pipeline.py

Repository: GenesisBlock3301/titanic-ml-pipeline-xgboost
