This repository contains a machine learning project that predicts passenger survival on the Titanic. Using the well-known Titanic dataset, we trained a RandomForestClassifier as the predictive model. The workflow and its results are summarized below:
- Data Preprocessing: Handled missing values, encoded categorical variables, and normalized data.
- Model Training: Trained a RandomForestClassifier model using the training data.
- Model Validation: Achieved an accuracy of 82.12% on the validation set.
- Prediction: Made predictions on the test set, with a detailed analysis of the results.
- Confusion Matrix: Shows how many survival outcomes the model classified correctly and incorrectly for each class.
- ROC Curve: Plots the true positive rate against the false positive rate across decision thresholds; the classifier reaches an AUC score of 0.86.
- Prediction Distribution: Illustrates the distribution of predicted survival outcomes in the test data.
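
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the feature subset, the imputation and scaling choices, the 80/20 split, and the helper names `make_demo_frame`, `preprocess`, and `train_and_validate` are all assumptions. A small synthetic frame stands in for train.csv so the block runs on its own, which means its metrics will not match the figures reported here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split


def make_demo_frame(n: int = 200, seed: int = 0) -> pd.DataFrame:
    """Synthetic stand-in for train.csv (hypothetical data, not the real set)."""
    rng = np.random.default_rng(seed)
    sex = rng.choice(["male", "female"], size=n)
    age = rng.normal(30, 12, size=n).clip(1, 80)
    age[rng.random(n) < 0.2] = np.nan  # simulate missing ages
    noise = rng.random(n) < 0.2        # flip the label for ~20% of rows
    survived = np.where(noise, sex == "male", sex == "female").astype(int)
    return pd.DataFrame({
        "Pclass": rng.integers(1, 4, size=n),
        "Sex": sex,
        "Age": age,
        "Fare": rng.exponential(30, size=n),
        "Embarked": rng.choice(["S", "C", "Q"], size=n),
        "Survived": survived,
    })


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values, encode categoricals, and scale numeric columns."""
    # Assumed feature subset; the notebook may use different columns.
    out = df[["Pclass", "Sex", "Age", "Fare", "Embarked"]].copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Fare"] = out["Fare"].fillna(out["Fare"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], dtype=float)
    # For brevity the statistics come from the full frame; a stricter setup
    # would compute them on the training split only.
    for col in ["Age", "Fare"]:
        std = out[col].std()
        out[col] = (out[col] - out[col].mean()) / (std if std > 0 else 1.0)
    return out


def train_and_validate(df: pd.DataFrame, seed: int = 42):
    """Train a random forest and score it on a held-out validation split."""
    X = preprocess(df)
    y = df["Survived"]
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    proba = model.predict_proba(X_val)[:, 1]  # probability of class 1
    metrics = {
        "accuracy": accuracy_score(y_val, pred),
        "confusion_matrix": confusion_matrix(y_val, pred),
        "auc": roc_auc_score(y_val, proba),
    }
    return model, metrics


demo = make_demo_frame()
model, metrics = train_and_validate(demo)
print(f"accuracy={metrics['accuracy']:.4f}  auc={metrics['auc']:.4f}")
print(metrics["confusion_matrix"])
```

To try this on the real data, `make_demo_frame()` could be replaced with `pd.read_csv("train.csv")`, assuming the file uses the standard Kaggle column names.
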

The model performed well, with 82.12% validation accuracy and an AUC of 0.86, and it offered insight into which features most influenced the survival predictions. The project demonstrates a practical application of machine learning techniques to a classic problem in data science.
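
One common way to see which features a random forest relies on is its impurity-based `feature_importances_` attribute. The sketch below assumes that approach (the notebook may use a different method) and runs on synthetic, hypothetical data in which the label is driven mostly by `Sex`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic, hypothetical data: not the real Titanic set.
rng = np.random.default_rng(0)
n = 300
sex = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    "Sex": sex,
    "Pclass": rng.integers(1, 4, size=n),
    "Age": rng.normal(30, 12, size=n),
})
flip = rng.random(n) < 0.1       # label noise on ~10% of rows
y = np.where(flip, 1 - sex, sex)  # label mostly follows Sex

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Importances are normalized to sum to 1 across features.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor high-cardinality or continuous features, so they are better read as a rough ranking than as an exact measure of influence.
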

The repository includes the following files:
- train.csv: Training data used to build the model.
- test.csv: Test data for making predictions.
- titanic_model.ipynb: Jupyter notebook with the full code for data preprocessing, model training, and prediction.

To reproduce the analysis:
- Clone the repository.
- Install the required libraries.
- Run the Jupyter notebook to replicate the analysis.
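
Before running the notebook, it can help to check that the libraries it needs are importable. The repository does not list its dependencies, so the module names below are an assumption based on a typical scikit-learn workflow, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed dependency list; adjust it to match the notebook's imports.
# Note that scikit-learn is imported under the name "sklearn".
ASSUMED_MODULES = ["numpy", "pandas", "sklearn", "matplotlib"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(ASSUMED_MODULES)
print("Missing:", missing if missing else "none")
```
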