This project aims to predict the severity of airplane accidents using machine learning. The model analyzes various flight, safety, and environmental features to classify accidents into different severity categories, helping to assess risk and prevent future incidents.
- Data Loading
  - The training dataset (`trn.csv`) and test dataset (`tst.csv`) are loaded using Pandas.
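
The loading step can be sketched as follows. To keep the example runnable without the real files, it reads from in-memory stand-ins; the column subset and the sample rows are made up for illustration and are not taken from the actual dataset.

```python
import io

import pandas as pd

# In-memory stand-ins so the sketch runs without the real files. In the
# project these would be pd.read_csv("trn.csv") and pd.read_csv("tst.csv").
# The rows and the exact label spelling below are illustrative assumptions.
TRAIN_CSV = io.StringIO(
    "Accident_ID,Safety_Score,Max_Elevation,Severity\n"
    "101,49.2,31335.5,Minor Damage and Injuries\n"
    "102,62.5,26024.7,Highly Fatal and Damaging\n"
)
TEST_CSV = io.StringIO(
    "Accident_ID,Safety_Score,Max_Elevation\n"
    "201,19.5,16183.7\n"
)

train_df = pd.read_csv(TRAIN_CSV)
test_df = pd.read_csv(TEST_CSV)

# Quick sanity check on what was loaded.
print(train_df.shape, test_df.shape)
```
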
- Data Preprocessing
  - Checks for missing values.
  - Encodes the target variable `Severity` using Label Encoding.
  - Drops non-essential columns such as `Accident_ID` and `Adverse_Weather_Metric`.
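
A minimal sketch of these preprocessing steps, run on a tiny made-up frame rather than the real training data. Setting `Accident_ID` aside before dropping it is an assumption added here, since the IDs are needed again when the output file is written.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny made-up frame standing in for the training data (values are illustrative).
train_df = pd.DataFrame({
    "Accident_ID": [1, 2, 3, 4],
    "Safety_Score": [49.2, 62.5, 19.5, 80.1],
    "Adverse_Weather_Metric": [0.42, 0.35, 0.00, 0.18],
    "Severity": [
        "Minor Damage and Injuries",
        "Highly Fatal and Damaging",
        "Significant Damage and Fatalities",
        "Significant Damage and Serious Injuries",
    ],
})

# 1. Count missing values per column.
missing_counts = train_df.isnull().sum()

# 2. Encode the target. LabelEncoder assigns integers in sorted label order.
encoder = LabelEncoder()
train_df["Severity"] = encoder.fit_transform(train_df["Severity"])

# 3. Keep the IDs aside (assumption: reused for the output file), then drop
#    the columns that are not used as model inputs.
accident_ids = train_df["Accident_ID"]
train_df = train_df.drop(columns=["Accident_ID", "Adverse_Weather_Metric"])
```

Because the encoder sorts labels alphabetically, the fitted `encoder.classes_` array doubles as the lookup from integer code back to label.
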
- Exploratory Data Analysis (EDA)
  - Generates a Pearson correlation matrix to identify linear relationships between numerical features.
  - Visualizes the correlations with a Seaborn heatmap to show which variables are most strongly correlated with severity.
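
The correlation analysis might look roughly like the sketch below. The data is synthetic, with a made-up target that loosely tracks `Safety_Score`; the helper name `plot_heatmap`, the output file name, and the styling options are all assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 200
safety = rng.uniform(0, 100, n)
df = pd.DataFrame({
    "Safety_Score": safety,
    "Max_Elevation": rng.uniform(1_000, 60_000, n),
    # Made-up encoded target (0..3) that loosely follows Safety_Score.
    "Severity": np.digitize(safety + rng.normal(0, 10, n), [25, 50, 75]),
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr(method="pearson")

# Features ranked by absolute correlation with the target.
severity_corr = corr["Severity"].drop("Severity").abs().sort_values(ascending=False)


def plot_heatmap(corr_matrix, path="correlation_heatmap.png"):
    """Save an annotated heatmap of a correlation matrix (hypothetical helper)."""
    # Plotting libraries are imported here so the numeric part above runs
    # even where matplotlib and seaborn are not installed.
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display required
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_heatmap(corr)` writes the figure to disk. Note that Pearson correlation only captures linear association, so a low value does not rule out a nonlinear relationship.
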
- Feature Engineering
  - Selects the features most strongly correlated with severity to improve model accuracy.
  - Scales numeric variables such as `Max_Elevation`, `Safety_Score`, and `Accident_Type_Code` to a common range using MinMaxScaler.
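
One way to express the selection and scaling steps is sketched below. The correlation cutoff `THRESHOLD` is a hypothetical value, since the source does not state how "highly correlated" is defined, and the data is made up.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up data; Severity is assumed to be label-encoded already.
df = pd.DataFrame({
    "Max_Elevation": [12000.0, 30000.0, 45000.0, 60000.0],
    "Safety_Score": [20.0, 40.0, 60.0, 100.0],
    "Accident_Type_Code": [1, 3, 5, 7],
    "Severity": [0, 1, 2, 3],
})

# Hypothetical cutoff on absolute correlation with the target.
THRESHOLD = 0.1
corr_with_target = df.corr()["Severity"].drop("Severity").abs()
selected = corr_with_target[corr_with_target >= THRESHOLD].index.tolist()

# MinMaxScaler maps each column to [0, 1] by default.
scaler = MinMaxScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(df[selected]),
    columns=selected,
    index=df.index,
)
```

To keep training and test features on the same scale, the scaler would normally be fitted on the training data only and then applied to the test data with `scaler.transform`.
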
- Model Training
  - Splits the training dataset (90% training, 10% testing).
  - Trains an XGBoost Classifier, a gradient-boosting algorithm optimized for structured data.
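
A sketch of the split and training step on synthetic data. The `stratify` option, the random seed, the helper name `train_severity_model`, and the hyperparameter values are assumptions chosen for illustration; the source only specifies the 90/10 split and the use of an XGBoost classifier.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and a made-up 4-class target (labels 0..3).
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 3))
y = np.digitize(X[:, 0], [0.25, 0.5, 0.75])

# 90% training, 10% testing. Stratifying (an assumption) keeps the class
# proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y
)


def train_severity_model(X_train, y_train):
    """Fit an XGBoost classifier (hypothetical helper, illustrative settings)."""
    # Imported here so the split above can run where xgboost is not installed.
    from xgboost import XGBClassifier

    model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X_train, y_train)
    return model
```

With the model returned by `train_severity_model(X_train, y_train)`, `model.score(X_test, y_test)` gives the accuracy on the held-out 10%.
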
- Prediction & Evaluation
  - Predicts severity levels for the test dataset.
  - Maps numeric predictions to descriptive categories:
    - 0 → Highly Fatal and Damaging
    - 1 → Minor Damage and Injuries
    - 2 → Significant Damage and Fatalities
    - 3 → Significant Damage and Serious Injuries
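
The mapping from numeric predictions to category names can be written as a plain dictionary lookup, as in this sketch. The array of predictions is a stand-in for the classifier's output, and `decode_predictions` is a hypothetical helper name.

```python
import numpy as np

# Integer code to category name, as listed in this README.
SEVERITY_LABELS = {
    0: "Highly Fatal and Damaging",
    1: "Minor Damage and Injuries",
    2: "Significant Damage and Fatalities",
    3: "Significant Damage and Serious Injuries",
}


def decode_predictions(numeric_preds):
    """Translate integer class predictions into descriptive labels."""
    return [SEVERITY_LABELS[int(p)] for p in numeric_preds]


# Stand-in for the output of model.predict on the test features.
numeric_preds = np.array([1, 0, 3, 2])
decoded = decode_predictions(numeric_preds)
```

If the target was encoded with scikit-learn's `LabelEncoder`, calling `inverse_transform` on the fitted encoder is an alternative that avoids maintaining the dictionary by hand.
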
- Output Generation
  - Saves predictions to a CSV file (`Q130.csv`) containing `Accident_ID` and the corresponding severity predictions.
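
Writing the output file could look like the sketch below. The IDs and predictions are made up, the output column name `Severity` is an assumption, and the file is written to a temporary directory here so that running the example leaves the working directory untouched.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up IDs and decoded predictions standing in for the real test set.
accident_ids = [201, 202, 203]
decoded = [
    "Minor Damage and Injuries",
    "Highly Fatal and Damaging",
    "Significant Damage and Fatalities",
]

# Assumed column name for the predictions.
submission = pd.DataFrame({"Accident_ID": accident_ids, "Severity": decoded})

# In the project this would be written next to the script as "Q130.csv".
out_path = Path(tempfile.mkdtemp()) / "Q130.csv"
submission.to_csv(out_path, index=False)  # index=False omits the row-number column
```
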
- Languages: Python
- Libraries: Pandas, NumPy, Scikit-learn, XGBoost, Seaborn, Matplotlib, Statsmodels
- Visualization: Correlation Heatmaps
The model classifies airplane accidents into four severity categories based on the given features, combining correlation-based feature analysis with predictive modeling using XGBoost.
Possible future improvements:
- Hyperparameter tuning for improved accuracy.
- Incorporating additional weather and aircraft-specific parameters.
- Deployment as a web-based prediction dashboard.