Asteroid Close Approach Data Analysis and Prediction

Overview

This project focuses on analyzing a dataset of asteroid close approaches to Earth. The primary goal is to preprocess the data, handle missing values, engineer relevant features, and build a machine learning model to predict whether an asteroid is hazardous to Earth.

Dataset

The dataset contains information about asteroid close approaches, such as orbital parameters, velocity, approach date, and distances from Earth. Some fields were incomplete, which required imputation and data preprocessing.

Total Rows: 4534
Total Columns: 24 (including numerical and categorical features)

Key columns include:

Numerical Columns: Asteroid velocity, miss distances, orbital characteristics.
Categorical Columns: Velocity descriptors, orbital period descriptors, hazardous labels.

Data Preprocessing

Handling Missing Data

Missing values were imputed using K-Nearest Neighbors (KNN) imputation for numerical fields and mode imputation for categorical fields. KNN finds the nearest records with complete data and imputes missing values based on the average of the nearest neighbors. This method was suitable given the nature of the dataset.

Feature Engineering

Binning of Continuous Variables: Continuous variables like relative velocity were categorized into bins (e.g., Very Slow, Slow, Fast, Very Fast) to improve interpretability.
Correlation Analysis: A correlation matrix was generated to identify relationships between the numerical variables.

Model Building

After data preprocessing, several models were trained on the dataset to predict whether an asteroid is hazardous or not. The final model chosen was an ensemble method, which achieved the best performance.

Key Steps:

Data was split into training and testing sets.
Categorical variables were one-hot encoded for machine learning models.
The final model achieved an accuracy of 85% on the test set.

Key Files

final_imputed_dataset.csv: The dataset after KNN imputation.
Preprocessed_dataset.csv: The preprocessed dataset, ready for model training.
Astronomy.ipynb: Jupyter notebook containing the entire workflow, including data cleaning, imputation, feature engineering, and model training.
report.pdf: A detailed report summarizing the project methodology, results, and conclusions.

Results

The model achieved an accuracy of 85% in predicting hazardous asteroids.
Velocity and miss distances played a critical role in identifying potentially hazardous objects.

Data Visualization

Correlation Heatmap: Visualizes relationships between key features.
Histograms: Show the distribution of velocities, orbital periods, and miss distances.

Conclusion

This project successfully demonstrates how to handle missing data using KNN imputation, perform feature engineering, and apply machine learning techniques to classify asteroids. The final model shows promising performance in identifying hazardous asteroids, which can help in early detection and monitoring.

Future Work

Explore alternative imputation methods, such as MICE or deep learning-based approaches.
Implement additional feature engineering techniques to enhance prediction accuracy.
Experiment with more advanced models, such as gradient boosting or neural networks, to further improve classification performance.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
plots		plots
Astronomy.ipynb		Astronomy.ipynb
Preprocessed_dataset.csv		Preprocessed_dataset.csv
README.MD		README.MD
Report_rottenidli.pdf		Report_rottenidli.pdf
a.py		a.py
dataset.csv		dataset.csv
final_imputed_dataset.csv		final_imputed_dataset.csv
output.txt		output.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Asteroid Close Approach Data Analysis and Prediction

Overview

Dataset

Data Preprocessing

Handling Missing Data

Feature Engineering

Model Building

Key Steps:

Key Files

Results

Data Visualization

Conclusion

Future Work

About

Uh oh!

Releases

Packages

Uh oh!

Languages

sesiii/Asteroid-Impact-Prediction

Folders and files

Latest commit

History

Repository files navigation

Asteroid Close Approach Data Analysis and Prediction

Overview

Dataset

Data Preprocessing

Handling Missing Data

Feature Engineering

Model Building

Key Steps:

Key Files

Results

Data Visualization

Conclusion

Future Work

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages