This repository contains a complete data science and machine learning project to predict whether the Falcon 9 first stage will land successfully. The goal is to help estimate potential cost savings from reusable rockets and provide insights into launch success factors.
SpaceX's Falcon 9 rockets are partially reusable, which drastically reduces the cost of space launches. Predicting whether the first stage will successfully land allows competitors and collaborators to make informed financial and logistical decisions. This project includes data collection, cleaning, exploratory data analysis, visualization, and building machine learning models for prediction.
Python Version: 3.8+
Key Libraries:
- pandas: For data manipulation and analysis
- numpy: For numerical operations
- scikit-learn: For machine learning model building (Logistic Regression, SVM, Decision Tree, KNN)
- matplotlib, seaborn: For data visualization and EDA
- plotly, dash: For interactive dashboards
- folium: For interactive maps
- flask: For serving models via API (if needed)
- pickle: For saving and loading trained models
Tools:
- Jupyter Notebook: Data exploration, preprocessing, model training
- VS Code / PyCharm: Application development
- Git & GitHub: Version control and collaboration
Algorithms Implemented:
- Logistic Regression: For baseline classification performance
- Support Vector Machine (SVM): To separate classes with different kernels (linear, rbf, sigmoid, poly)
- Decision Tree Classifier: For interpretable tree-based classification with hyperparameter tuning
- K-Nearest Neighbors (KNN): For instance-based classification using distance metrics
Techniques Used:
- Hyperparameter tuning with `GridSearchCV` (10-fold cross-validation)
- Feature scaling using `StandardScaler`
- Evaluation metrics: Accuracy, Precision, Recall, F1-score, Confusion Matrix
- Model comparison to select the best-performing classifier
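
The tuning and comparison steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's notebook code: the data is synthetic (`make_classification`), the parameter grids are small hypothetical ones, and wrapping `StandardScaler` in a `Pipeline` is an assumption made here so that scaling is refit inside each cross-validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch feature matrix (assumption, not project data).
X, y = make_classification(n_samples=90, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Hypothetical, deliberately small grids; the project's real grids are not stated.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.01, 0.1, 1]}),
    "svm": (
        SVC(),
        {"clf__kernel": ["linear", "rbf", "sigmoid", "poly"], "clf__C": [0.1, 1]},
    ),
    "tree": (
        DecisionTreeClassifier(random_state=0),
        {"clf__max_depth": [2, 4, 6], "clf__criterion": ["gini", "entropy"]},
    ),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7], "clf__p": [1, 2]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scale, then classify; grid keys use the "clf__" prefix to reach the estimator.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    search = GridSearchCV(pipe, grid, cv=10, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),
    }

# Pick the model with the best mean cross-validated accuracy.
best_name = max(results, key=lambda n: results[n]["cv_accuracy"])
for name, r in results.items():
    print(f"{name}: cv={r['cv_accuracy']:.3f} test={r['test_accuracy']:.3f}")
print("selected:", best_name)
```

Selecting on cross-validated accuracy and reporting test accuracy separately keeps the held-out set out of the model choice. With a test set as small as this one, differences of a single launch can change the ranking.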
Dashboard Features (Plotly Dash):
- Pie chart showing launch success counts for all sites and specific sites
- Scatter plot visualizing Payload vs. Launch Outcome, colored by Booster Version
- Range slider for dynamic filtering of payload mass
- Interactive dropdown for selecting specific launch sites
Map Features (Folium):
- Markers for launch sites and their proximity to coastlines, railways, and highways
- Color-coded markers showing launch outcomes
- Distance calculations displayed interactively
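
The distance figures shown on the map need a great-circle calculation between two latitude/longitude points. The source does not say which formula is used; the haversine formula is one common choice, sketched here in plain Python. The function name and the example coordinates are illustrative assumptions, not values from the project.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Illustrative coordinates only: a launch pad and a nearby point on the coast.
launch_site = (28.5632, -80.5768)
coast_point = (28.5634, -80.5680)

distance = haversine_km(*launch_site, *coast_point)
print(f"Distance to coastline: {distance:.2f} km")
```

The resulting value can then be attached to a map marker or drawn next to a line between the two points.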
Results:
- The Decision Tree Classifier achieved the highest accuracy of 90%.
- The best model's confusion matrix showed:
  - True Positives (Landed): 12
  - True Negatives (Did Not Land): 4
  - False Negatives: 0
  - False Positives: 2
- Dashboards and maps provide interactive exploration of SpaceX data.
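
As a worked check, the evaluation metrics named earlier can be derived directly from the confusion matrix counts reported above. The counts come from this README; the rest is standard arithmetic. The counts give an accuracy of 16/18, which is close to the rounded figure quoted for the Decision Tree.

```python
# Confusion matrix counts reported for the best model.
tp, tn, fp, fn = 12, 4, 2, 0

total = tp + tn + fp + fn                 # 18 evaluated launches
accuracy = (tp + tn) / total              # share of correct predictions
precision = tp / (tp + fp)                # predicted landings that really landed
recall = tp / (tp + fn)                   # actual landings that were predicted
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy:  {accuracy:.3f}")   # 0.889
print(f"precision: {precision:.3f}")  # 0.857
print(f"recall:    {recall:.3f}")     # 1.000
print(f"f1-score:  {f1:.3f}")         # 0.923
```

With zero false negatives the model never misses a real landing, and both of its errors are failed landings predicted as successes.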
This project is for educational purposes as part of the IBM Data Science Capstone Course.
- IBM Skills Network & Coursera Capstone Project
- SpaceX API & Wikipedia for open data
- Community contributors on GitHub