Commit aab13a2 (parent c29a947): Update readme.md

Added a detailed explanation of the entire project.

`Detection Models/Exoplanet Detection`: 1 file changed, 70 additions, 1 deletion.

Removed line (previous `readme.md`):

> Using various machine learning models to determine the presence of Exoplanets based on their flux (luminosity). An exoplanet is a planet that orbits a star outside the solar system. The transit method measures the dimming of a star that happens to have a planet pass in front of it. This transit causes a periodic dimming in the star's flux, which appears as dips in the light curve.

Added lines (new `readme.md`):

# Exoplanet Detection Using Machine Learning Models

## Introduction

An exoplanet is a planet that orbits a star outside our solar system. Its presence can often be detected from the star's light: when the planet passes in front of its host star, it blocks a small fraction of the starlight, producing a periodic dip in the star's measured flux (its light curve).

This project classifies whether a star hosts an exoplanet, based on its flux (luminosity) measurements, using several classification models.
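
The transit signal described above can be illustrated with a synthetic light curve. This numpy sketch is not part of the project's pipeline: every parameter is a hypothetical value chosen for illustration (only the length, 3197 points, matches the number of flux columns). It injects box-shaped dips into a flat, noisy flux series and flags them with a simple robust threshold.

```python
import numpy as np


def synthetic_light_curve(n_points=3197, period=400, duration=20, depth=0.01,
                          noise=0.001, seed=0):
    """Return a normalised flux series with box-shaped transit dips.

    Every parameter is a hypothetical, illustrative value; none of them
    comes from the project's dataset.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_points)
    flux = 1.0 + rng.normal(0.0, noise, n_points)  # baseline brightness plus noise
    in_transit = (t % period) < duration           # planet in front of the star
    flux[in_transit] -= depth                      # planet blocks a fraction of the light
    return flux, in_transit


def flag_dips(flux, n_sigma=5.0):
    """Flag points far below the typical flux level.

    Uses the median and the median absolute deviation (MAD), which the short
    transit dips barely affect, instead of the mean and standard deviation.
    """
    median = np.median(flux)
    sigma = 1.4826 * np.median(np.abs(flux - median))  # MAD scaled to a Gaussian sigma
    return flux < median - n_sigma * sigma


flux, in_transit = synthetic_light_curve()
dips = flag_dips(flux)
print(in_transit.sum(), dips.sum())
```

Real light curves are noisier than this toy series, so a fixed threshold like this is only a way to visualize what "dips in the light curve" means.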

## Prerequisites

- Python 3.x
- ``pandas``
- ``numpy``
- ``matplotlib``
- ``seaborn``
- ``scikit-learn``
- ``xgboost``
- ``imbalanced-learn``
- Jupyter Notebook (optional)

### Install Required Libraries

``pip install pandas numpy matplotlib seaborn scikit-learn xgboost imbalanced-learn``

## Data Preprocessing

1. **Data Loading**: The flux data is read into a pandas DataFrame, and any missing values are handled.
2. **Outlier Removal**: Outliers, i.e. flux values exceeding a threshold of ``0.25e6``, are removed so that they do not skew model performance.
3. **Feature and Target Variables**:
   - **X**: the flux values (columns ``FLUX.1`` to ``FLUX.3197``).
   - **y**: binary labels (``LABEL``), where ``1`` indicates "Not Exoplanet" and ``2`` indicates "Exoplanet".
4. **Balancing the Dataset**:
   - The dataset is imbalanced, with more observations of non-exoplanet stars than of exoplanet stars.
   - To address this, SMOTE (Synthetic Minority Over-sampling Technique) is applied to balance the class distribution.
5. **Data Splitting**: The data is split into training and testing sets using an 80-20 split.
6. **Feature Scaling**: ``StandardScaler`` standardizes each flux column to zero mean and unit variance, which helps the performance of the machine learning models.
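
The preprocessing steps above can be sketched with numpy alone. This is a hypothetical, from-scratch illustration on synthetic data, not the project's code, which per the prerequisites relies on pandas, scikit-learn's `StandardScaler`, and SMOTE from `imbalanced-learn`. All function names are made up for the example, and two choices are assumptions: outlier removal drops whole stars (rows) rather than single values, and the split happens before SMOTE, which is applied to the training portion only so that no synthetic samples reach the test set. The list above applies SMOTE first.

```python
import numpy as np


def remove_outlier_rows(X, y, threshold=0.25e6):
    """Drop every star with any flux value above `threshold`.

    Assumption: whole rows (stars) are removed, not single values.
    """
    keep = (X <= threshold).all(axis=1)
    return X[keep], y[keep]


def split_indices(n, test_size=0.2, seed=0):
    """Random 80/20 split of row indices (not stratified)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_size))
    return idx[n_test:], idx[:n_test]


def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE: each synthetic sample lies on the segment between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), n_new)       # random minority samples
    partner = neighbours[base, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))                    # position along each segment
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def standardize(X_train, X_test):
    """Zero mean, unit variance per column, using training statistics only."""
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    std[std == 0] = 1.0                             # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std


def preprocess(X, y, minority_label=2, seed=0):
    X, y = remove_outlier_rows(X, y)
    train_idx, test_idx = split_indices(len(y), seed=seed)
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    # Oversample the minority class in the training portion only.
    X_min = X_train[y_train == minority_label]
    n_new = int((y_train != minority_label).sum()) - len(X_min)
    X_train = np.vstack([X_train, smote(X_min, n_new, seed=seed)])
    y_train = np.concatenate([y_train, np.full(n_new, minority_label)])
    X_train, X_test = standardize(X_train, X_test)
    return X_train, X_test, y_train, y_test


# Hypothetical data: 300 stars with 40 flux points each.
rng = np.random.default_rng(1)
X = rng.normal(1000.0, 50.0, size=(300, 40))
y = np.where(np.arange(300) < 30, 2, 1)   # 30 "Exoplanet" (2), 270 "Not Exoplanet" (1)
X[0, 5] = 1e6                             # one outlier star
X_train, X_test, y_train, y_test = preprocess(X, y)
print(np.unique(y_train, return_counts=True), len(y_test))
```

With the real libraries, `imblearn.pipeline.Pipeline` combined with `imblearn.over_sampling.SMOTE` gives the same train-only behavior, because samplers in that pipeline are applied only during `fit`.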

## Exploratory Data Analysis

1. **Class Distribution**: A bar plot shows the number of exoplanet and non-exoplanet samples.
2. **Flux Comparisons**: The flux values of a representative exoplanet star and a representative non-exoplanet star are plotted to compare their light curves.
3. **Boxplots**: Boxplots show the distribution of flux values in each class.
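
The bar plot and the boxplots summarize a few numbers that can also be computed directly. This numpy sketch (the helper names are hypothetical) counts samples per label and computes the quantities a boxplot draws, using the 1.5 × IQR whisker rule that matplotlib and seaborn apply by default.

```python
import numpy as np


def class_counts(y):
    """Number of samples per label, e.g. for the LABEL column."""
    labels, counts = np.unique(y, return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))


def boxplot_stats(values, whis=1.5):
    """The quantities a Tukey boxplot draws for one set of flux values."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(),    # most extreme points within the fences
        "whisker_high": inside.max(),
        "n_outliers": int(((values < low_fence) | (values > high_fence)).sum()),
    }


print(class_counts([1, 1, 1, 2]))  # {1: 3, 2: 1}
print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```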

## Model Training

Six machine learning models are trained on the balanced, scaled training set, and each model then makes predictions on the test set:

- Logistic Regression
- Random Forest Classifier
- Support Vector Machine (SVM)
- Decision Tree Classifier
- XGBoost
- K-Nearest Neighbors (KNN)
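
Since KNN turned out to be the strongest of the six models, the following minimal numpy sketch shows what it does: it stores the training set and labels each test sample by majority vote among its k closest training samples. The project presumably uses scikit-learn's `KNeighborsClassifier` (default k = 5); `knn_predict` is a hypothetical from-scratch illustration, not that class.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=5):
    """Predict each test label by majority vote among its k nearest
    training samples (Euclidean distance)."""
    # Squared distances via ||a||^2 + ||b||^2 - 2ab, shape (n_test, n_train).
    d2 = ((X_test ** 2).sum(axis=1)[:, None]
          + (X_train ** 2).sum(axis=1)[None, :]
          - 2.0 * X_test @ X_train.T)
    nearest = np.argsort(d2, axis=1)[:, :k]      # indices of the k closest samples
    predictions = []
    for neighbour_labels in y_train[nearest]:
        labels, counts = np.unique(neighbour_labels, return_counts=True)
        predictions.append(labels[np.argmax(counts)])  # ties go to the smaller label
    return np.array(predictions)


X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
y_train = np.array([1, 1, 1, 2, 2, 2])
X_test = np.array([[0.2, 0.2], [10.5, 10.5]])
print(knn_predict(X_train, y_train, X_test, k=3))  # [1 2]
```

Because predictions depend only on distances, KNN is sensitive to the scale of each feature, which is one reason the flux values are standardized first.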

## Model Evaluation

The models are evaluated on the test set using the following metrics:

1. **Accuracy Score**: The proportion of test samples that are classified correctly.
2. **Confusion Matrix**: The counts of true positive, true negative, false positive, and false negative predictions.
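
Both metrics are straightforward to compute from the predictions. These numpy functions are hypothetical stand-ins that follow the same layout as scikit-learn's `accuracy_score` and `confusion_matrix`: rows are true classes and columns are predicted classes.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def confusion_matrix(y_true, y_pred, labels=(1, 2)):
    """Rows are true classes, columns are predicted classes, both in `labels` order.

    With labels (1, 2) = (Not Exoplanet, Exoplanet) the layout is
    [[TN, FP], [FN, TP]], treating Exoplanet as the positive class.
    """
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm


y_true = [1, 1, 2, 2]
y_pred = [1, 2, 2, 2]
print(accuracy(y_true, y_pred))  # 0.75
print(confusion_matrix(y_true, y_pred))
```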
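
## Results

| Model                  | Accuracy | Confusion Matrix `[[TN, FP], [FN, TP]]` |
|------------------------|----------|-----------------------------------------|
| Logistic Regression    | 0.831266 | [[979, 39], [301, 696]]                 |
| Random Forest          | 0.556328 | [[1018, 0], [894, 103]]                 |
| SVM                    | 0.599504 | [[1003, 15], [792, 205]]                |
| Decision Tree          | 0.565757 | [[989, 29], [846, 151]]                 |
| XGBoost                | 0.620347 | [[1018, 0], [765, 232]]                 |
| K-Nearest Neighbors    | 0.933002 | [[883, 135], [0, 997]]                  |

Assuming scikit-learn's `confusion_matrix` convention, rows are true classes and columns are predicted classes, in label order (Not Exoplanet, Exoplanet), with Exoplanet as the positive class. The row totals show 1,018 non-exoplanet and 997 exoplanet test samples: because SMOTE is applied before the 80-20 split, the test set is roughly balanced and includes synthetic exoplanet samples.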
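
Accuracy alone does not show how each model errs. This sketch recomputes accuracy from the table's confusion matrices and adds recall and precision for the Exoplanet class, assuming scikit-learn's layout `[[TN, FP], [FN, TP]]` with Exoplanet as the positive class.

```python
# Confusion matrices copied from the Results table, assuming the layout
# [[TN, FP], [FN, TP]] (rows = true class, columns = predicted class,
# order: Not Exoplanet, Exoplanet).
results = {
    "Logistic Regression": [[979, 39], [301, 696]],
    "Random Forest": [[1018, 0], [894, 103]],
    "SVM": [[1003, 15], [792, 205]],
    "Decision Tree": [[989, 29], [846, 151]],
    "XGBoost": [[1018, 0], [765, 232]],
    "K-Nearest Neighbors": [[883, 135], [0, 997]],
}


def metrics(cm):
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "recall": tp / (tp + fn),                                  # share of exoplanet samples found
        "precision": tp / (tp + fp) if tp + fp else float("nan"),  # share of exoplanet predictions that are right
    }


for name, cm in results.items():
    m = metrics(cm)
    print(f"{name:20s} accuracy={m['accuracy']:.3f} "
          f"recall={m['recall']:.3f} precision={m['precision']:.3f}")
```

Under that assumption, Random Forest and XGBoost never mislabel a non-exoplanet star but find only about 10% and 23% of the exoplanet samples, while KNN finds all of them at the cost of 135 false positives.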

- The **K-Nearest Neighbors (KNN)** model achieved the highest accuracy, 0.933 (about 93%), making it the best-performing of the six models on this test set.

## Conclusion

- This project demonstrates how several machine learning classifiers can be applied to decide, from flux data, whether a star hosts an exoplanet.
- The K-Nearest Neighbors (KNN) model was the most accurate classifier, reaching about 93% test accuracy.
- This result illustrates the potential of machine learning methods for exoplanet detection and for more efficient analysis of astronomical data.
