Releases: exoplanet-spaceapps/colab_notebook
v1.0.0 - Kepler Exoplanet ML Models (92.72% accuracy)
Kepler Exoplanet Detection - Production Models v1.0.0
Trained machine learning models for 3-class Kepler exoplanet classification. The best model (Random Forest) reaches 92.72% accuracy.
Model Performance
| Model | Accuracy | F1-Score | File Size | Inference Time |
|---|---|---|---|---|
| Random Forest | 92.72% | 92.54% | 12.3 MB | ~10ms |
| XGBoost | 92.29% | 92.11% | 2.7 MB | ~5ms |
| Ensemble | 92.29% | 92.11% | 14.1 MB | ~15ms |
| Genesis CNN | 29.10% | 24.90% | 8.6 MB | ~50ms |
Classification Classes
- CANDIDATE - Potential exoplanet candidates
- CONFIRMED - Confirmed exoplanets
- FALSE POSITIVE - False detections
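The three classes are encoded as the integers 0, 1 and 2 (see the label mapping in the usage example). Below is a minimal sketch for turning a batch of predicted probabilities into class names and confidences. It assumes the probability columns follow the order of the classifier's `classes_` attribute, as scikit-learn's `predict_proba` does. The helper name `decode_predictions` is hypothetical and not part of this release.

```python
import numpy as np

# Integer-to-name mapping as given in this release's usage example
LABELS = {0: "CANDIDATE", 1: "CONFIRMED", 2: "FALSE POSITIVE"}


def decode_predictions(proba, classes=(0, 1, 2)):
    """Hypothetical helper: map an (n_samples, 3) probability matrix to
    (class name, confidence) pairs. `classes` gives the label of each
    column, e.g. model.classes_ for a scikit-learn classifier."""
    proba = np.asarray(proba, dtype=float)
    best = proba.argmax(axis=1)  # column index of the most likely class per row
    return [(LABELS[int(classes[i])], float(proba[row, i]))
            for row, i in enumerate(best)]


proba = np.array([[0.10, 0.80, 0.10],
                  [0.70, 0.20, 0.10],
                  [0.05, 0.15, 0.80]])
for name, conf in decode_predictions(proba):
    print(f"{name}: {conf:.0%}")
```

Passing `model.classes_` explicitly guards against a model whose columns are not ordered 0, 1, 2.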
What's Included
This release contains all trained models and preprocessors:
- feature_imputer.pkl (6.6 KB) - Missing value imputer
- feature_scaler.pkl (19 KB) - StandardScaler
- xgboost_3class.json (2.7 MB) - XGBoost model
- random_forest_3class.pkl (12.3 MB) - Random Forest model ⭐ RECOMMENDED
- genesis_cnn_3class.keras (8.6 MB) - Keras CNN model
- ensemble_voting_3class.pkl (14.1 MB) - Ensemble model
- metadata.json (817 B) - Performance metrics & label mapping
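Before loading anything, it can help to confirm which of these files are present locally. Below is a minimal sketch using only the standard library. The helper name `missing_artifacts` is hypothetical, and so is the split into required files (the three used by the minimal Random Forest setup) and optional files (the rest); both are for illustration only.

```python
from pathlib import Path

# Files needed for the minimal Random Forest setup
REQUIRED = ["feature_imputer.pkl", "feature_scaler.pkl", "random_forest_3class.pkl"]
# Remaining assets in this release (only needed for the alternative models)
OPTIONAL = ["xgboost_3class.json", "genesis_cnn_3class.keras",
            "ensemble_voting_3class.pkl", "metadata.json"]


def missing_artifacts(directory=".", names=REQUIRED):
    """Hypothetical helper: return the entries of `names` that are not
    regular files inside `directory`, preserving their order."""
    root = Path(directory)
    return [name for name in names if not (root / name).is_file()]


missing = missing_artifacts()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found.")
```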
Quick Start
Minimal Setup (Random Forest - RECOMMENDED)
Download only these 3 files for production:
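```bash
# Download the two preprocessors and the recommended model
# (outside a clone of the repository, add: --repo exoplanet-spaceapps/colab_notebook)
gh release download v1.0.0 --pattern "feature_*.pkl"
gh release download v1.0.0 --pattern "random_forest_3class.pkl"
```
Usage Example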
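```python
import joblib
import numpy as np

# Load preprocessors (fitted on the training data)
imputer = joblib.load('feature_imputer.pkl')
scaler = joblib.load('feature_scaler.pkl')

# Load model
model = joblib.load('random_forest_3class.pkl')

# Your 783 features for one object, in the same column order used in training
features = np.array([...], dtype=float)  # replace [...] with your 783 values
if features.shape != (783,):
    raise ValueError(f"expected 783 features, got shape {features.shape}")

# Preprocess: the transformers expect a 2-D array of shape (n_samples, n_features)
features_imputed = imputer.transform(features.reshape(1, -1))
features_scaled = scaler.transform(features_imputed)

# Predict
prediction = model.predict(features_scaled)[0]
probabilities = model.predict_proba(features_scaled)[0]

# Label mapping
labels = {0: 'CANDIDATE', 1: 'CONFIRMED', 2: 'FALSE POSITIVE'}
print(f"Prediction: {labels[int(prediction)]}")
# predict() returns the class with the highest probability, so its confidence is the max
print(f"Confidence: {probabilities.max():.2%}")
```
Technical Details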
Data:
- 1866 samples from Kepler mission
- 783 features (lightcurve statistics)
- Train/Test split: 75%/25% (stratified)
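A stratified 75%/25% split keeps the CANDIDATE / CONFIRMED / FALSE POSITIVE proportions roughly equal in the training and test sets. Below is a minimal sketch with scikit-learn's `train_test_split`. It runs on synthetic data standing in for the 1866 × 783 feature matrix, and the class proportions and random seed are assumptions, not values from this release.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the real data: 1866 samples x 783 features, 3 classes
X = rng.normal(size=(1866, 783))
y = rng.choice([0, 1, 2], size=1866, p=[0.3, 0.3, 0.4])

# stratify=y preserves the class proportions in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

for name, labels in [("train", y_train), ("test", y_test)]:
    print(name, len(labels), np.bincount(labels, minlength=3) / len(labels))
```

With 1866 samples, scikit-learn rounds the test size up, giving 467 test and 1399 training samples.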
Preprocessing:
- Missing value imputation (median strategy)
- Feature scaling (StandardScaler)
- SMOTE for class balancing
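The imputation and scaling steps above can be sketched with scikit-learn's `SimpleImputer` and `StandardScaler`. Both are fitted on the training split only, so no test-set statistics leak into the transforms. This is a minimal sketch on small synthetic data. SMOTE is omitted because it lives in the separate `imbalanced-learn` package. As an oversampling step, it is normally applied to the training split only and plays no part at inference time.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic training/test features with ~5% missing values
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
X_test = rng.normal(loc=5.0, scale=2.0, size=(50, 4))
X_train[rng.random(X_train.shape) < 0.05] = np.nan
X_test[rng.random(X_test.shape) < 0.05] = np.nan

# Fit both steps on the training data only: median imputation, then standardization
imputer = SimpleImputer(strategy="median").fit(X_train)
scaler = StandardScaler().fit(imputer.transform(X_train))

# Apply the same fitted transforms to any later data (test set, new targets)
X_train_scaled = scaler.transform(imputer.transform(X_train))
X_test_scaled = scaler.transform(imputer.transform(X_test))
print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```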
Training:
- Random Forest: 300 trees, max_depth=20, balanced weights
- XGBoost: 200 trees, max_depth=8, learning_rate=0.1
- Genesis CNN: 26 epochs, early stopping
- Ensemble: Averaged predictions from XGBoost + Random Forest
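The Random Forest settings above map directly onto scikit-learn's `RandomForestClassifier`. The ensemble's averaging can be written as an element-wise mean of the models' class-probability matrices. Below is a minimal sketch built on three assumptions: "balanced weights" means `class_weight="balanced"`, the two models are weighted equally (the release does not state the weighting), and a fixed uniform matrix stands in for the XGBoost probabilities so the example runs without `xgboost` installed. The helper name `average_proba` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Random Forest as described: 300 trees, max_depth=20, balanced class weights
rf = RandomForestClassifier(
    n_estimators=300, max_depth=20, class_weight="balanced", random_state=0
)
# XGBoost counterpart (needs the xgboost package), for reference:
# XGBClassifier(n_estimators=200, max_depth=8, learning_rate=0.1)


def average_proba(*probas):
    """Hypothetical helper: equal-weight mean of (n_samples, n_classes)
    probability matrices, as in soft voting."""
    return np.mean(np.stack(probas), axis=0)


# Small synthetic 3-class problem so the sketch runs quickly
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)
rf.fit(X, y)
p_rf = rf.predict_proba(X[:5])
# Stand-in for XGBoost probabilities (uniform), purely for illustration
p_other = np.full_like(p_rf, 1 / 3)
p_ens = average_proba(p_rf, p_other)
print(p_ens.argmax(axis=1))
```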
Installation
```bash
pip install "scikit-learn>=1.3.0" "joblib>=1.3.0" "numpy>=1.24.0"
```
For XGBoost:
```bash
pip install "xgboost>=2.0.0"
```
For CNN:
```bash
pip install "tensorflow>=2.10.0"
```
Documentation
See the repository for complete documentation:
- docs/FINAL_SUMMARY.md - Complete implementation summary
- docs/USAGE_GUIDE.md - Usage guide with examples
- docs/deployment_guide.md - Production deployment guide
- scripts/serve_model.py - REST API server example
Validation Results
Tested on 2014 held-out samples:
- Random Forest: 92.72% accuracy, 92.54% F1-score
- Confusion matrices available in the figures/ directory
- Cross-validated with stratified splits
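Accuracy, F1-score and a confusion matrix like those reported above can be computed with `sklearn.metrics`. The release does not say which F1 average was used. This minimal sketch assumes the support-weighted average across the three classes, and the labels below are made-up toy values, not the real test set.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Toy labels (0 = CANDIDATE, 1 = CONFIRMED, 2 = FALSE POSITIVE)
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 1, 2, 2, 2, 2, 0])

acc = accuracy_score(y_true, y_pred)
# Assumption: "F1-score" is the support-weighted mean of the per-class F1 scores
f1 = f1_score(y_true, y_pred, average="weighted")
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])  # rows: true, cols: predicted

print(f"Accuracy: {acc:.2%}  F1 (weighted): {f1:.2%}")
print(cm)
```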
License
See repository LICENSE file.
Citation
If you use these models in your research, please cite the Kepler mission:
- NASA Kepler Mission: https://www.nasa.gov/kepler
- Kepler Data Archive: https://exoplanetarchive.ipac.caltech.edu/
Released: 2025-10-05
Version: 1.0.0
Best Model: Random Forest (92.72% accuracy)