Releases: exoplanet-spaceapps/colab_notebook

v1.0.0 - Kepler Exoplanet ML Models (92.72% accuracy)

05 Oct 13:57

Kepler Exoplanet Detection - Production Models v1.0.0

Trained machine learning models for 3-class Kepler exoplanet classification. The best model (Random Forest) reaches 92.72% accuracy.

Model Performance

| Model | Accuracy | F1-Score | File Size | Inference Time |
|---|---|---|---|---|
| Random Forest | 92.72% | 92.54% | 12.3 MB | ~10 ms |
| XGBoost | 92.29% | 92.11% | 2.7 MB | ~5 ms |
| Ensemble | 92.29% | 92.11% | 14.1 MB | ~15 ms |
| Genesis CNN | 29.10% | 24.90% | 8.6 MB | ~50 ms |

Classification Classes

  1. CANDIDATE - Potential exoplanet candidates
  2. CONFIRMED - Confirmed exoplanets
  3. FALSE POSITIVE - False detections

What's Included

This release contains all trained models and preprocessors:

  • feature_imputer.pkl (6.6 KB) - Missing value imputer
  • feature_scaler.pkl (19 KB) - StandardScaler
  • xgboost_3class.json (2.7 MB) - XGBoost model
  • random_forest_3class.pkl (12.3 MB) - Random Forest model ⭐ RECOMMENDED
  • genesis_cnn_3class.keras (8.6 MB) - Keras CNN model
  • ensemble_voting_3class.pkl (14.1 MB) - Ensemble model
  • metadata.json (817 B) - Performance metrics & label mapping

Quick Start

Minimal Setup (Random Forest - RECOMMENDED)

Download only these 3 files for production:

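# Download preprocessors and best model
# (--repo lets the command run outside a local clone of the repository)
gh release download v1.0.0 --repo exoplanet-spaceapps/colab_notebook --pattern "feature_*.pkl"
gh release download v1.0.0 --repo exoplanet-spaceapps/colab_notebook --pattern "random_forest_3class.pkl"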

Usage Example

import joblib
import numpy as np

# Load preprocessors
imputer = joblib.load('feature_imputer.pkl')
scaler = joblib.load('feature_scaler.pkl')

# Load model
model = joblib.load('random_forest_3class.pkl')

# Your 783 features (replace the [...] placeholder with real values)
features = np.array([...])  # 783 values

# Preprocess
features_imputed = imputer.transform([features])
features_scaled = scaler.transform(features_imputed)

# Predict
prediction = model.predict(features_scaled)[0]
probabilities = model.predict_proba(features_scaled)[0]

# Label mapping
labels = {0: 'CANDIDATE', 1: 'CONFIRMED', 2: 'FALSE POSITIVE'}
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {probabilities[prediction]:.2%}")
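
To classify several objects at once, the same steps can be applied to a 2-D array in a single call. The following is a minimal sketch and not part of the release: `classify_batch` is a hypothetical helper, and it assumes the model's `classes_` are the integer codes 0, 1, 2 from the label mapping above (metadata.json is the authoritative source for that mapping).

```python
import numpy as np

# Label mapping copied from the usage example; check metadata.json before relying on it.
LABELS = {0: "CANDIDATE", 1: "CONFIRMED", 2: "FALSE POSITIVE"}


def classify_batch(X, imputer, scaler, model, labels=LABELS):
    """Return a (label, confidence) pair for every row of X (hypothetical helper)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    X = scaler.transform(imputer.transform(X))
    proba = model.predict_proba(X)
    # Column i of predict_proba corresponds to model.classes_[i].
    best = proba.argmax(axis=1)
    codes = model.classes_[best]
    return [(labels[int(c)], float(p)) for c, p in zip(codes, proba.max(axis=1))]
```

Going through `model.classes_` instead of indexing the probabilities by the predicted code keeps the lookup correct even if the class order stored in the model differs from 0, 1, 2.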

Technical Details

Data:

  • 1866 samples from Kepler mission
  • 783 features (lightcurve statistics)
  • Train/Test split: 75%/25% (stratified)
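
A stratified 75%/25% split like the one listed above could be produced with scikit-learn's `train_test_split`. This is a sketch on synthetic data with the same shape as the dataset; the random labels and the `random_state` value are assumptions for illustration, not the values used to train the released models.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in with the shape described above: 1866 samples x 783 features.
X = rng.normal(size=(1866, 783))
y = rng.integers(0, 3, size=1866)  # random class codes, not the real labels

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```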

Preprocessing:

  • Missing value imputation (median strategy)
  • Feature scaling (StandardScaler)
  • SMOTE for class balancing
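
The three preprocessing steps might be wired together roughly as follows. This is a sketch under assumptions: the data is synthetic, the order (impute, scale, then oversample) is a guess at the training setup, and SMOTE comes from the separate imbalanced-learn package, which is not in the install list of this release. Oversampling applies to training data only; at inference time only the imputer and scaler are used.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=3.0, scale=2.0, size=(200, 4))
X_train[rng.random(X_train.shape) < 0.1] = np.nan  # knock out roughly 10% of values
y_train = np.array([0] * 120 + [1] * 60 + [2] * 20)  # imbalanced toy labels

# Median imputation followed by standardization, fitted on the training split.
preprocess = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_ready = preprocess.fit_transform(X_train)

# Oversample minority classes in the training split only.
try:
    from imblearn.over_sampling import SMOTE  # provided by imbalanced-learn
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_ready, y_train)
except ImportError:
    X_bal, y_bal = X_ready, y_train  # fall back to the unbalanced data
```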

Training:

  • Random Forest: 300 trees, max_depth=20, balanced weights
  • XGBoost: 200 trees, max_depth=8, learning_rate=0.1
  • Genesis CNN: 26 epochs, early stopping
  • Ensemble: Averaged predictions from XGBoost + Random Forest
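
As an illustration of the settings above, the Random Forest could be configured as shown below, and "averaged predictions" can be read as soft voting over class probabilities. Both readings are assumptions: "balanced weights" is taken to mean `class_weight="balanced"`, `random_state` and `n_jobs` are added for the sketch, `average_probabilities` is a hypothetical helper, and the probability arrays are made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters from the list above; random_state and n_jobs are added for the sketch.
rf = RandomForestClassifier(
    n_estimators=300, max_depth=20, class_weight="balanced",
    random_state=0, n_jobs=-1,
)


def average_probabilities(*probas):
    """Soft vote: average per-class probabilities, then pick the top class."""
    mean = np.mean(np.stack(probas), axis=0)
    return mean.argmax(axis=1), mean


# Made-up probabilities for two samples from two models (columns = classes 0, 1, 2).
p_rf = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_xgb = np.array([[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]])
pred, mean = average_probabilities(p_rf, p_xgb)
print(pred)
```

In the second sample the two models disagree (class 1 versus class 2), and the averaged probabilities decide the outcome.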

Installation

pip install "scikit-learn>=1.3.0" "joblib>=1.3.0" "numpy>=1.24.0"

For XGBoost:

pip install "xgboost>=2.0.0"

For CNN:

pip install "tensorflow>=2.10.0"

Documentation

See the repository for complete documentation:

  • docs/FINAL_SUMMARY.md - Complete implementation summary
  • docs/USAGE_GUIDE.md - Usage guide with examples
  • docs/deployment_guide.md - Production deployment guide
  • scripts/serve_model.py - REST API server example

Validation Results

Tested on 2014 held-out samples:

  • Random Forest: 92.72% accuracy, 92.54% F1-score
  • Confusion matrices available in figures/ directory
  • Cross-validated with stratified splits
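
Metrics of this kind can be computed with `sklearn.metrics`. The sketch below uses a handful of toy labels rather than the release's predictions, and the weighted F1 averaging is an assumption, since the averaging mode behind the reported F1-score is not stated.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Toy labels for illustration only (0=CANDIDATE, 1=CONFIRMED, 2=FALSE POSITIVE).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="weighted")  # averaging mode is an assumption
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(f"accuracy={acc:.2%}  weighted F1={f1:.2%}")
print(cm)
```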

License

See repository LICENSE file.

Citation

If you use these models in your research, please cite the Kepler mission.


Released: 2025-10-05
Version: 1.0.0
Best Model: Random Forest (92.72% accuracy)