Kepler Exoplanet Detection - Production Models v1.0.0

Trained machine learning models for 3-class exoplanet classification. The best model, Random Forest, reaches 92.72% accuracy.

Model Performance

Model	Accuracy	F1-Score	File Size	Inference Time
Random Forest	92.72%	92.54%	12.3 MB	~10ms
XGBoost	92.29%	92.11%	2.7 MB	~5ms
Ensemble	92.29%	92.11%	14.1 MB	~15ms
Genesis CNN	29.10%	24.90%	8.6 MB	~50ms

Classification Classes

CANDIDATE - Potential exoplanet candidates
CONFIRMED - Confirmed exoplanets
FALSE POSITIVE - False detections

What's Included

This release contains all trained models and preprocessors:

feature_imputer.pkl (6.6 KB) - Missing value imputer
feature_scaler.pkl (19 KB) - StandardScaler
xgboost_3class.json (2.7 MB) - XGBoost model
random_forest_3class.pkl (12.3 MB) - Random Forest model ⭐ RECOMMENDED
genesis_cnn_3class.keras (8.6 MB) - Keras CNN model
ensemble_voting_3class.pkl (14.1 MB) - Ensemble model
metadata.json (817 B) - Performance metrics & label mapping
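
Since metadata.json carries the label mapping, it may be worth inspecting it before hard-coding class indices. The sketch below only lists the file's top-level keys. The schema is not documented in these notes, so no key names are assumed, and `describe_metadata` is a hypothetical helper rather than part of the release.

```python
import json
from pathlib import Path


def describe_metadata(path="metadata.json"):
    """Return the sorted top-level keys of the metadata file.

    Hypothetical helper: the only assumption is that the file holds a
    JSON object; no specific key names are assumed.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object in {path}")
    return sorted(data)


# Usage, once the file has been downloaded:
#   print(describe_metadata("metadata.json"))
```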

Quick Start

Minimal Setup (Random Forest - RECOMMENDED)

Download only these 3 files for production:

# Download preprocessors and best model
gh release download v1.0.0 --pattern "feature_*.pkl"
gh release download v1.0.0 --pattern "random_forest_3class.pkl"

Usage Example

import joblib
import numpy as np

# Load preprocessors
imputer = joblib.load('feature_imputer.pkl')
scaler = joblib.load('feature_scaler.pkl')

# Load model
model = joblib.load('random_forest_3class.pkl')

# Your 783 features
features = np.array([...])  # 783 values

# Preprocess
features_imputed = imputer.transform([features])
features_scaled = scaler.transform(features_imputed)

# Predict
prediction = model.predict(features_scaled)[0]
probabilities = model.predict_proba(features_scaled)[0]

# Label mapping
labels = {0: 'CANDIDATE', 1: 'CONFIRMED', 2: 'FALSE POSITIVE'}
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {probabilities[prediction]:.2%}")
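
To score more than one sample, the same steps can be wrapped in a small helper. This is a minimal sketch, not part of the release: `classify_batch` and `N_FEATURES` are hypothetical names, and it assumes the loaded objects expose scikit-learn's usual `transform`, `predict_proba` and `classes_` attributes.

```python
import numpy as np

N_FEATURES = 783  # feature count stated under Technical Details
LABELS = {0: "CANDIDATE", 1: "CONFIRMED", 2: "FALSE POSITIVE"}


def classify_batch(imputer, scaler, model, X):
    """Classify the rows of X and return (label, confidence) tuples."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(1, -1)  # treat a single sample as a 1-row batch
    if X.ndim != 2 or X.shape[1] != N_FEATURES:
        raise ValueError(f"expected shape (n, {N_FEATURES}), got {X.shape}")
    X = scaler.transform(imputer.transform(X))
    proba = model.predict_proba(X)
    best = proba.argmax(axis=1)
    # predict_proba columns follow model.classes_, so map through it
    return [
        (LABELS[int(model.classes_[i])], float(row[i]))
        for i, row in zip(best, proba)
    ]
```

Checking the feature count up front turns a shape mismatch into a clear error before any preprocessing runs.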

Technical Details

Data:

1866 samples from Kepler mission
783 features (lightcurve statistics)
Train/Test split: 75%/25% (stratified)
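
As an illustration of the split described above, the following sketch runs a stratified 75%/25% split on placeholder data with the stated shape. The random data and `random_state=42` are assumptions for the example; the seed used for the release is not stated.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1866, 783))    # placeholder features, not Kepler data
y = rng.integers(0, 3, size=1866)   # placeholder labels for the 3 classes

# stratify=y keeps the class proportions equal in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (1399, 783) (467, 783)
```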

Preprocessing:

Missing value imputation (median strategy)
Feature scaling (StandardScaler)
SMOTE for class balancing
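
The first two steps can be sketched in plain NumPy to show what the saved imputer and scaler compute. This is an illustrative re-implementation under the assumption of default settings, not the code used to build the .pkl files; `fit_preprocessor` and `apply_preprocessor` are hypothetical names. SMOTE is left out because it only resamples the training set and plays no part at inference time.

```python
import numpy as np


def fit_preprocessor(X_train):
    """Learn per-feature medians, means and standard deviations."""
    X_train = np.asarray(X_train, dtype=float)
    medians = np.nanmedian(X_train, axis=0)          # ignores NaNs
    filled = np.where(np.isnan(X_train), medians, X_train)
    means = filled.mean(axis=0)
    stds = filled.std(axis=0)                        # population std (ddof=0)
    stds[stds == 0.0] = 1.0                          # leave constant features unscaled
    return medians, means, stds


def apply_preprocessor(X, medians, means, stds):
    """Fill NaNs with training medians, then standardize."""
    X = np.asarray(X, dtype=float)
    filled = np.where(np.isnan(X), medians, X)
    return (filled - means) / stds
```

The statistics are learned once on the training data and reused unchanged for new samples, which is why the release ships the fitted preprocessors alongside the models.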

Training:

Random Forest: 300 trees, max_depth=20, balanced weights
XGBoost: 200 trees, max_depth=8, learning_rate=0.1
Genesis CNN: 26 epochs, early stopping
Ensemble: Averaged predictions from XGBoost + Random Forest
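
The ensemble's averaging step can be sketched as soft voting over class probabilities. Equal weights are an assumption here, since the notes do not say how the two models are weighted, and `average_probabilities` is a hypothetical helper.

```python
import numpy as np


def average_probabilities(proba_a, proba_b):
    """Soft voting: mean of two (n_samples, n_classes) probability arrays.

    Returns the predicted class indices and the averaged probabilities.
    """
    proba_a = np.asarray(proba_a, dtype=float)
    proba_b = np.asarray(proba_b, dtype=float)
    if proba_a.shape != proba_b.shape:
        raise ValueError("probability arrays must have the same shape")
    mean = (proba_a + proba_b) / 2.0
    return mean.argmax(axis=1), mean
```

Both inputs would come from `predict_proba` on the same preprocessed samples, with the class columns in the same order.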

Installation

pip install "scikit-learn>=1.3.0" "joblib>=1.3.0" "numpy>=1.24.0"

For XGBoost:

pip install "xgboost>=2.0.0"

For CNN:

pip install "tensorflow>=2.10.0"
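
Pickled scikit-learn models are sensitive to library versions, so it can help to confirm what is installed before loading them. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper, and the package list simply mirrors the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # package is not installed
    return found


print(installed_versions(
    ["scikit-learn", "joblib", "numpy", "xgboost", "tensorflow"]
))
```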

Documentation

See the repository for complete documentation:

docs/FINAL_SUMMARY.md - Complete implementation summary
docs/USAGE_GUIDE.md - Usage guide with examples
docs/deployment_guide.md - Production deployment guide
scripts/serve_model.py - REST API server example

Validation Results

Tested on 2014 held-out samples:

Random Forest: 92.72% accuracy, 92.54% F1-score
Confusion matrices available in figures/ directory
Cross-validated with stratified splits
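
For reference, accuracy and F1 can be recomputed from true and predicted labels as sketched below. The notes do not state which F1 averaging was reported, so the helper returns both macro and weighted values; `accuracy_and_f1` is a hypothetical name.

```python
import numpy as np


def accuracy_and_f1(y_true, y_pred, n_classes=3):
    """Return (accuracy, macro F1, weighted F1) for integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # rows: true class, cols: predicted
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)             # TP + FN per class
    predicted = cm.sum(axis=0)           # TP + FP per class
    denom = support + predicted
    # F1 = 2*TP / (2*TP + FP + FN); classes never seen or predicted get 0
    f1 = np.divide(2 * tp, denom, out=np.zeros(n_classes), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    weighted_f1 = (f1 * support).sum() / support.sum()
    return float(accuracy), float(f1.mean()), float(weighted_f1)
```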

License

See repository LICENSE file.

Citation

If you use these models in your research, please cite the Kepler mission:

NASA Kepler Mission: https://www.nasa.gov/kepler
Kepler Data Archive: https://exoplanetarchive.ipac.caltech.edu/

Released: 2025-10-05
Version: 1.0.0
Best Model: Random Forest (92.72% accuracy)

Releases: exoplanet-spaceapps/colab_notebook
