🌌 MALLORN

Many Artificial LSST Lightcurves based on Observations of Real Nuclear transients

MALLORN is a specialized astronomical classification pipeline engineered to detect Tidal Disruption Events (TDEs)—rare instances where a star is torn apart by a supermassive black hole—from photometric light curves. Built for upcoming synoptic surveys like the Vera C. Rubin Observatory (LSST), it physically models extreme transient phenomena and separates them from vast backgrounds of supernovae and variable stars.

🏗️ System Architecture

Our pipeline is heavily modularized for professional execution, scaling, and research iterations.

graph TD
    subgraph Data Pipeline
        A[Raw Lightcurves] --> B[Data Splitting & Loading]
        B --> C[Feature Extraction]
    end
    
    subgraph Feature Engineering
        C --> D1(Time-Domain Features)
        C --> D2(Color & Cross-Band)
        C --> D3(Physics & Wavelets)
    end
    
    subgraph Augmentation & Balancing
        D1 & D2 & D3 --> F[SMOTE / ENN / ADASYN]
        F --> G[Feature Selection & Scaling]
    end
    
    subgraph Core Models
        G --> H1[LightGBM]
        G --> H2[XGBoost]
        G --> H3[Deep Neural Nets]
        G --> H4[Random Forest]
    end
    
    subgraph Advanced Semi-Supervised
        H1 & H2 & H3 & H4 --> I[Noisy Student Pseudo-Labeling]
        H1 & H2 & H3 & H4 --> J[Multi-View Co-Training]
    end
    
    subgraph Optimization
        I & J --> K[Stacking Meta-Learners]
        K --> L[Recall-First Ensembling]
        L --> M[Global Threshold Calibration]
    end
    
    M --> N(((Final Predictions.csv)))

    classDef primary fill:#150458,stroke:#fff,stroke-width:2px,color:#fff;
    classDef secondary fill:#0d1117,stroke:#58a6ff,stroke-width:2px,color:#c9d1d9;
    classDef output fill:#238636,stroke:#fff,stroke-width:2px,color:#fff;
    
    class A,C,G primary;
    class H1,H2,H3,H4,I,J,K,L,M secondary;
    class N output;

🚀 Key Features

Extensive Feature Extraction: Automatically calculates Physics-based features (Bazin functions, power-law fitting), Time-domain signals (FFT, wavelets, Gaussian Processes), and multi-band Color interactions.
Advanced Class Balancing: Integrates SMOTE, SVMSmote, ADASYN, and Tomek Links to seamlessly handle severe target imbalances (e.g., 150:1 ratios).
Semi-Supervised Learning: Implements Noisy Student Pseudo-Labeling and Multi-View Co-Training to harness unlabeled data iteratively and significantly boost F1 margins.
Robust Ensembling: Features stacked meta-learners with isotonic probability calibration, Rank-based ensembling, and a specialized Recall-First Gating mechanism to prioritize rare events.
Deep Learning Support: Ships with robust architectures built in PyTorch (ResBlocks, SEBlocks, Focal Loss) and hooks for Vision Transformer interfaces (SwinV2).

📁 Directory Structure

The repository is logically modularized into discrete, independent domains:

MALLORN/
├── mallorn/
│   ├── config.py           # Single source of truth for global configuration & hyperparameters
│   ├── utils.py            # Diagnostic tools, math utilities, and I/O formatters
│   ├── data/               # Augmentation (SMOTE), PyTorch datasets, and advanced pseudo-labeling
│   ├── features/           # Lightcurve feature extraction, physical models, aggressive dimensionality reduction
│   ├── models/             # PyTorch Neural Networks, Tree hyperparams, calibration routines
│   └── training/           # Iterative Stratified CV loops, model ensembling, F1 threshold optimization
├── run_pipeline.py         # ⚡ The main execution script connecting all modules seamlessly
└── requirements.txt        # Comprehensive Python dependencies

🛠️ How to Run

Install Dependencies Ensure you have Python 3.8+ installed, then install the required packages:
```
pip install -r requirements.txt
```
Prepare Data The raw astronomical lightcurves and metadata used for this pipeline are hosted externally due to size constraints. Download the dataset directly from Kaggle:
- MALLORN Astronomical Classification Challenge
Extract the downloaded archive directly into the root repository folder. Your structure should look like this (or you can manually redirect BASE_DIR inside mallorn/config.py):
- train_log.csv
- test_log.csv
- split_01/, split_02/, split_03/, etc.
Execute the Pipeline Trigger the fully orchestrated training, ensembling, and inference orchestrator:
```
python run_pipeline.py
```

⚙️ Configuration Tuning

Everything is customizable inside mallorn/config.py. Key pipeline parameters include:

Model Selection: USE_LGBM, USE_XGB, USE_NN, USE_RF
Imbalance Strategies: Enable via USE_SMOTE_ENN, USE_ADASYN, USE_BORDERLINE_SMOTE
Semi-Supervised Flow: Scale capabilities via USE_ADVANCED_PSEUDO_LABELS, USE_MULTI_VIEW_COTRAINING
Losses: Optimize precision/recall balance universally mapped via FOCAL_LOSS_GAMMA and FOCAL_LOSS_ALPHA.

📈 Output & Deliverables

Upon completion, the pipeline outputs:

predictions.csv: The fully optimized list of target inferences for your test dataset.
Telemetry & Diagnostics: Detailed Out-of-Fold (OOF) cross-validation metrics inside your console (F1, Precision, Recall, and P-R Gaps).
Thresholds: Formally evaluated median thresholds calculated across repeated stratified folds ensuring real-world deployment readiness.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
mallorn		mallorn
LICENSE		LICENSE
README.md		README.md
_map.py		_map.py
cnn_model.py		cnn_model.py
config_optimization.yaml		config_optimization.yaml
predictions.csv		predictions.csv
requirements.txt		requirements.txt
run_pipeline.py		run_pipeline.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🌌 MALLORN

🏗️ System Architecture

🚀 Key Features

📁 Directory Structure

🛠️ How to Run

⚙️ Configuration Tuning

📈 Output & Deliverables

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🌌 MALLORN

🏗️ System Architecture

🚀 Key Features

📁 Directory Structure

🛠️ How to Run

⚙️ Configuration Tuning

📈 Output & Deliverables

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages