AI-powered rainfall prediction for Indonesia, built with Streamlit & scikit-learn
Features · Live Demo · Quick Start · Architecture · Contributing
Live demo: nusantara-rainforecaster.streamlit.app. No install needed; it runs fully in your browser.
Nusantara RainForecaster is a gamified weather web app that analyzes historical Indonesian weather data (BMKG format) and predicts rainfall probability using a Gradient Boosting classifier and Random Forest regressor. It features three modes:
- Smart Mode: pick any of the next 7 days and get an instant AI prediction using historical median values for that month, automatically inferred from your data.
- Manual Mode: enter today's exact conditions (temperature, humidity, sunshine, wind) for a fully custom prediction.
- Dashboard: explore historical trends across stations and date ranges.
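As a rough illustration of how Smart Mode could infer features from monthly medians, here is a minimal standard-library sketch. The function names `monthly_medians` and `infer_features`, the chosen feature subset, and the sample rows are all hypothetical; the app's own logic lives in `data/loader.py` and may differ. The daily variation that Smart Mode adds on top of the medians is not modelled here.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Subset of the README's columns, chosen only for illustration.
FEATURES = ("Tavg", "RH_avg", "ss")


def monthly_medians(rows):
    """Return {month: {feature: median}} from an iterable of row dicts.

    Each row is assumed to carry a `date` (datetime.date) plus numeric
    features; missing (None) values are skipped.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for name in FEATURES:
            value = row.get(name)
            if value is not None:
                buckets[row["date"].month][name].append(value)
    return {
        month: {name: median(values) for name, values in feats.items()}
        for month, feats in buckets.items()
    }


def infer_features(target_day, medians):
    """Look up the median feature values for the month of `target_day`."""
    return medians[target_day.month]


# Hypothetical history: three January days and one July day.
history = [
    {"date": date(2023, 1, 5), "Tavg": 26.0, "RH_avg": 88.0, "ss": 2.0},
    {"date": date(2023, 1, 6), "Tavg": 27.0, "RH_avg": 90.0, "ss": 1.0},
    {"date": date(2024, 1, 7), "Tavg": 28.0, "RH_avg": 86.0, "ss": 3.0},
    {"date": date(2023, 7, 1), "Tavg": 29.0, "RH_avg": 70.0, "ss": 8.0},
]
medians = monthly_medians(history)
print(infer_features(date(2025, 1, 20), medians))
```

Grouping by calendar month across all years keeps the lookup simple: any future date maps to the median conditions historically observed in that month.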
Rain timing estimates use real BMKG climatology logic based on sunshine duration (`ss`), humidity (`RH_avg`), total rainfall (`RR`), and seasonal patterns, not random guesses.
- Visual 7-day tile strip showing rain probability and estimated mm for each day
- Click any day tile or pick a custom date; the prediction updates instantly
- Features inferred from historical monthly medians with realistic daily variation
- Rain timing derived from `ss` + `RH_avg` + BMKG seasonal patterns
- Enter all conditions manually: `Tavg`, `Tn`, `Tx`, `RH_avg`, `ss`, wind speeds, and 7-day rolling averages
- Same ML model, full user control over inputs
- KPI metrics: total records, rainy days, avg rainfall, avg temperature
- Charts: daily rainfall bar, temperature band (min/avg/max), humidity vs rainfall scatter, monthly heatmap, rain probability by month; all interactive via Plotly
- Classifier: `GradientBoostingClassifier`, which predicts the probability of rain (`RR > 0.5 mm`)
- Regressor: `RandomForestRegressor`, which estimates rainfall volume (mm) on rainy days
- Auto-trains on startup from `data/weather_data.csv` with joblib caching
- Model identified by MD5 checksum; retrain anytime from the Data tab
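The classifier/regressor split described above can be sketched as a two-stage model. This is only an illustration on synthetic data: the two features, the toy rainfall rule, the hyperparameters, and the `predict` helper are assumptions, not the pipeline in `models/trainer.py`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestRegressor

RAIN_THRESHOLD_MM = 0.5  # a day counts as rainy when RR > 0.5 mm

# Synthetic stand-in data (hypothetical): columns are RH_avg (%) and ss (hours).
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(60, 100, 400), rng.uniform(0, 10, 400)])
# Toy rule: humid, cloudy days get rain; every other day stays dry.
RR = np.where((X[:, 0] > 85) & (X[:, 1] < 4), rng.uniform(1, 40, 400), 0.0)

is_rainy = RR > RAIN_THRESHOLD_MM

# Stage 1: probability of rain, trained on every day.
clf = GradientBoostingClassifier(random_state=0).fit(X, is_rainy)

# Stage 2: rainfall volume, trained on rainy days only.
reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(X[is_rainy], RR[is_rainy])


def predict(features):
    """Return (rain probability, estimated mm if it rains) for one feature row."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    prob = float(clf.predict_proba(row)[0, 1])  # column 1 is the "rainy" class
    mm = float(reg.predict(row)[0])
    return prob, mm


prob, mm = predict([95.0, 1.0])  # very humid, almost no sunshine
print(f"rain probability {prob:.2f}, estimated {mm:.1f} mm")
```

Training the regressor only on rainy days keeps the many zero-rainfall days from dragging volume estimates toward zero; the classifier handles the rain/no-rain decision separately.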
| Condition | Rain Type |
|---|---|
| `ss` = 0 hours | All day, starting 06:00 |
| `ss` < 2 hours or RH ≥ 90% | Orographic morning rain, ~07:00–09:00 |
| `ss` 2–4 hours + rainy season | Midday, ~11:00–14:00 |
| RH ≥ 90% + rainy season | Midday to afternoon, ~12:00–15:00 |
| `ss` 4–10 hours (default) | Afternoon convective (most common), ~13:00–15:00 |
Duration is calculated as `RR` (mm) ÷ the BMKG intensity for each category (light/moderate/heavy/very heavy/extreme).
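The table and duration formula can be sketched as an ordered rule list. Several things here are assumptions: rows are evaluated top to bottom (under that reading the fourth row is shadowed by the second, so the real `estimate_rain_hours()` may use a different precedence), the rainy-season flag is passed in by the caller, and the intensity values are placeholders, not BMKG figures.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    matches: Callable[[float, float, bool], bool]  # (ss, rh, wet_season) -> bool
    label: str
    window: str


# Rows of the table, in table order; the last row is the default.
RULES = [
    Rule(lambda ss, rh, wet: ss == 0, "all day", "from 06:00"),
    Rule(lambda ss, rh, wet: ss < 2 or rh >= 90, "orographic morning", "07:00-09:00"),
    Rule(lambda ss, rh, wet: 2 <= ss <= 4 and wet, "midday", "11:00-14:00"),
    Rule(lambda ss, rh, wet: rh >= 90 and wet, "midday to afternoon", "12:00-15:00"),
    Rule(lambda ss, rh, wet: True, "afternoon convective", "13:00-15:00"),
]

# HYPOTHETICAL intensities in mm/hour; placeholders only, not BMKG values.
INTENSITY_MM_PER_HOUR = {
    "light": 2.0,
    "moderate": 5.0,
    "heavy": 10.0,
    "very heavy": 20.0,
    "extreme": 40.0,
}


def rain_window(ss, rh, wet_season):
    """Return (rain type, time window) for the first rule that matches."""
    for rule in RULES:
        if rule.matches(ss, rh, wet_season):
            return rule.label, rule.window


def duration_hours(rr_mm, category):
    """Duration = rainfall total (mm) divided by the category's intensity (mm/h)."""
    return rr_mm / INTENSITY_MM_PER_HOUR[category]


print(rain_window(ss=3.0, rh=80.0, wet_season=True))
print(duration_hours(10.0, "moderate"))
```

Keeping the rules as data makes the precedence explicit and easy to reorder if the actual logic checks the more specific conditions first.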
- Uploaded files validated for size (50 MB cap) and schema before parsing
- Station IDs sanitised against a whitelist
- Numeric columns range-clamped to physical bounds; corrupt values → NaN
- Model files written with `chmod 0600`
- XSRF protection enabled in `.streamlit/config.toml`
- No user data persisted between sessions
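A few of these safeguards can be sketched with the standard library alone. The helper names, the station ID pattern, and the numeric bounds below are hypothetical; the real whitelist and bounds in `data/loader.py` are not shown in this README and may differ (for example, the whitelist could be a fixed set of known station IDs rather than a character pattern).

```python
import math
import os
import re

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MB cap
# HYPOTHETICAL whitelist: short IDs made of letters, digits, "_" or "-".
STATION_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,32}")
# HYPOTHETICAL physical bounds per column.
BOUNDS = {"RH_avg": (0.0, 100.0), "Tavg": (-10.0, 50.0)}


def check_size(payload: bytes) -> None:
    """Reject uploads above the size cap before any parsing happens."""
    if len(payload) > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds the 50 MB cap")


def sanitise_station_id(raw: str) -> str:
    """Return the trimmed ID, or raise if it contains disallowed characters."""
    candidate = raw.strip()
    if not STATION_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"invalid station id: {raw!r}")
    return candidate


def clean_value(column: str, raw: str) -> float:
    """Parse a numeric cell: unparseable input becomes NaN, the rest is clamped."""
    try:
        value = float(raw)
    except ValueError:
        return math.nan
    if math.isnan(value):
        return math.nan
    low, high = BOUNDS[column]
    return min(max(value, low), high)


def write_private(path: str, data: bytes) -> None:
    """Write a file readable and writable by its owner only (mode 0600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.chmod(path, 0o600)  # also tightens a file that already existed
```

For instance, `clean_value("RH_avg", "120")` is clamped to `100.0`, while `clean_value("RH_avg", "n/a")` yields NaN.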
git clone https://github.com/your-username/nusantara-rainforecaster.git
cd nusantara-rainforecaster
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Place your BMKG-format CSV at `data/weather_data.csv`. The app will auto-train the model on first launch.
Or use the helper to generate synthetic sample data:
python data/generate_sample.py
Then launch the app:
streamlit run app.py
Or use the helper script:
chmod +x run.sh
./run.sh all # install → generate data → tests → launch app
The model trains automatically on first run and caches to `models/cache/`. Subsequent launches load the cache instantly.
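One way a checksum-keyed cache like this might decide between loading and retraining is sketched below. It assumes the MD5 is taken over the data file and stored in a small JSON sidecar; both the sidecar and the helper names are hypothetical, and the project may key its cache differently.

```python
import hashlib
import json
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large CSVs are never loaded whole."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_retrain(data_path: Path, meta_path: Path) -> bool:
    """True when no checksum is recorded or the data file has changed."""
    if not meta_path.exists():
        return True
    recorded = json.loads(meta_path.read_text())["md5"]
    return recorded != md5_of_file(data_path)


def record_checksum(data_path: Path, meta_path: Path) -> None:
    """Store the current data checksum next to the cached models."""
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    meta_path.write_text(json.dumps({"md5": md5_of_file(data_path)}))
```

On startup, a caller would train and call `record_checksum` when `needs_retrain` returns `True`, and otherwise load the cached `.joblib` files. MD5 serves here only as a change detector, not as a security measure.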
CSV must include these columns at minimum:
| Column | Description |
|---|---|
| `date` | DD-MM-YYYY or YYYY-MM-DD |
| `Tn` | Min temperature (°C) |
| `Tx` | Max temperature (°C) |
| `Tavg` | Avg temperature (°C) |
| `RH_avg` | Avg humidity (%) |
| `RR` | Rainfall (mm) |
| `station_id` | Station identifier |
Optional but recommended (improve model accuracy and rain timing estimates):
| Column | Description |
|---|---|
| `ss` | Sunshine duration (hours/day); used for rain timing |
| `ff_x` | Max wind speed (m/s) |
| `ff_avg` | Avg wind speed (m/s) |
| `ddd_x` | Wind direction at max speed (°) |
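Before training, it can help to check a CSV against the required columns and the two accepted date formats. The sketch below uses only the standard library; `parse_date` and `missing_columns` are hypothetical helpers, not functions from `data/loader.py`.

```python
import csv
import io
from datetime import date, datetime

REQUIRED_COLUMNS = {"date", "Tn", "Tx", "Tavg", "RH_avg", "RR", "station_id"}
DATE_FORMATS = ("%d-%m-%Y", "%Y-%m-%d")  # DD-MM-YYYY first, then YYYY-MM-DD


def parse_date(text: str) -> date:
    """Try each accepted format in turn; raise if none of them fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def missing_columns(csv_text: str) -> set:
    """Return the required columns that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return REQUIRED_COLUMNS - {name.strip() for name in header}


sample = "date,Tn,Tx,Tavg,RH_avg,RR\n31-01-2024,23.1,31.4,26.8,88,12.5\n"
print(missing_columns(sample))
print(parse_date("31-01-2024"), parse_date("2024-01-31"))
```

The two formats cannot be confused with each other because `%Y` requires a four-digit year, so a string such as `31-01-2024` only matches the day-first pattern.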
After updating `data/weather_data.csv`, retrain from the Data tab → Retrain Model, or via CLI:
python -c "
from data.loader import load_csv, engineer_features
from models.trainer import train
df = engineer_features(load_csv('data/weather_data.csv'))
m = train(df)
print('Done:', m)
"
Then commit the updated cache:
git add models/cache/rain_classifier.joblib models/cache/rain_regressor.joblib
git commit -m "retrain model"
git push
Note: Pin your scikit-learn version in `requirements.txt` (e.g. `scikit-learn==1.6.0`) to ensure the cached `.joblib` files stay compatible across environments.
pytest # all tests
pytest tests/test_loader.py # single file
pytest --cov=. --cov-report=term-missing # with coverage
nusantara-rainforecaster/
├── app.py                  # Entry point, page routing, Smart/Manual mode
├── run.sh                  # Dev helper (setup / test / launch)
├── requirements.txt
├── pytest.ini
│
├── .streamlit/
│   └── config.toml         # Theme, XSRF, upload limits
│
├── data/
│   ├── loader.py           # CSV parsing, validation, feature engineering
│   ├── weather_data.csv    # Your BMKG data goes here
│   └── generate_sample.py  # Synthetic dataset generator
│
├── models/
│   ├── trainer.py          # Pipeline builders, train(), predict(),
│   │                       # estimate_rain_hours() (BMKG logic)
│   └── cache/
│       ├── rain_classifier.joblib
│       └── rain_regressor.joblib
│
├── utils/
│   ├── charts.py           # Plotly figure builders
│   └── style.py            # CSS injection, UI components
│
├── tests/
│   ├── test_loader.py
│   ├── test_trainer.py
│   └── test_charts.py
│
└── assets/
    └── rainfore-logo.jpg
| Layer | Library |
|---|---|
| Web UI | Streamlit 1.32+ |
| ML: Classification | GradientBoostingClassifier (scikit-learn 1.6.0) |
| ML: Regression | RandomForestRegressor (scikit-learn 1.6.0) |
| Data Processing | Pandas, NumPy |
| Visualisation | Plotly |
| Model Persistence | joblib |
| Testing | pytest, pytest-cov |
- BMKG API integration for live feature pulling
- Multi-step ahead forecasting (3-day outlook)
- API endpoint (FastAPI) for headless predictions
- Model versioning and comparison UI
- Dockerised deployment
- Fork the repository.
- Create a branch: `git checkout -b feature/your-feature`.
- Make changes and add tests.
- Run `pytest`; all tests must pass.
- Open a pull request with a clear description.
MIT. See LICENSE.