Model improvement ideas #7

@Joelius300

Description

See also idea-dump

LLMs yapping

Claude (Sonnet 4.5)

For hourly river temperature forecasting over ~100 hours, I'd recommend a hybrid approach that combines domain knowledge with modern deep learning architectures. Here's my detailed strategy:

Feature Engineering

Core Features:

  1. Historical river temperature (lookback: 168-336 hours / 7-14 days)

    • River temperature has strong autocorrelation and daily/weekly patterns
    • Include lag features: t-1, t-24, t-48, t-72, t-168
    • Rolling statistics: 24h, 72h, 168h means and standard deviations
    • Rate of change features (delta temperature over 1h, 6h, 24h)
  2. Air temperature (lookback: 48-168 hours, forecast: full 100h horizon)

    • Strongest driver of river temperature (typically 0.7-0.9 correlation)
    • Historical: same lag and rolling features as river temp
    • Critical: Use forecasted air temperature from NWP models (e.g. NOAA's GFS or ECMWF) for the prediction horizon
    • Consider air-water temperature difference (thermal deficit/surplus)
  3. Solar radiation / Daylight hours (lookback: 48-72 hours)

    • Direct heating effect on water
    • Can derive from astronomical calculations (deterministic) or use forecasts
    • Include time-of-day and day-of-year encodings (cyclical: sin/cos transforms)
  4. River discharge/flow (lookback: 168-336 hours if available)

    • Higher flow → more thermal inertia, slower temperature response
    • Flow regime changes affect mixing and heat transfer
    • If forecasts available, include them; otherwise use persistence or climatology
  5. Meteorological variables (lookback: 48-72 hours, forecasts when available)

    • Wind speed: affects evaporative cooling
    • Humidity/dewpoint: influences latent heat flux
    • Cloud cover: modulates solar radiation
    • Precipitation: can cause sudden temperature drops

Temporal Features:

  • Hour of day (cyclical encoding: sin/cos of 2π*hour/24)
  • Day of year (cyclical encoding for seasonality)
  • Day of week (weekend/weekday patterns for regulated rivers)
  • Month indicators
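The cyclical encodings above can be sketched in a few lines; `cyclical_encode` is an illustrative helper name, not from any particular library:

```python
import math

def cyclical_encode(value, period):
    """Map a periodic value (e.g. hour of day, period 24) onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

# Hour 23 and hour 0 end up adjacent on the circle,
# whereas the raw integers 23 and 0 look maximally far apart.
sin_h, cos_h = cyclical_encode(6, 24)   # 6 a.m. -> (1.0, ~0.0)
s0, c0 = cyclical_encode(0, 24)
s24, c24 = cyclical_encode(24, 24)      # wraps to the same point as hour 0
```

The same transform applies to day-of-year with period 365.25.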

Spatial Features (if multi-site):

  • Upstream station temperatures (lag by flow travel time)
  • Elevation, latitude (static features)

Feature Transformations

  1. Normalization: Z-score standardization per feature (critical for neural networks)
  2. Cyclical encoding: sin/cos transforms for hour, day-of-year
  3. Differencing: First-order differences for non-stationary series
  4. Interaction terms: Air temp × solar radiation, discharge × air temp
  5. Log transforms: For skewed features like discharge
  6. Rolling aggregations: Multiple window sizes to capture different temporal scales
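As a concrete sketch of the lag and rolling-window transforms (plain-Python helpers with illustrative names; in practice `pandas` `shift`/`rolling` do the same job):

```python
def lag_features(series, lags):
    """One lagged copy of `series` per lag; positions without history get None."""
    return {f"lag_{k}": [None] * k + list(series[:len(series) - k]) for k in lags}

def rolling_mean(series, window):
    """Trailing mean over `window` points; None until enough history exists."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out

temps = [10.0, 11.0, 13.0, 12.0, 14.0]
lags = lag_features(temps, [1, 2])   # lag_1 = [None, 10.0, 11.0, 13.0, 12.0]
smooth = rolling_mean(temps, 3)      # [None, None, 11.33..., 12.0, 13.0]
```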

Model Recommendations

Given your hourly, 100-hour horizon requirement, here are my top choices:

Tier 1: Primary Recommendations

1. TiDE (Time-series Dense Encoder) - My top choice

  • Why: Excellent for medium-horizon forecasting, computationally efficient, handles covariates well
  • Architecture: Dense encoder-decoder with residual connections
  • Strengths:
    • Superior to Transformer-based models on many benchmarks
    • Much faster training than attention mechanisms
    • Naturally handles both past and future covariates (like forecasted air temp)
    • Good interpolation between lookback and horizon
  • Config: Lookback 168h, hidden dims [256, 256], decoder layers 2-3

2. Temporal Fusion Transformer (TFT)

  • Why: Designed specifically for multi-horizon forecasting with mixed data types
  • Strengths:
    • Built-in variable selection network (identifies important features)
    • Handles static, known future, and observed inputs elegantly
    • Quantile regression for uncertainty quantification
    • Interpretable attention weights
  • Limitation: Computationally expensive, may be overkill for 100h horizon
  • Config: 4-8 attention heads, lookback 168-336h

3. TSMixer

  • Why: Recent architecture showing strong performance, computationally efficient
  • Strengths:
    • MLP-based with separate time and feature mixing
    • Very fast training and inference
    • Competitive with Transformers at lower computational cost
  • Config: 4-8 mixing blocks, lookback 168h

Tier 2: Strong Alternatives

4. N-BEATS with Exogenous Variables (N-BEATSx)

  • Pure deep learning, interpretable basis functions
  • Excellent for patterns with clear trend and seasonality
  • May need modification to fully leverage meteorological forecasts

5. DeepAR (Amazon)

  • Probabilistic forecasting with autoregressive RNN
  • Good uncertainty estimates
  • Handles missing data well
  • Older architecture but still competitive

6. LightGBM/XGBoost with engineered lags

  • Pragmatic baseline that often outperforms complex models
  • Fast training, excellent feature importance
  • Use with extensive feature engineering (lags, rolling stats, interactions)
  • Multi-step approach: direct or recursive forecasting
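The direct vs. recursive distinction can be illustrated with a toy one-step predictor (persistence stands in for the trained booster; function names are illustrative):

```python
def recursive_forecast(history, one_step_model, horizon):
    """Recursive strategy: each 1-step prediction is appended and fed back in."""
    hist = list(history)
    preds = []
    for _ in range(horizon):
        y_hat = one_step_model(hist)
        preds.append(y_hat)
        hist.append(y_hat)
    return preds

def direct_forecast(history, per_step_models):
    """Direct strategy: a dedicated model per horizon step, all seeing the same history."""
    return [model(history) for model in per_step_models]

persistence = lambda hist: hist[-1]  # toy stand-in for a trained model
```

Recursive needs one model but compounds errors over 100 steps; direct needs 100 models (or a multi-output one) but keeps each step's error independent.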

Tier 3: Classical/Hybrid

7. SARIMAX + Neural Network residual modeling

  • SARIMAX captures linear dynamics and seasonality
  • NN model (simple MLP) learns residuals
  • Interpretable and often surprisingly effective

8. Prophet + XGBoost

  • Prophet for trend/seasonality, XGBoost for complex non-linearities
  • Good when you have strong domain knowledge about change points

Recommended Approach

Phase 1: Baseline

  • Start with XGBoost/LightGBM with rich features (1-2 days development)
  • Establishes performance floor and identifies important features

Phase 2: Deep Learning

  • Implement TiDE as primary model (best balance of performance/complexity)
  • Implement TFT if you need interpretability and uncertainty quantification
  • Compare with TSMixer for potential efficiency gains

Phase 3: Ensemble

  • Weighted average of top 2-3 models
  • Often yields 5-15% improvement over single models
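The ensembling step is just a weighted average over aligned forecasts; in practice the weights would come from validation error:

```python
def weighted_ensemble(forecasts, weights):
    """Pointwise weighted average of equally-long model forecasts."""
    total = sum(weights)
    norm = [w / total for w in weights]
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(norm, forecasts)) for t in range(horizon)]

# Two toy 2-step forecasts, first model trusted 3x more than the second:
blend = weighted_ensemble([[14.0, 15.0], [16.0, 17.0]], weights=[3, 1])
```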

Training Strategy

  1. Loss function: MAE or Huber loss (robust to outliers)
  2. Validation: Time-series cross-validation with expanding window
  3. Horizon-specific losses: Weight near-term predictions more heavily if needed
  4. Quantile loss: For probabilistic forecasts (critical for decision-making)
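Point 4 (quantile/pinball loss) is simple to write down: under-prediction is penalised by q and over-prediction by 1 − q, so fitting at q = 0.9 yields a forecast that sits above the observations roughly 90% of the time:

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss at quantile level q in (0, 1)."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

# Under-predicting by 2 degC at q=0.9 costs 1.8; over-predicting by 2 costs only 0.2.
```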

External Data Sources

  1. Weather forecasts: NOAA GFS, ECMWF (via APIs like Open-Meteo)
  2. Historical climate: ERA5 reanalysis for gap-filling
  3. River gauge data: USGS real-time water data
  4. Satellite data: Land surface temperature if available (MODIS)

Key Success Factors

  • Future covariates are crucial: Access to 100h air temperature forecasts is potentially the single most important factor
  • Domain knowledge: Physics-informed features (heat transfer principles) boost performance
  • Operational constraints: TiDE/TSMixer offer best latency for real-time deployment
  • Uncertainty matters: Use quantile regression or ensemble methods for decision support

For a production system, I'd likely deploy TiDE with XGBoost as a fallback, using NWP forecasts for meteorological variables, and retrain weekly on the most recent 2-3 years of data.

ChatGPT (GPT-5.2)

Below is how I would design a short-term, hourly river temperature forecasting system (~100 hours ahead) if the goal is high accuracy, operational robustness, and interpretability—and with modern SOTA time-series models in mind (TSMixer, TFT, TiDE, etc.).

I’ll structure this as:

  1. Physical intuition first (what actually drives river temperature)
  2. Feature groups & lookback horizons
  3. Transformations & representations
  4. External data & forecast inputs
  5. Modeling approaches (classical → SOTA deep learning)
  6. Training & validation strategy
  7. What I’d deploy in practice

1. Physical drivers of river temperature (guides feature choice)

River temperature is governed by:

  • Atmospheric heat exchange (air temp, radiation, wind, humidity)
  • Hydrology (discharge, velocity, depth)
  • Thermal inertia (water volume, prior temps)
  • Solar forcing (diurnal + seasonal cycles)
  • Upstream conditions (lagged influence)
  • Anthropogenic controls (dams, releases, power plants)

Short-term (4-day) forecasts are dominated by:

  • Recent river temperature
  • Air temperature & solar radiation
  • Discharge / flow regime
  • Diurnal cycles

This strongly favors autoregressive + exogenous (ARX) models with weather forecasts.


2. Feature groups & lookback horizons

A. Target history (most important)

| Feature | Lookback | Notes |
| --- | --- | --- |
| River temperature | 48–168 hrs | Thermal inertia is strong; I’d start with 72–120 hrs |

Why:

  • Captures diurnal cycles
  • Captures heat storage & advection effects
  • More than ~7 days rarely helps short-term horizons

B. Air temperature (observed + forecast)

| Feature | Lookback | Forecast horizon |
| --- | --- | --- |
| Air temperature | 48–168 hrs | 0–100 hrs |

Key driver

  • Strongly correlated but phase-shifted
  • Rivers lag air temp by 2–12 hours depending on depth & flow

👉 I’d explicitly include:

  • Lagged air temp (t−3, t−6, t−12, t−24)
  • Forecasted air temp as known future covariates
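The 2–12 h air-to-water lag mentioned above can be estimated from data by scanning candidate lags for the highest Pearson correlation (a plain-Python sketch; `best_lag` is an illustrative name):

```python
import math

def best_lag(air, water, max_lag):
    """Lag (in samples) at which past air temp best correlates with water temp."""
    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy) if sx and sy else 0.0
    scores = {lag: pearson(air[:len(air) - lag], water[lag:])
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get)

# Synthetic check: a diurnal water signal trailing the air signal by 3 hours.
air = [math.sin(2 * math.pi * t / 24) for t in range(240)]
water = [math.sin(2 * math.pi * (t - 3) / 24) for t in range(240)]
estimated = best_lag(air, water, max_lag=12)
```

The estimated lag then tells you which explicit air-temp lags deserve priority as features.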

C. Solar radiation / shortwave radiation

| Feature | Lookback | Notes |
| --- | --- | --- |
| Incoming solar radiation | 24–72 hrs | Drives diurnal heating |

If unavailable:

  • Proxy via clear-sky radiation × cloud cover
  • Or cyclical time encodings
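One common empirical proxy attenuates a clear-sky estimate by cloud fraction; the coefficient and exponent below are the widely cited Kasten–Czeplak values, not fitted to any particular river:

```python
def cloudy_radiation(clear_sky_wm2, cloud_fraction):
    """Global radiation proxy: clear-sky value damped by cloud cover in [0, 1].

    Kasten-Czeplak form: G = G_clear * (1 - 0.75 * C**3.4).
    """
    return clear_sky_wm2 * (1.0 - 0.75 * cloud_fraction ** 3.4)

# Fully overcast sky passes ~25% of the clear-sky radiation under this form.
```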

D. Streamflow / discharge (if available)

| Feature | Lookback | Notes |
| --- | --- | --- |
| Discharge / flow | 72–336 hrs | Controls thermal capacity & mixing |

  • High flow → dampened temp swings
  • Low flow → stronger diurnal response

I’d include:

  • Raw discharge
  • log(discharge)
  • Rolling mean (24–72 hrs)

E. Meteorology (secondary but helpful)

| Feature | Lookback |
| --- | --- |
| Wind speed | 24–72 hrs |
| Relative humidity | 24–72 hrs |
| Cloud cover | 24–72 hrs |
| Precipitation | 24–72 hrs |

These improve performance especially during:

  • Storms
  • Heat waves
  • Rapid transitions

F. Temporal encodings (critical)

| Feature | Type |
| --- | --- |
| Hour of day | sin/cos |
| Day of year | sin/cos |
| Day of week | optional |

Diurnal signal is huge at hourly resolution.


3. Transformations & representations

A. Normalization

  • Per-site z-score or robust scaling
  • Discharge → log transform

B. Rolling statistics

Helpful for shallow models and stability:

  • Rolling mean / std of river temp (6h, 24h, 72h)
  • Rolling temp-air temp difference

C. Anomaly features

  • Temp anomaly vs climatology (same hour, same DOY)
  • Improves generalization across seasons
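Computing the anomaly against an hour-of-day × day-of-year climatology is a small grouping exercise (illustrative helper names):

```python
from collections import defaultdict

def hourly_climatology(keys, temps):
    """Mean temperature per (day_of_year, hour) bucket."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, t in zip(keys, temps):
        sums[key] += t
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

def anomaly_features(keys, temps, climatology):
    """Observed minus the climatological mean of the same bucket."""
    return [t - climatology[k] for k, t in zip(keys, temps)]

keys = [(180, 14), (180, 14), (181, 14)]    # (day-of-year, hour) per observation
temps = [21.0, 23.0, 18.0]
clim = hourly_climatology(keys, temps)       # {(180, 14): 22.0, (181, 14): 18.0}
anoms = anomaly_features(keys, temps, clim)  # [-1.0, 1.0, 0.0]
```

With multiple years of data, each bucket's mean becomes a stable seasonal baseline and the model only has to explain departures from it.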

D. Lag selection

Instead of dense lags:

  • Explicit lags at [1, 3, 6, 12, 24, 48, 72] hours
  • Let deep models learn interactions

4. External data sources (high impact)

Meteorological forecasts (essential)

  • ECMWF / GFS / HRRR
  • Hourly air temperature, radiation, wind, clouds
  • Treated as known future covariates

Hydrologic forecasts (if available)

  • Flow forecasts from HEC-RAS / NOAA / local hydrologic models

Static features

  • River depth, width
  • Upstream distance
  • Shading / canopy
  • Dam presence

Static features are very useful for TFT & TiDE.


5. Model choices (from baseline to SOTA)

1. Strong baselines (always start here)

🔹 SARIMAX

  • AR terms on river temp
  • Exogenous air temperature + radiation
  • Surprisingly strong for short horizons

🔹 Gradient Boosting (XGBoost / LightGBM)

  • Lagged features + weather forecasts
  • Fast, interpretable, robust
  • Often competitive with deep models for <5-day horizons

2. Deep learning models (recommended)

🟢 Temporal Fusion Transformer (TFT) ⭐⭐⭐⭐⭐

My first choice if data volume is sufficient

Why:

  • Explicit handling of:

    • Static features
    • Past observed covariates
    • Known future covariates (weather forecasts)
  • Attention provides interpretability

  • Strong performance on short-term multivariate forecasting

Config:

  • Lookback: 72–168 hrs
  • Horizon: 100 hrs
  • Quantile loss (uncertainty matters in rivers)

🟢 TiDE (Time-series Dense Encoder) ⭐⭐⭐⭐☆

Why:

  • Very strong for short horizons
  • Simpler than TFT
  • Handles future covariates cleanly
  • Excellent when physical relationships are smooth

Less interpretable than TFT but often more stable.


🟢 TSMixer / TSMixer+ ⭐⭐⭐⭐☆

Why:

  • Extremely efficient
  • Strong for high-frequency signals
  • Works well with large lag windows

Caveat:

  • Weaker handling of static features
  • Less physically interpretable

🟡 N-BEATSx

  • Works well if exogenous drivers dominate
  • Less explicit temporal alignment than TFT

3. Hybrid / physics-aware approaches (advanced)

  • Residual model:

    • Physical energy-balance model
    • ML learns residual error
  • Improves extrapolation during extremes


6. Training & validation strategy

Data splitting

  • Rolling origin evaluation
  • Seasonal hold-outs (train on spring/summer → test on heat waves)
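Rolling-origin evaluation amounts to generating expanding train/test index windows; a minimal sketch (indices only, model fitting left out):

```python
def rolling_origin_splits(n_samples, initial_train, horizon, step):
    """Expanding-window splits: train on [0, end), test on the next `horizon` points."""
    splits = []
    end = initial_train
    while end + horizon <= n_samples:
        splits.append((end, end + horizon))  # (train_end, test_end)
        end += step
    return splits

# E.g. 1000 hourly samples, 500 h initial train, 100 h test blocks, advance 100 h:
folds = rolling_origin_splits(1000, initial_train=500, horizon=100, step=100)
```

Each fold trains only on data strictly before its test block, which is what makes the evaluation honest for forecasting.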

Loss functions

  • MAE (operational)
  • Quantile loss (P10 / P50 / P90)
  • Penalize diurnal phase errors

Metrics

  • RMSE
  • Diurnal amplitude error
  • Max daily temperature error (ecologically critical)
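The two river-specific metrics can be computed from day-sized chunks of the hourly series (illustrative helpers; assumes the series starts at midnight and covers whole days):

```python
def _days(hourly, hours_per_day=24):
    """Split an hourly series into consecutive day-length chunks."""
    return [hourly[i:i + hours_per_day] for i in range(0, len(hourly), hours_per_day)]

def diurnal_amplitude_error(obs, pred):
    """Mean absolute error in the daily (max - min) temperature range."""
    errs = [abs((max(o) - min(o)) - (max(p) - min(p)))
            for o, p in zip(_days(obs), _days(pred))]
    return sum(errs) / len(errs)

def max_daily_temp_error(obs, pred):
    """Mean absolute error in the daily maximum (the ecologically critical value)."""
    errs = [abs(max(o) - max(p)) for o, p in zip(_days(obs), _days(pred))]
    return sum(errs) / len(errs)
```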

7. What I would deploy

Practical SOTA stack:

  1. Primary model: Temporal Fusion Transformer
  2. Backup model: LightGBM with lags
  3. Ensemble: TFT + GBM (often best)

Inputs:

  • 96–168 hrs history
  • 100 hrs of weather forecasts
  • Static river attributes

Outputs:

  • Hourly forecasts
  • Prediction intervals
  • Feature attribution (TFT attention)

Summary (TL;DR)

  • Most influential features:
    River temp history > air temp (obs + forecast) > solar radiation > discharge

  • Lookback:
    72–120 hrs for temp, 48–72 hrs for met variables

  • Best models:

    • 🥇 Temporal Fusion Transformer
    • 🥈 TiDE
    • 🥉 TSMixer
  • Key advantage: known future weather covariates

  • Baseline still matters: GBM & SARIMAX remain strong

If you want, I can:

  • Sketch a TFT feature schema
  • Compare TFT vs TiDE vs TSMixer for river physics
  • Provide PyTorch / Darts / GluonTS configs
  • Discuss cold-start rivers or ungauged basins

Just tell me where you want to go next.

Perplexity

For short-term (≤4 days, hourly) river temperature forecasting, a multivariate deep time series model (e.g. TFT, TSMixer, or TiDE) using recent river conditions plus weather forecasts as exogenous inputs is a strong choice, often outperforming simpler baselines when you have enough data. The most influential features are typically air temperature, recent river temperature history, and discharge/flow, with auxiliary static features (e.g. elevation, shading) improving generalization across sites. [publish.csiro](https://www.publish.csiro.au/mf/pdf/MF24152)

Influential features

Key dynamic features (hourly or aggregated from finer data):

Important static / slowly-varying features:

Lookback windows and feature engineering

For a 1–100 h forecast horizon, a typical design:

Potential transformations:

  • Temporal encoding:

    • Sine/cosine encodings for hour-of-day, day-of-year to encode diurnal and seasonal cycles.
    • Categorical month or season features.
  • Lags and rolling features:

    • Explicit lagged features at multiples of 24 h (e.g. t−24, t−48) to capture repeated diurnal patterns.
    • Rolling mean/max/min/standard deviation (e.g. 6, 12, 24, 72 h) for air/river temperature, flow, and radiation.
  • Nonlinear transforms:

    • Log-transform flow/discharge (often highly skewed).
    • Square or interaction terms if using simpler ML models (e.g. AT², AT×flow) to approximate nonlinear heat processes. [publish.csiro](https://www.publish.csiro.au/mf/pdf/MF24152)
    • Standardization or robust scaling of all continuous features per site or globally.
  • Quality & gap handling:

External data sources

Useful external predictors (both observations and forecasts):

Model classes and concrete algorithms

Baseline and traditional approaches:

Neural sequence models:

Modern SOTA architectures for multivariate time series:

  • Temporal Fusion Transformer (TFT):

    • Handles:
      • Historic observed inputs (past covariates).
      • Known future inputs (e.g. NWP forecasts, calendar).
      • Static features through embeddings and context vectors.
    • Provides attention-based variable selection and interpretable attributions; has shown excellent performance for multi-horizon water temperature forecasting using meteorological forecasts. [hess.copernicus](https://hess.copernicus.org/articles/29/1685/2025/)
  • TSMixer:

    • MLP-Mixer-style model designed for time series, mixing along time and feature dimensions.
    • Strong baseline for multivariate forecasting, computationally efficient, and works well with long lookbacks and horizons.
  • TiDE:

    • Encoder–decoder MLP model specialized for long-horizon forecasting with separate modules for historical and future-known covariates.
    • Good trade-off between accuracy and speed, often competitive with transformers for structured datasets.
  • Others:

    • N-BEATS/N-HiTS: strong general-purpose forecasters; can work well for univariate or low-dimensional settings if exogenous drivers are encoded appropriately.
    • Probabilistic variants (e.g. TFT with quantile loss) to provide prediction intervals, important for operations. [hess.copernicus](https://hess.copernicus.org/articles/29/1685/2025/)

For your problem (hourly, 0–100 h, multivariate drivers):

  • A suitable setup is:
    • Multi-site, multi-variate model.
    • 72–168 h lookback, 100 h forecast horizon.
    • Known future NWP covariates as inputs.
    • Model choice:
      • Start with TFT or TiDE if you need interpretability and handle known future meteorology.
      • TSMixer or TCN as a lighter, strong alternative where compute is constrained.

Practical design choices
