Time-Series Foundation Model with Temporal Attention and Multi-Task Learning
This project implements a Hybrid Time-Series Anomaly Forecasting Model for HVAC equipment, combining the IBM Granite Time-Series Foundation Model (TinyTimeMixer) with statistical feature engineering and a temporal attention mechanism.
Key Innovation: Multi-task learning for simultaneous 30-day, 60-day, and 90-day anomaly prediction with production-ready performance (F1=0.2789).
Predict HVAC equipment anomalies 30, 60, and 90 days in advance using:
- Historical time-series data (90-day lookback)
- Statistical features (28 dimensions)
- Temporal attention for pattern recognition
- Multi-task learning for efficient shared representation
Status: ✅ PRODUCTION READY
| Horizon | F1-Score | ROC-AUC | Precision | Recall | Accuracy |
|---|---|---|---|---|---|
| 30d | 0.2771 | 0.7230 | 0.2404 | 0.3257 | 0.8318 |
| 60d | 0.2756 | 0.6981 | 0.2435 | 0.3173 | 0.8349 |
| 90d | 0.2840 | 0.6958 | 0.2634 | 0.3101 | 0.8486 |
| Average | 0.2789 | 0.7056 | 0.2491 | 0.3177 | 0.8384 |
Model Size: 195,204 parameters (31.64% trainable)
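These numbers are standard binary-classification metrics evaluated per horizon. A minimal sketch of how to reproduce them, assuming scikit-learn and per-horizon label/probability arrays (the 0.5 decision threshold is an assumption):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def horizon_metrics(y_true: np.ndarray, y_prob: np.ndarray,
                    threshold: float = 0.5) -> dict:
    """Per-horizon metrics as reported in the table above.

    threshold=0.5 is an assumption; the project may tune it per horizon.
    """
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_prob),  # uses probabilities, not labels
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
    }
```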
```
┌─────────────────────────────────────────────────────────────┐
│                         Input Layer                         │
│    Time-Series [90 days] + Statistical Features [28 dims]   │
└─────────────────────┬───────────────────────────────────────┘
                      │
              ┌───────┴────────┐
              │                │
    ┌─────────▼──────┐   ┌─────▼────────┐
    │ TinyTimeMixer  │   │ Statistical  │
    │    Encoder     │   │   Features   │
    │  [90,1]→[64]   │   │     [28]     │
    │    (Frozen)    │   │              │
    └─────────┬──────┘   └─────┬────────┘
              │                │
    ┌─────────▼──────────┐     │
    │ Temporal Attention │     │
    │      (4 heads)     │     │
    │      [64]→[64]     │     │
    └─────────┬──────────┘     │
              │                │
              └───────┬────────┘
                      │
           ┌──────────▼─────────┐
           │   Simple Concat    │
           │    Fusion Layer    │
           │    [64+28]→[92]    │
           └──────────┬─────────┘
                      │
           ┌──────────▼─────────┐
           │   Shared Hidden    │
           │     [92]→[128]     │
           └──────────┬─────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
  ┌─────▼─────┐ ┌─────▼────┐  ┌─────▼────┐
  │ 30d Head  │ │ 60d Head │  │ 90d Head │
  │ [128]→[1] │ │[128]→[1] │  │[128]→[1] │
  └───────────┘ └──────────┘  └──────────┘
```
- **TinyTimeMixer Encoder (Frozen)**
  - IBM Granite Time-Series Foundation Model
  - Pre-trained on large-scale time-series data
  - Extracts 64-dimensional embeddings
- **Temporal Attention (Trainable)**
  - Multi-head self-attention (4 heads)
  - Captures important temporal patterns
  - Enhances interpretability
- **Statistical Features (28 dimensions)**
  - Time-series statistics (mean, std, trend, etc.)
  - Recent behavior indicators
  - Seasonal & autocorrelation features
  - Distribution features (skewness, kurtosis, entropy)
- **Simple Concat Fusion**
  - Efficient feature integration
  - Proven superior to complex cross-attention (v2.2.2 experiment)
- **Multi-Task Learning**
  - Simultaneous 30d/60d/90d prediction
  - Shared representation learning
  - Focal Loss for class imbalance
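To make the diagram concrete, here is a condensed PyTorch sketch of the same dataflow. The real encoder is the frozen TinyTimeMixer wrapped by `granite_ts_model.py`; it is stubbed here with a linear layer so the sketch stays self-contained, and the mean-pooling step is an assumption (layer sizes follow the diagram, but parameter counts will not match the real 195,204):

```python
import torch
import torch.nn as nn

class HybridMultiTaskSketch(nn.Module):
    """Sketch of v2.2: frozen encoder -> temporal attention ->
    concat fusion with 28 statistical features -> shared hidden -> 3 heads."""

    def __init__(self, embed_dim: int = 64, feat_dim: int = 28,
                 hidden_dim: int = 128, num_heads: int = 4, dropout: float = 0.3):
        super().__init__()
        # Stand-in for the frozen TinyTimeMixer encoder: [B, 90, 1] -> [B, 90, 64].
        self.encoder = nn.Linear(1, embed_dim)
        for p in self.encoder.parameters():
            p.requires_grad = False  # frozen, as in the diagram
        self.attention = nn.MultiheadAttention(embed_dim, num_heads,
                                               dropout=dropout, batch_first=True)
        self.shared = nn.Sequential(
            nn.Linear(embed_dim + feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
        )
        # One binary logit per forecast horizon.
        self.heads = nn.ModuleDict(
            {h: nn.Linear(hidden_dim, 1) for h in ("30d", "60d", "90d")})

    def forward(self, series: torch.Tensor, feats: torch.Tensor) -> dict:
        z = self.encoder(series)                        # [B, 90, 64]
        z, _ = self.attention(z, z, z)                  # temporal self-attention
        z = z.mean(dim=1)                               # pool over time -> [B, 64]
        h = self.shared(torch.cat([z, feats], dim=-1))  # [B, 92] -> [B, 128]
        return {name: head(h) for name, head in self.heads.items()}
```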
| Model | Architecture | Avg F1 | ROC-AUC | Parameters | Status |
|---|---|---|---|---|---|
| v2.2 | Simple Concat Fusion | 0.2789 | 0.7056 | 195,204 | ✅ PRODUCTION |
| v2.2.2 | Cross-Attention Fusion | 0.2689 | 0.6942 | 218,952 | Experimental |
| v2.3 | No Statistical Features | 0.1973 | 0.5874 | 191,620 | Experimental |
- ✅ Statistical features are essential: removing them caused a 29.2% relative F1 drop (v2.3)
- ✅ Simple fusion beats complex fusion: Cross-Attention underperformed (-3.6% F1) despite 12% more parameters (v2.2.2)
- ✅ Feature engineering beats model complexity: good features matter more than architecture choice
- ✅ Optimal parameter count: v2.2's 195K parameters hit the sweet spot
See Hybrid_v2.2_v2.3_Lesson.md for detailed analysis.
Python 3.10+
PyTorch 2.6.0+
CUDA 12.4+ (for GPU training)
NVIDIA GPU with 16GB+ VRAM (RTX 4060 Ti or better)

```bash
# Clone repository
git clone https://github.com/yourusername/tsfm_attension_multitask.git
cd tsfm_attension_multitask

# Create virtual environment
python -m venv venv
.\venv\Scripts\activate     # Windows
# source venv/bin/activate  # Linux/Mac

# Install dependencies
pip install -r requirements.txt
```

```bash
# Train v2.2 (Production Model)
python train_hybrid_model_v2_2.py

# Training takes ~25 epochs on an RTX 4060 Ti (16GB)
# Model saved to: models/hybrid_model_v2.2/pytorch_model_multitask.pt
```

```python
import torch
from pathlib import Path
from train_hybrid_model_v2_2 import MultiTaskHybridModel
from granite_ts_model import GraniteTimeSeriesClassifier
# Load model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_path = Path("models/hybrid_model_v2.2/pytorch_model_multitask.pt")
# ... (see models/README_PRODUCTION.md for complete code)
```

```bash
# Visualize v2.2 predictions
python visualize_forecast_v2_2.py

# Output: results/forecast_comparison_v2.2_[timestamp].png
```

```
tsfm_attension_multitask/
├── train_hybrid_model_v2_2.py     # Production model training ✅
├── train_hybrid_model_v2_2_2.py   # Cross-Attention experiment
├── train_hybrid_model_v2_3.py     # No-features experiment
├── visualize_forecast_v2_2.py     # v2.2 visualization
├── visualize_forecast_v2_3.py     # v2.3 visualization
├── granite_ts_model.py            # TinyTimeMixer wrapper
├── config.py                      # Configuration
├── requirements.txt               # Dependencies
├── Hybrid_v2.2_v2.3_Lesson.md     # Detailed analysis & lessons
├── models/
│   ├── README_PRODUCTION.md       # Production model guide
│   ├── hybrid_model_v2.2/         # v2.2 model ✅
│   ├── hybrid_model_v2.2.2/       # v2.2.2 model
│   └── hybrid_model_v2.3/         # v2.3 model
├── results/
│   ├── training_history_v2.2.json
│   ├── training_history_v2.2.2.json
│   └── training_history_v2.3.json
└── data/
    └── processed/
        ├── training_samples_enriched.csv
        └── test_samples_enriched.csv
```
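For reference, a hypothetical completion of the quick-start inference snippet above, assuming the checkpoint stores a plain `state_dict`, the constructor takes the configuration values listed further below, and the forward pass returns a dict of per-horizon logits; argument and key names are illustrative, and models/README_PRODUCTION.md remains the authoritative source:

```python
import torch
from pathlib import Path
from train_hybrid_model_v2_2 import MultiTaskHybridModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_path = Path("models/hybrid_model_v2.2/pytorch_model_multitask.pt")

# Hypothetical constructor arguments, taken from the training configuration.
model = MultiTaskHybridModel(embed_dim=64, hidden_dim=128,
                             num_attention_heads=4, dropout=0.3).to(device)
model.load_state_dict(torch.load(model_path, map_location=device))
model.eval()

with torch.no_grad():
    series = torch.randn(1, 90, 1, device=device)  # 90-day lookback window
    feats = torch.randn(1, 28, device=device)      # 28 statistical features
    logits = model(series, feats)                  # assumed: dict of per-horizon logits
    probs = {h: torch.sigmoid(l).item() for h, l in logits.items()}
print(probs)  # e.g. {'30d': ..., '60d': ..., '90d': ...}
```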
- **v2.2, Simple Concat Fusion**: Embeddings [64] + Features [28] → [92]
  - Result: Best performance (F1=0.2789)
  - Lesson: Simplicity wins when features are well-engineered
- **v2.2.2, Cross-Attention Fusion**: Bidirectional attention between embeddings and features
  - Result: Worse performance (F1=0.2689, -3.6%)
  - Lesson: Complex attention is overkill for 28-dimensional features
- **v2.3, No Statistical Features**: TinyTimeMixer embeddings only
  - Result: Severe degradation (F1=0.1973, -29.2%)
  - Lesson: Domain-specific statistical features are essential
See Hybrid_v2.2_v2.3_Lesson.md for comprehensive analysis.
Statistical features (mean, std, trend, seasonality) provide explicit domain knowledge that deep learning alone cannot capture.
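As an illustration, several of these features are one-liners with NumPy/SciPy; a minimal sketch (the actual 28-feature set lives in the project's preprocessing pipeline, so the selection and names here are illustrative):

```python
import numpy as np
from scipy import stats

def basic_stat_features(x: np.ndarray) -> dict:
    """Illustrative subset of the statistical features: level, spread,
    linear trend, distribution shape, and lag-1 autocorrelation."""
    t = np.arange(len(x))
    slope, _ = np.polyfit(t, x, deg=1)  # linear trend over the 90-day window
    return {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "trend": float(slope),
        "skewness": float(stats.skew(x)),
        "kurtosis": float(stats.kurtosis(x)),
        "autocorr_lag1": float(np.corrcoef(x[:-1], x[1:])[0, 1]),
    }
```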
- 28-dimensional features → Simple Concat Fusion ✅
- 100+ dimensional features → Cross-Attention might help
- Complexity ≠ Better Performance
- Pre-trained TinyTimeMixer (frozen) provides strong time-series representations
- Statistical features add HVAC-specific anomaly detection logic
- Combination is more powerful than either alone
- Shared encoder learns common patterns across horizons
- Parameter efficiency (1 model vs 3 separate models)
- Improved generalization through implicit regularization
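Concretely, joint training sums the per-horizon losses so that gradients from all three horizons update the shared layers; a minimal sketch, assuming equal horizon weights (the project's exact weighting is not documented here):

```python
import torch

def multitask_loss(logits: dict, labels: dict, criterion) -> torch.Tensor:
    """Combined objective over the 30d/60d/90d heads.

    `criterion` could be the focal loss sketched in the training section below;
    equal weighting across horizons is an assumption.
    """
    return sum(criterion(logits[h], labels[h]) for h in ("30d", "60d", "90d"))
```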
- Foundation Model: IBM Granite Time-Series (TinyTimeMixer)
- Framework: PyTorch 2.6.0
- Hardware: NVIDIA RTX 4060 Ti (16GB VRAM)
- Loss Function: Focal Loss (γ=3.0) for class imbalance
- Optimizer: AdamW with Cosine Annealing
- Attention: Multi-Head Self-Attention (4 heads)
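A minimal sketch of the focal loss and optimizer setup named above; γ=3.0, AdamW, and cosine annealing come from this README, while the schedule length and the stand-in model are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Binary focal loss: down-weights easy examples by (1 - p_t)^gamma."""
    def __init__(self, gamma: float = 3.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = torch.exp(-bce)  # probability assigned to the true class
        return ((1.0 - p_t) ** self.gamma * bce).mean()

# Optimizer and schedule as listed above; T_max follows the 25-epoch config.
model = nn.Linear(92, 128)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)
```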
```json
{
  "epochs": 25,
  "batch_size": 128,
  "learning_rate": 5e-4,
  "weight_decay": 0.01,
  "focal_gamma": 3.0,
  "embed_dim": 64,
  "hidden_dim": 128,
  "num_attention_heads": 4,
  "dropout": 0.3,
  "lookback_days": 90
}
```

- ~25 epochs on an NVIDIA RTX 4060 Ti (16GB)
- Convergence: Epoch 25 (Best F1: 0.2789)
- Total training time: ~2-3 hours
- Training samples: 58,300
- Test samples: 8,745
- Time-series length: 90 days
- Statistical features: 28 dimensions
- Anomaly rate: ~9% (class imbalance handled by Focal Loss)
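In tensor terms, each sample pairs one 90-step window with its 28 features and three binary labels; a minimal Dataset sketch, assuming the enriched CSVs have already been split into these three arrays (the column layout is an assumption):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class HVACAnomalyDataset(Dataset):
    """Each item: (series [90, 1], features [28], labels [3]) for 30d/60d/90d."""
    def __init__(self, series: np.ndarray, feats: np.ndarray, labels: np.ndarray):
        self.series = torch.from_numpy(series).float().unsqueeze(-1)  # [N, 90, 1]
        self.feats = torch.from_numpy(feats).float()                  # [N, 28]
        self.labels = torch.from_numpy(labels).float()                # [N, 3]

    def __len__(self) -> int:
        return len(self.series)

    def __getitem__(self, i: int):
        return self.series[i], self.feats[i], self.labels[i]
```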
- Production Guide: models/README_PRODUCTION.md
- Lessons Learned: Hybrid_v2.2_v2.3_Lesson.md
- Training Logs: results/training_history_v2.2.json
- **Feature Selection**
  - Reduce 28d → 15-20d using SHAP analysis
  - Maintain performance while improving efficiency
- **Model Optimization**
  - ONNX export for production deployment (see the sketch below)
  - Quantization for faster inference
  - REST API wrapper
- **Extended Horizons**
  - 120-day, 180-day forecasts
  - Hierarchical multi-task learning
- **Transfer to Other Domains**
  - Apply to other equipment types
  - Generalize to industrial IoT anomaly detection
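For the ONNX item above, export could look roughly like this; construction and loading mirror the hypothetical inference sketch earlier, and the input/output names and dynamic batch axis are assumptions:

```python
import torch
from train_hybrid_model_v2_2 import MultiTaskHybridModel

# Hypothetical construction/loading, as in the inference sketch above.
model = MultiTaskHybridModel(embed_dim=64, hidden_dim=128,
                             num_attention_heads=4, dropout=0.3)
model.load_state_dict(torch.load(
    "models/hybrid_model_v2.2/pytorch_model_multitask.pt", map_location="cpu"))
model.eval()

dummy_series = torch.randn(1, 90, 1)  # 90-day lookback
dummy_feats = torch.randn(1, 28)      # 28 statistical features
torch.onnx.export(
    model, (dummy_series, dummy_feats), "hybrid_model_v2_2.onnx",
    input_names=["series", "features"],
    output_names=["logits_30d", "logits_60d", "logits_90d"],
    dynamic_axes={"series": {0: "batch"}, "features": {0: "batch"}},
)
```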
(Run `python visualize_forecast_v2_2.py` to generate)
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License.
- IBM Granite Time-Series Team - TinyTimeMixer foundation model
- HuggingFace - Transformers library
- PyTorch Team - Deep learning framework
For questions or collaboration:
- Open an issue on GitHub
- Check documentation in models/README_PRODUCTION.md
Date: 2026-02-15
Status: ✅ Production Ready
Version: v2.2 (Confirmed)
"The best model is not the most complex, nor the simplest, but the one that captures the right information with the right architecture."
