Benchmarking Time-Series Foundation Models for Blood Glucose Forecasting
This benchmark uses the GlucoFM Dataset available on HuggingFace:
byluuu/gluco-tsfm-benchmark
The dataset includes continuous glucose monitoring (CGM) data from multiple public datasets with an 80/20 train/test split. Each sample contains:
- dataset: Source dataset name
- subject_id: Subject identifier
- timestamp: Unix timestamp array
- BGvalue: Blood glucose values (mg/dL)
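For a quick sanity check that the dataset loads as described, here is a minimal sketch using the datasets library (the split names and per-record layout are assumptions based on the field list above):

```python
from datasets import load_dataset

# Load the benchmark from the HuggingFace Hub; "train"/"test" splits are
# assumed from the 80/20 split described above.
ds = load_dataset("byluuu/gluco-tsfm-benchmark")

# Inspect one training record: source dataset, subject, and the CGM series.
sample = ds["train"][0]
print(sample["dataset"], sample["subject_id"])
print(len(sample["timestamp"]), "timestamps,",
      len(sample["BGvalue"]), "glucose readings (mg/dL)")
```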
This repository provides a comprehensive benchmark for evaluating time-series foundation models on glucose forecasting tasks. It includes implementations of multiple state-of-the-art models with zero-shot, few-shot, and full-shot evaluation protocols.
Key Features:
- Multiple Training Paradigms: Zero-shot, few-shot, and full-shot evaluation
- Multi-Horizon Prediction: 15min, 30min, 60min, 90min forecasting
- Diverse Model Architectures: Transformer-based, LLM-based, and specialized time-series models
- HuggingFace Integration: Easy dataset loading and sharing
- Reproducible Experiments: Documented configurations and training scripts
GlucoseML_benchmark/
├── 2019Martinsson_et_al_LSTM/      # LSTM baseline (Martinsson et al., 2019)
│   ├── fullshot_lstm.py            # Full-shot training & evaluation
│   ├── fewshot_lstm.py             # Few-shot training & evaluation
│   ├── datasets_loader/            # Dataset loading utilities
│   └── README.md                   # Detailed documentation
│
├── chronos-forecasting/            # Chronos-2 (Amazon)
│   ├── zeroshot.py                 # Zero-shot evaluation
│   ├── fewshot.py                  # Few-shot LoRA fine-tuning
│   ├── fullshot.py                 # Full-shot LoRA fine-tuning
│   └── chronos.md                  # Implementation guide
│
├── CALF/                           # CALF (Context-Aware Language Foundation)
│   ├── run.py                      # Training & evaluation script
│   ├── prepare_dataset.py          # Dataset preparation
│   ├── pca.py                      # PCA embedding generation
│   └── calf.md                     # Implementation guide
│
├── Time-LLM/                       # Time-LLM (GPT2/LLaMA-based)
│   ├── run_main.py                 # Training & evaluation script
│   ├── prepare_dataset.py          # Dataset preparation
│   └── timellm.md                  # Implementation guide
│
├── GPFormer/                       # GPFormer (Graph-based Transformer)
│   ├── predict_glucose_multiwindow_gpformer_fullshot.py
│   ├── predict_glucose_multiwindow_gpformer_fewshot.py
│   └── gpformer.md                 # Implementation guide
│
├── timer-model/                    # Timer (Time Series Transformer)
│   ├── predict_glucose_multiwindow_timer_zeroshot.py
│   ├── predict_glucose_multiwindow_timer_fullshot.py
│   ├── predict_glucose_multiwindow_timer_fewshot.py
│   └── timer.md                    # Implementation guide
│
├── timesfm/                        # TimesFM (Google)
│   ├── predict_glucose_multiwindow_timesfm_zeroshot.py
│   ├── predict_glucose_multiwindow_timesfm_fullshot.py
│   ├── predict_glucose_multiwindow_timesfm_fewshot.py
│   └── timesfm.md                  # Implementation guide
│
└── uni2ts/                         # Uni2TS (Moirai)
    ├── predict_glucose_multiwindow_uni2ts_zeroshot.py
    ├── predict_glucose_multiwindow_uni2ts_fullshot.py
    ├── predict_glucose_multiwindow_uni2ts_fewshot.py
    └── moirai.md                   # Implementation guide
git clone git@github.com:Augmented-Health-Lab/GlucoseML_benchmark.git
cd GlucoseML_benchmark

Each model has its own requirements. Navigate to the specific model directory and install dependencies:
cd <model_directory>
pip install -r requirements.txt

Option 1 (recommended): prepare from the HuggingFace dataset into a local CSV cache (hf_cache/).
pip install datasets
# Basic preparation
python prepare_dataset.py --hf-name byluuu/gluco-tsfm-benchmark --output-dir ./hf_cache
# With mixed dataset (combines all subdatasets)
python prepare_dataset.py --hf-name byluuu/gluco-tsfm-benchmark --output-dir ./hf_cache --create-mixed

This creates the following layout (the mixed/ directories appear only when --create-mixed is used):
hf_cache/
├── train/
│   ├── <DATASET_NAME>/
│   │   ├── <SUBJECT_ID>.csv
│   │   └── all
│   └── mixed/                      # All training data combined
│       ├── <DATASET>__<SUBJECT_ID>.csv
│       └── all
└── test/
    ├── <DATASET_NAME>/
    │   └── <SUBJECT_ID>.csv
    └── mixed/                      # All test data combined
        └── <DATASET>__<SUBJECT_ID>.csv
GPFormer, Timer, TimesFM, Uni2TS, Time-LLM, and CALF scripts default to reading from hf_cache/.
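For quick analyses outside the provided loaders, a cached subject file can be read directly; a minimal sketch with pandas (the CSV column names are an assumption mirroring the dataset fields above, and the path placeholders must be filled in):

```python
import pandas as pd

# Read one per-subject CSV from the cache created by prepare_dataset.py.
# Columns (timestamp, BGvalue) are assumed to match the HuggingFace fields.
df = pd.read_csv("hf_cache/train/<DATASET_NAME>/<SUBJECT_ID>.csv")

# Convert Unix timestamps and confirm the ~5-minute CGM sampling interval.
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")
print(df["timestamp"].diff().median())   # expected: ~0 days 00:05:00
print(df["BGvalue"].describe())          # glucose values in mg/dL
```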
Option 2: load directly from HuggingFace (no CSV export) by passing --data-source hf (requires the datasets package).
See individual model documentation for specific commands.
| Model | Type | Documentation | Key Features |
|---|---|---|---|
| Martinsson LSTM | LSTM | README.md | Variance estimation, NLL loss, OhioT1DM baseline |
| GPFormer | Transformer | gpformer.md | Multi-window prediction |
| Model | Base LLM | Documentation | Key Features |
|---|---|---|---|
| Time-LLM | GPT2/LLaMA | timellm.md | LLM reprogramming, time series adaptation |
| CALF | GPT2 | calf.md | Cross-modal fine-tuning, PCA embeddings |
| Model | Architecture | Documentation | Key Features |
|---|---|---|---|
| Chronos-2 | Encoder-Decoder | chronos.md | LoRA fine-tuning, Amazon pretrained |
| Timer | Transformer-Decoder | timer.md | Efficient time series modeling |
| TimesFM | Transformer-Decoder | timesfm.md | Google pretrained |
| Uni2TS (Moirai 2.0) | Transformer-Decoder | moirai.md | Universal time series model |
Evaluate pretrained models without any training on glucose data.
Supported Models: Chronos, Timer, TimesFM, Uni2TS
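To make the protocol concrete, here is a minimal zero-shot sketch using the open-source chronos-forecasting package with a public univariate Chronos checkpoint as a stand-in; the repository's zeroshot.py targets Chronos-2, whose interface may differ:

```python
import torch
from chronos import ChronosPipeline

# Load a pretrained checkpoint; no fine-tuning on glucose data (zero-shot).
pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small", device_map="cpu")

# A 144-step (12 h) CGM context window in mg/dL; dummy values for illustration.
context = torch.full((144,), 120.0)

# Forecast 18 steps (90 min at 5-minute sampling); output is sample-based.
forecast = pipeline.predict(context, prediction_length=18)  # [1, num_samples, 18]
point_forecast = forecast[0].median(dim=0).values           # median over samples
print(point_forecast.shape)                                 # torch.Size([18])
```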
Example:
cd chronos-forecasting
python zeroshot.py --split test --prediction_length 18

Train with limited data (e.g., 1 sample per 20 hours).
Supported Models: All models
Example:
cd chronos-forecasting
python fewshot.py --train_stride 240 --prediction_length 18

Train with the full training dataset.
Supported Models: All models
Example:
cd chronos-forecasting
python fullshot.py --train_stride 12 --prediction_length 18

All models support multiple prediction horizons:
| Horizon | Timesteps | Duration | Use Case |
|---|---|---|---|
| 15 min | 3 steps | 3 × 5 min | Immediate alerts |
| 30 min | 6 steps | 6 × 5 min | Short-term planning |
| 60 min | 12 steps | 12 × 5 min | Meal/exercise planning |
| 90 min | 18 steps | 18 × 5 min | Extended prediction |
Note: All models use 5-minute sampling frequency (standard for CGM devices).
Most models use 144 timesteps (12 hours) as default context:
- 144 × 5 min = 720 minutes = 12 hours
Prediction lengths map to horizons as follows:
- pred_len=3: 15 minutes
- pred_len=6: 30 minutes
- pred_len=12: 60 minutes
- pred_len=18: 90 minutes
- Full-shot stride: 12 steps (1 hour) - dense sampling
- Few-shot stride: 240 steps (20 hours) - sparse sampling
- Evaluation stride: 1-3 steps (5-15 minutes) - dense prediction
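The strides above control how densely sliding windows are cut from each CGM series; here is a minimal sketch of that windowing over a plain list of readings (function and variable names are illustrative, not taken from the repository scripts):

```python
def sliding_windows(series, context_len=144, pred_len=18, stride=12):
    """Yield (context, target) pairs from one CGM series.

    stride=12  -> one window per hour   (full-shot training density)
    stride=240 -> one window per 20 h   (few-shot training density)
    stride=1-3 -> dense windows used at evaluation time
    """
    window = context_len + pred_len
    for start in range(0, len(series) - window + 1, stride):
        yield (series[start:start + context_len],
               series[start + context_len:start + window])

# Example: 3 days of 5-minute readings -> 864 points.
series = [120.0] * 864
print(sum(1 for _ in sliding_windows(series, stride=12)))   # 59 full-shot windows
print(sum(1 for _ in sliding_windows(series, stride=240)))  # 3 few-shot windows
```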
- Start with Zero-Shot: Test pretrained models before fine-tuning
- Memory Management: Reduce batch size or use gradient accumulation for OOM errors
- Multi-Horizon Training: Train once at the longest horizon and evaluate at all horizons (see the sketch after this list)
- Dataset-Specific Testing: Use test_root_path to evaluate on specific datasets
- HuggingFace Integration: Most models support automatic dataset loading
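On the multi-horizon tip above: since shorter horizons are prefixes of longer ones, a model trained at 18 steps can be scored at every horizon by truncating its forecasts. A minimal NumPy sketch with synthetic arrays (RMSE is shown as one example metric; the benchmark scripts may report others):

```python
import numpy as np

# Forecasts and ground truth at the longest horizon: shape (num_windows, 18).
preds = np.random.default_rng(0).normal(140, 20, size=(1000, 18))
truth = np.random.default_rng(1).normal(140, 20, size=(1000, 18))

# Evaluate every horizon from the same 18-step forecasts by truncation.
for steps, minutes in [(3, 15), (6, 30), (12, 60), (18, 90)]:
    rmse = np.sqrt(np.mean((preds[:, :steps] - truth[:, :steps]) ** 2))
    print(f"{minutes:>2} min horizon ({steps:>2} steps): RMSE = {rmse:.1f} mg/dL")
```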
Contributions are welcome! To add a new model:
- Create a new directory with the model name
- Add implementation scripts (zeroshot/fewshot/fullshot)
- Create a <model>.md documentation file
- Update this README with model information
- Add results to paper_tables_ctx12h_hor30m/
If you use this benchmark in your research, please cite:
@article{glucofm_benchmark2024,
title={GlucoFM-Bench: Benchmarking Time-Series Foundation Models for Blood Glucose Forecasting},
author={Lu, Baiying and Liang, Zhaohui and Pontius, Ryan and Tang, Shengpu and Prioleau, Temiloluwa},
journal={Under submission},
year={2026}
}

Martinsson LSTM:
@article{martinsson2020blood,
title={Blood Glucose Prediction with Variance Estimation Using Recurrent Neural Networks},
author={Martinsson, John and Schliep, Alexander and Eliasson, Bj{\"o}rn and Meijner, Claes and Persson, Simon and Mogren, Olof},
journal={Journal of Healthcare Informatics Research},
volume={4},
pages={1--18},
year={2020}
}

Time-LLM:
@article{jin2023time,
title={Time-LLM: Time Series Forecasting by Reprogramming Large Language Models},
author={Jin, Ming and Wang, Shiyu and Ma, Lintao and Chu, Zhixuan and Zhang, James Y and Shi, Xiaoming and Chen, Pin-Yu and Liang, Yuxuan and Li, Yuan-Fang and Pan, Shirui and others},
journal={arXiv preprint arXiv:2310.01728},
year={2023}
}

CALF:
@article{liu2024calf,
title={CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning},
author={Liu, Peiyuan and Zhao, Hang and Li, Tao and others},
journal={arXiv preprint arXiv:2403.07300},
year={2024}
}

Chronos:
@article{ansari2024chronos,
title={Chronos: Learning the Language of Time Series},
author={Ansari, Abdul Fatir and Stella, Lorenzo and Turkmen, Caner and Zhang, Xiyuan and Mercado, Pedro and Shen, Huibin and Shchur, Oleksandr and Rangapuram, Syama Sundar and Arango, Sebastian Pineda and Kapoor, Shubham and others},
journal={arXiv preprint arXiv:2403.07815},
year={2024}
}
@misc{ansari2025chronos2,
title={Chronos-2: From Univariate to Universal Forecasting},
author={Abdul Fatir Ansari and Oleksandr Shchur and Jaris Küken and Andreas Auer and Boran Han and Pedro Mercado and Syama Sundar Rangapuram and Huibin Shen and Lorenzo Stella and Xiyuan Zhang and Mononito Goswami and Shubham Kapoor and Danielle C. Maddix and Pablo Guerron and Tony Hu and Junming Yin and Nick Erickson and Prateek Mutalik Desai and Hao Wang and Huzefa Rangwala and George Karypis and Yuyang Wang and Michael Bohlke-Schneider},
year={2025},
eprint={2510.15821},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2510.15821},
}

Timer:
@inproceedings{liutimer,
title={Timer: Generative Pre-trained Transformers Are Large Time Series Models},
author={Liu, Yong and Zhang, Haoran and Li, Chenyu and Huang, Xiangdong and Wang, Jianmin and Long, Mingsheng},
booktitle={Forty-first International Conference on Machine Learning}
}
@article{liu2024timer,
title={Timer-XL: Long-Context Transformers for Unified Time Series Forecasting},
author={Liu, Yong and Qin, Guo and Huang, Xiangdong and Wang, Jianmin and Long, Mingsheng},
journal={arXiv preprint arXiv:2410.04803},
year={2024}
}

TimesFM:
@misc{das2024decoderonlyfoundationmodeltimeseries,
title={A decoder-only foundation model for time-series forecasting},
author={Abhimanyu Das and Weihao Kong and Rajat Sen and Yichen Zhou},
year={2024},
eprint={2310.10688},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2310.10688},
}

Moirai:
@misc{woo2024unifiedtraininguniversaltime,
title={Unified Training of Universal Time Series Forecasting Transformers},
author={Gerald Woo and Chenghao Liu and Akshat Kumar and Caiming Xiong and Silvio Savarese and Doyen Sahoo},
year={2024},
eprint={2402.02592},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2402.02592},
}
@misc{liu2026moirai20timeseries,
title={Moirai 2.0: When Less Is More for Time Series Forecasting},
author={Chenghao Liu and Taha Aksu and Juncheng Liu and Xu Liu and Hanshu Yan and Quang Pham and Silvio Savarese and Doyen Sahoo and Caiming Xiong and Junnan Li},
year={2026},
eprint={2511.11698},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2511.11698},
}

GPFormer:
@article{Zhu2025,
author = {Taiyu Zhu and Ioannis Afentakis and Kezhi Li and Ryan Armiger and Neil Hill and Nick Oliver and Pantelis Georgiou},
doi = {10.1109/JBHI.2024.3428921},
issn = {21682208},
issue = {8},
journal = {IEEE Journal of Biomedical and Health Informatics},
keywords = {Deep learning,Transformer,diabetes,domain generalization,glucose prediction},
pages = {5424-5437},
pmid = {39012743},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
title = {Multi-Horizon Glucose Prediction Across Populations With Deep Domain Generalization},
volume = {29},
year = {2025}
}

For questions or issues, please open an issue on GitHub or contact the maintainers.
This project is licensed under the MIT License - see individual model directories for specific licenses.