Benchmarking Time-Series Foundation Models for Blood Glucose Forecasting
This benchmark uses the GlucoFM Dataset available on HuggingFace:
byluuu/gluco-tsfm-benchmark
The dataset includes continuous glucose monitoring (CGM) data from multiple public datasets with an 80/20 train/test split. Each sample contains:
- dataset: Source dataset name
- subject_id: Subject identifier
- timestamp: Unix timestamp array
- BGvalue: Blood glucose values (mg/dL)
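For a quick sanity check that the dataset loads as described, here is a minimal sketch using the datasets library (the split names and per-record layout are assumptions based on the field list above):

```python
from datasets import load_dataset

# Load the benchmark from the HuggingFace Hub; "train"/"test" splits are
# assumed from the 80/20 split described above.
ds = load_dataset("byluuu/gluco-tsfm-benchmark")

# Inspect one training record: source dataset, subject, and the CGM series.
sample = ds["train"][0]
print(sample["dataset"], sample["subject_id"])
print(len(sample["timestamp"]), "timestamps,",
      len(sample["BGvalue"]), "glucose readings (mg/dL)")
```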
This repository provides a comprehensive benchmark for evaluating time-series foundation models on glucose forecasting tasks. It includes implementations of multiple state-of-the-art models with zero-shot, few-shot, and full-shot evaluation protocols.
Key Features:
- Multiple Training Paradigms: Zero-shot, few-shot, and full-shot evaluation
- Multi-Horizon Prediction: 15min, 30min, 60min, 90min forecasting
- Diverse Model Architectures: Transformer-based, LLM-based, and specialized time-series models
- HuggingFace Integration: Easy dataset loading and sharing
- Reproducible Experiments: Documented configurations and training scripts
GlucoseML_benchmark/
├── 2019Martinsson_et_al_LSTM/      # LSTM baseline (Martinsson et al., 2019)
│   ├── fullshot_lstm.py            # Full-shot training & evaluation
│   ├── fewshot_lstm.py             # Few-shot training & evaluation
│   ├── datasets_loader/            # Dataset loading utilities
│   └── README.md                   # Detailed documentation
│
├── chronos-forecasting/            # Chronos-2 (Amazon)
│   ├── zeroshot.py                 # Zero-shot evaluation
│   ├── fewshot.py                  # Few-shot LoRA fine-tuning
│   ├── fullshot.py                 # Full-shot LoRA fine-tuning
│   └── chronos.md                  # Implementation guide
│
├── CALF/                           # CALF (Context-Aware Language Foundation)
│   ├── run.py                      # Training & evaluation script
│   ├── prepare_dataset.py          # Dataset preparation
│   ├── pca.py                      # PCA embedding generation
│   └── calf.md                     # Implementation guide
│
├── Time-LLM/                       # Time-LLM (GPT2/LLaMA-based)
│   ├── run_main.py                 # Training & evaluation script
│   ├── prepare_dataset.py          # Dataset preparation
│   └── timellm.md                  # Implementation guide
│
├── GPFormer/                       # GPFormer (Graph-based Transformer)
│   ├── predict_glucose_multiwindow_gpformer_fullshot.py
│   ├── predict_glucose_multiwindow_gpformer_fewshot.py
│   └── gpformer.md                 # Implementation guide
│
├── timer-model/                    # Timer (Time Series Transformer)
│   ├── predict_glucose_multiwindow_timer_zeroshot.py
│   ├── predict_glucose_multiwindow_timer_fullshot.py
│   ├── predict_glucose_multiwindow_timer_fewshot.py
│   └── timer.md                    # Implementation guide
│
├── timesfm/                        # TimesFM (Google)
│   ├── predict_glucose_multiwindow_timesfm_zeroshot.py
│   ├── predict_glucose_multiwindow_timesfm_fullshot.py
│   ├── predict_glucose_multiwindow_timesfm_fewshot.py
│   └── timesfm.md                  # Implementation guide
│
└── uni2ts/                         # Uni2TS (Moirai)
    ├── predict_glucose_multiwindow_uni2ts_zeroshot.py
    ├── predict_glucose_multiwindow_uni2ts_fullshot.py
    ├── predict_glucose_multiwindow_uni2ts_fewshot.py
    └── moirai.md                   # Implementation guide
git clone git@github.com:Augmented-Health-Lab/GlucoseML_benchmark.git
cd GlucoseML_benchmark

Each model has its own requirements. Navigate to the specific model directory and install dependencies:
cd <model_directory>
pip install -r requirements.txt

Option 1 (recommended): prepare from the HuggingFace dataset into a local CSV cache (hf_cache/).
pip install datasets
# Basic preparation
python prepare_dataset.py --hf-name byluuu/gluco-tsfm-benchmark --output-dir ./hf_cache
# With mixed dataset (combines all subdatasets)
python prepare_dataset.py --hf-name byluuu/gluco-tsfm-benchmark --output-dir ./hf_cache --create-mixed

This creates the following layout (the mixed/ directories appear only when --create-mixed is used):
hf_cache/
├── train/
│   ├── <DATASET_NAME>/
│   │   ├── <SUBJECT_ID>.csv
│   │   └── all
│   └── mixed/                      # All training data combined
│       ├── <DATASET>__<SUBJECT_ID>.csv
│       └── all
└── test/
    ├── <DATASET_NAME>/
    │   └── <SUBJECT_ID>.csv
    └── mixed/                      # All test data combined
        └── <DATASET>__<SUBJECT_ID>.csv
GPFormer, Timer, TimesFM, Uni2TS, Time-LLM, and CALF scripts default to reading from hf_cache/.
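For quick analyses outside the provided loaders, a cached subject file can be read directly; a minimal sketch with pandas (the CSV column names are an assumption mirroring the dataset fields above, and the path placeholders must be filled in):

```python
import pandas as pd

# Read one per-subject CSV from the cache created by prepare_dataset.py.
# Columns (timestamp, BGvalue) are assumed to match the HuggingFace fields.
df = pd.read_csv("hf_cache/train/<DATASET_NAME>/<SUBJECT_ID>.csv")

# Convert Unix timestamps and confirm the ~5-minute CGM sampling interval.
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")
print(df["timestamp"].diff().median())   # expected: ~0 days 00:05:00
print(df["BGvalue"].describe())          # glucose values in mg/dL
```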
Option 2: load directly from HuggingFace (no CSV export) by passing --data-source hf (requires the datasets package).
See individual model documentation for specific commands.
| Model | Type | Documentation | Key Features |
|---|---|---|---|
| Martinsson LSTM | LSTM | README.md | Variance estimation, NLL loss, OhioT1DM baseline |
| GPFormer | Transformer | gpformer.md | Multi-window prediction |
| Model | Base LLM | Documentation | Key Features |
|---|---|---|---|
| Time-LLM | GPT2/LLaMA | timellm.md | LLM reprogramming, time series adaptation |
| CALF | GPT2 | calf.md | Cross-modal fine-tuning, PCA embeddings |
| Model | Architecture | Documentation | Key Features |
|---|---|---|---|
| Chronos-2 | Encoder-Decoder | chronos.md | LoRA fine-tuning, Amazon pretrained |
| Timer | Transformer-Decoder | timer.md | Efficient time series modeling |
| TimesFM | Transformer-Decoder | timesfm.md | Google pretrained |
| Uni2TS (Moirai 2.0) | Transformer-Decoder | moirai.md | Universal time series model |
Evaluate pretrained models without any training on glucose data.
Supported Models: Chronos, Timer, TimesFM, Uni2TS
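To make the protocol concrete, here is a minimal zero-shot sketch using the open-source chronos-forecasting package with a public univariate Chronos checkpoint as a stand-in; the repository's zeroshot.py targets Chronos-2, whose interface may differ:

```python
import torch
from chronos import ChronosPipeline

# Load a pretrained checkpoint; no fine-tuning on glucose data (zero-shot).
pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small", device_map="cpu")

# A 144-step (12 h) CGM context window in mg/dL; dummy values for illustration.
context = torch.full((144,), 120.0)

# Forecast 18 steps (90 min at 5-minute sampling); output is sample-based.
forecast = pipeline.predict(context, prediction_length=18)  # [1, num_samples, 18]
point_forecast = forecast[0].median(dim=0).values           # median over samples
print(point_forecast.shape)                                 # torch.Size([18])
```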
Example:
cd chronos-forecasting
python zeroshot.py --split test --prediction_length 18

Train with limited data (e.g., 1 sample per 20 hours).
Supported Models: All models
Example:
cd chronos-forecasting
python fewshot.py --train_stride 240 --prediction_length 18

Train with the full training dataset.
Supported Models: All models
Example:
cd chronos-forecasting
python fullshot.py --train_stride 12 --prediction_length 18

All models support multiple prediction horizons:
| Horizon | Timesteps | Duration | Use Case |
|---|---|---|---|
| 15 min | 3 steps | 3 × 5 min | Immediate alerts |
| 30 min | 6 steps | 6 × 5 min | Short-term planning |
| 60 min | 12 steps | 12 × 5 min | Meal/exercise planning |
| 90 min | 18 steps | 18 × 5 min | Extended prediction |
Note: All models use 5-minute sampling frequency (standard for CGM devices).
Most models use 144 timesteps (12 hours) as default context:
- 144 × 5 min = 720 minutes = 12 hours
Prediction lengths map to horizons as follows:
- pred_len=3: 15 minutes
- pred_len=6: 30 minutes
- pred_len=12: 60 minutes
- pred_len=18: 90 minutes
- Full-shot stride: 12 steps (1 hour) - dense sampling
- Few-shot stride: 240 steps (20 hours) - sparse sampling
- Evaluation stride: 1-3 steps (5-15 minutes) - dense prediction
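The strides above control how densely sliding windows are cut from each CGM series; here is a minimal sketch of that windowing over a plain list of readings (function and variable names are illustrative, not taken from the repository scripts):

```python
def sliding_windows(series, context_len=144, pred_len=18, stride=12):
    """Yield (context, target) pairs from one CGM series.

    stride=12  -> one window per hour   (full-shot training density)
    stride=240 -> one window per 20 h   (few-shot training density)
    stride=1-3 -> dense windows used at evaluation time
    """
    window = context_len + pred_len
    for start in range(0, len(series) - window + 1, stride):
        yield (series[start:start + context_len],
               series[start + context_len:start + window])

# Example: 3 days of 5-minute readings -> 864 points.
series = [120.0] * 864
print(sum(1 for _ in sliding_windows(series, stride=12)))   # 59 full-shot windows
print(sum(1 for _ in sliding_windows(series, stride=240)))  # 3 few-shot windows
```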
- Start with Zero-Shot: Test pretrained models before fine-tuning
- Memory Management: Reduce batch size or use gradient accumulation for OOM errors
- Multi-Horizon Training: Train once at the longest horizon and evaluate at all horizons (see the sketch after this list)
- Dataset-Specific Testing: Use test_root_path to evaluate on specific datasets
- HuggingFace Integration: Most models support automatic dataset loading
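On the multi-horizon tip above: since shorter horizons are prefixes of longer ones, a model trained at 18 steps can be scored at every horizon by truncating its forecasts. A minimal NumPy sketch with synthetic arrays (RMSE is shown as one example metric; the benchmark scripts may report others):

```python
import numpy as np

# Forecasts and ground truth at the longest horizon: shape (num_windows, 18).
preds = np.random.default_rng(0).normal(140, 20, size=(1000, 18))
truth = np.random.default_rng(1).normal(140, 20, size=(1000, 18))

# Evaluate every horizon from the same 18-step forecasts by truncation.
for steps, minutes in [(3, 15), (6, 30), (12, 60), (18, 90)]:
    rmse = np.sqrt(np.mean((preds[:, :steps] - truth[:, :steps]) ** 2))
    print(f"{minutes:>2} min horizon ({steps:>2} steps): RMSE = {rmse:.1f} mg/dL")
```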
Contributions are welcome! To add a new model:
- Create a new directory with the model name
- Add implementation scripts (zeroshot/fewshot/fullshot)
- Create a <model>.md documentation file
- Update this README with model information
- Add results to paper_tables_ctx12h_hor30m/
If you use this benchmark in your research, please cite:
@article{glucofm_benchmark2024,
title={GlucoFM-Bench: Benchmarking Time-Series Foundation Models for Blood Glucose Forecasting},
author={Lu, Baiying and Liang, Zhaohui and Pontius, Ryan and Tang, Shengpu and Prioleau, Temiloluwa},
journal={Under submission},
year={2026}
}

Martinsson LSTM:
@article{martinsson2020blood,
title={Blood Glucose Prediction with Variance Estimation Using Recurrent Neural Networks},
author={Martinsson, John and Schliep, Alexander and Eliasson, Bj{\"o}rn and Meijner, Claes and Persson, Simon and Mogren, Olof},
journal={Journal of Healthcare Informatics Research},
volume={4},
pages={1--18},
year={2020}
}

Time-LLM:
@article{jin2023time,
title={Time-LLM: Time Series Forecasting by Reprogramming Large Language Models},
author={Jin, Ming and Wang, Shiyu and Ma, Lintao and Chu, Zhixuan and Zhang, James Y and Shi, Xiaoming and Chen, Pin-Yu and Liang, Yuxuan and Li, Yuan-Fang and Pan, Shirui and others},
journal={arXiv preprint arXiv:2310.01728},
year={2023}
}

CALF:
@article{liu2024calf,
title={CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning},
author={Liu, Peiyuan and Zhao, Hang and Li, Tao and others},
journal={arXiv preprint arXiv:2403.07300},
year={2024}
}

Chronos:
@article{ansari2024chronos,
title={Chronos: Learning the Language of Time Series},
author={Ansari, Abdul Fatir and Stella, Lorenzo and Turkmen, Caner and Zhang, Xiyuan and Mercado, Pedro and Shen, Huibin and Shchur, Oleksandr and Rangapuram, Syama Sundar and Arango, Sebastian Pineda and Kapoor, Shubham and others},
journal={arXiv preprint arXiv:2403.07815},
year={2024}
}
@misc{ansari2025chronos2,
title={Chronos-2: From Univariate to Universal Forecasting},
author={Abdul Fatir Ansari and Oleksandr Shchur and Jaris Küken and Andreas Auer and Boran Han and Pedro Mercado and Syama Sundar Rangapuram and Huibin Shen and Lorenzo Stella and Xiyuan Zhang and Mononito Goswami and Shubham Kapoor and Danielle C. Maddix and Pablo Guerron and Tony Hu and Junming Yin and Nick Erickson and Prateek Mutalik Desai and Hao Wang and Huzefa Rangwala and George Karypis and Yuyang Wang and Michael Bohlke-Schneider},
year={2025},
eprint={2510.15821},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2510.15821},
}

Timer:
@inproceedings{liutimer,
title={Timer: Generative Pre-trained Transformers Are Large Time Series Models},
author={Liu, Yong and Zhang, Haoran and Li, Chenyu and Huang, Xiangdong and Wang, Jianmin and Long, Mingsheng},
booktitle={Forty-first International Conference on Machine Learning}
}
@article{liu2024timer,
title={Timer-XL: Long-Context Transformers for Unified Time Series Forecasting},
author={Liu, Yong and Qin, Guo and Huang, Xiangdong and Wang, Jianmin and Long, Mingsheng},
journal={arXiv preprint arXiv:2410.04803},
year={2024}
}

TimesFM:
@misc{das2024decoderonlyfoundationmodeltimeseries,
title={A decoder-only foundation model for time-series forecasting},
author={Abhimanyu Das and Weihao Kong and Rajat Sen and Yichen Zhou},
year={2024},
eprint={2310.10688},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2310.10688},
}

Moirai:
@misc{woo2024unifiedtraininguniversaltime,
title={Unified Training of Universal Time Series Forecasting Transformers},
author={Gerald Woo and Chenghao Liu and Akshat Kumar and Caiming Xiong and Silvio Savarese and Doyen Sahoo},
year={2024},
eprint={2402.02592},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2402.02592},
}
@misc{liu2026moirai20timeseries,
title={Moirai 2.0: When Less Is More for Time Series Forecasting},
author={Chenghao Liu and Taha Aksu and Juncheng Liu and Xu Liu and Hanshu Yan and Quang Pham and Silvio Savarese and Doyen Sahoo and Caiming Xiong and Junnan Li},
year={2026},
eprint={2511.11698},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2511.11698},
}

GPFormer:
@article{Zhu2025,
author = {Taiyu Zhu and Ioannis Afentakis and Kezhi Li and Ryan Armiger and Neil Hill and Nick Oliver and Pantelis Georgiou},
doi = {10.1109/JBHI.2024.3428921},
issn = {21682208},
issue = {8},
journal = {IEEE Journal of Biomedical and Health Informatics},
keywords = {Deep learning,Transformer,diabetes,domain generalization,glucose prediction},
pages = {5424-5437},
pmid = {39012743},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
title = {Multi-Horizon Glucose Prediction Across Populations With Deep Domain Generalization},
volume = {29},
year = {2025}
}

For questions or issues, please open an issue on GitHub or contact the maintainers.
This project is licensed under the MIT License - see individual model directories for specific licenses.