This repository contains the official implementation for the paper: "A predictive approach to enhance time-series forecasting".
Future-Guided Learning (FGL) is a novel approach that enhances time-series event forecasting using a dynamic feedback mechanism inspired by predictive coding. FGL utilizes a "teacher" model that analyzes future data to identify key events and a "student" model that learns to predict these events from current data. By minimizing the discrepancy between the teacher's near-future insights and the student's long-horizon predictions, FGL allows the forecasting model to dynamically adapt and improve its accuracy.
Our experiments demonstrate that FGL significantly improves forecasting performance across different domains:
- EEG-based Seizure Prediction: Achieved a 44.8% average increase in AUC-ROC on the CHB-MIT dataset.
- Nonlinear Dynamical Systems: Reduced Mean Squared Error (MSE) by 23.4% in forecasting the Mackey-Glass time series (outlier excluded).
Figure 3: Overview of FGL and its applications.
**A** In the FGL framework, a teacher model operates in the relative future of a student model that focuses on long-term forecasting. After training the teacher on its future-oriented task, both models perform inference during the student's training phase. The probability distributions from the teacher and student are extracted, and a loss is computed based on Eq. (1): knowledge distillation transfers information via the Kullback–Leibler (KL) divergence between class distributions. **B** In an event prediction setting, the teacher is trained directly on the events themselves, while the student is trained to forecast these events. Future labels are distilled from the teacher to the student, guiding the student to align more closely with the teacher model's predictions despite using data from the relative past. **C** In a regression forecasting scenario, the teacher and student perform short-term and long-term forecasting, respectively. As in event prediction, the student gains insights from the teacher during training, enhancing its ability to predict further into the future.
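For concreteness, the distillation objective (Eq. (1) in the paper) can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's exact implementation; the weighting convention follows the hyperparameter notes further below, where a higher `α` puts more weight on the ground-truth labels.

```python
import torch
import torch.nn.functional as F

def fgl_distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=4.0):
    """Sketch of a KD-style objective: a weighted sum of cross-entropy on
    ground-truth labels and KL divergence between temperature-softened
    teacher and student distributions."""
    # Hard-label term: student predictions vs. ground truth.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: student vs. teacher, both softened by temperature T.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    return alpha * ce + (1 - alpha) * kl
```

During student training, `teacher_logits` would come from the teacher's inference on the relative-future window, while `student_logits` come from the student's forecast made from past data.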
The experiments were conducted using Python 3.11.7 and CUDA 12.1.

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/FutureGuidedLearning.git
   cd FutureGuidedLearning
   ```

2. Install dependencies. All required packages are listed in `requirements.txt`:

   ```bash
   pip install -r requirements.txt
   ```
This repository supports three datasets. For the EEG datasets, please download the data and place the patient folders inside a `Dataset/` directory at the root of this project.
**AES (American Epilepsy Society) Seizure Prediction Challenge**
- Description: Intracranial EEG (iEEG) recordings from 5 dogs and 2 human patients.
- Source: Kaggle Seizure Prediction Challenge.
- Setup: Download and place the patient data folders into the `Dataset/` directory.
**CHB-MIT Scalp EEG Database**
- Description: Scalp EEG recordings from 23 pediatric patients. We train only on select patients with sufficient preictal data.
- Source: PhysioNet.
- Setup: Download and place the patient data inside the `Dataset/` folder.
**Mackey-Glass Time Series**
- Description: A synthetic time-series dataset generated from the Mackey-Glass delay differential equation, used for controlled forecasting experiments.
- Setup: This dataset is generated automatically by the scripts.
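For reference, a Mackey-Glass series is typically produced by integrating the delay differential equation dx/dt = βx(t−τ)/(1 + x(t−τ)^n) − γx(t). The sketch below uses the common chaotic-regime parameters (τ = 17, β = 0.2, γ = 0.1, n = 10); the repository's scripts may use different settings.

```python
import numpy as np

def mackey_glass(length=10000, tau=17, beta=0.2, gamma=0.1, n=10,
                 dt=1.0, x0=1.2, discard=500):
    """Generate a Mackey-Glass series by Euler integration of
    dx/dt = beta * x(t - tau) / (1 + x(t - tau)**n) - gamma * x(t)."""
    history = int(tau / dt)          # number of delayed steps
    total = length + discard
    x = np.empty(total + history)
    x[:history] = x0                 # constant initial history on [-tau, 0]
    for t in range(history, total + history - 1):
        x_tau = x[t - history]       # delayed state x(t - tau)
        x[t + 1] = x[t] + dt * (beta * x_tau / (1 + x_tau ** n) - gamma * x[t])
    return x[history + discard:]     # drop the initial transient
```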
To fully reproduce the results from the paper, follow the three main stages below: training the teacher models, training the baseline models (without FGL), and finally, training the student models with FGL.
Teacher models are trained for seizure detection (identifying a seizure as it happens) and serve as the source of "future" knowledge.
- **For CHB-MIT (Patient-Specific Teachers):** Train a separate detection model for each patient you wish to evaluate.

  ```bash
  # Example for Patient 1. Repeat for other patients (e.g., 3, 5, 10, etc.).
  python seizure_detection.py --patient 1 --epochs 50 --patience 5
  ```
- **For AES (Universal Teachers):** The AES dataset lacks seizure labels for training, so we create two "universal" teacher models using an external dataset: one for dogs and one for humans.

  ```bash
  # This script trains and saves both the dog and human universal teachers.
  python create_teacher.py --epochs 50
  ```
These models are trained for seizure prediction without any guidance from a teacher. This provides the baseline performance against which FGL is compared.
- **For CHB-MIT & AES:** The `seizure_prediction.py` script is used to train the baseline models.

  ```bash
  # Example for a CHB-MIT patient
  python seizure_prediction.py --subject Patient_1 --model MViT --trials 3

  # Example for an AES subject
  python seizure_prediction.py --subject Dog_1 --model CNN_LSTM --trials 3
  ```
This is the main experiment where student models are trained for seizure prediction while being guided by the pre-trained teacher models.
- **For CHB-MIT (using patient-specific teachers):**

  ```bash
  python FGL_CHBMIT.py --target 1 --epochs 30 --trials 3 --optimizer Adam --alpha 0.5 --temperature 4
  ```
- **For AES (using universal teachers):**

  ```bash
  python FGL_AES.py --subject Dog_1 --epochs 25 --trials 3 --optimizer Adam --alpha 0.7 --temperature 4
  ```
For the Mackey-Glass dataset, a single script handles the entire pipeline: training the teacher (1-step forecast), the baseline (H-step forecast), and the FGL student (H-step forecast with guidance).
- **Standard FGL Experiment:** This command runs the full pipeline for a forecast horizon of 5 steps; see the sketch after this list for how `--num_bins` relates to distillation.

  ```bash
  python base_exp.py --horizon 5 --num_bins 50 --alpha 0.5 --epochs 20
  ```
- **FGL with Page-Hinkley Drift Detection:** To evaluate FGL with online adaptation to data drift, run `drift_exp.py` with the `--use_ph` flag (a sketch of the Page-Hinkley test also follows this list).

  ```bash
  python drift_exp.py --horizon 5 --num_bins 50 --alpha 0.5 --epochs 20 --use_ph
  ```
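Distillation compares probability distributions, so for regression the continuous target range has to be discretized into classes; the `--num_bins` flag sets how many. A minimal sketch of such a discretization, assuming uniform bins over the training range (the repository's binning scheme may differ):

```python
import numpy as np

def make_bins(train_targets, num_bins=50):
    """Uniform bin edges spanning the range of the training targets."""
    return np.linspace(train_targets.min(), train_targets.max(), num_bins + 1)

def to_class(y, edges):
    """Map continuous targets to bin indices in [0, num_bins - 1]."""
    return np.clip(np.digitize(y, edges) - 1, 0, len(edges) - 2)
```

With targets encoded this way, teacher and student both output distributions over bins, and the same KL-based loss used for event prediction applies.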
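The Page-Hinkley test is a standard sequential change detector: it accumulates the deviation of a monitored statistic (e.g., the student's per-step error) from its running mean and raises an alarm when that accumulation climbs far above its historical minimum. A hand-rolled sketch of the test (the repository may use a library implementation instead):

```python
class PageHinkley:
    """Minimal one-sided Page-Hinkley drift detector over a stream of values."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerance for small fluctuations
        self.threshold = threshold  # alarm level (often called lambda)
        self.count = 0
        self.mean = 0.0             # running mean of observations
        self.cum = 0.0              # cumulative deviation m_t
        self.cum_min = 0.0          # historical minimum of m_t

    def update(self, value):
        """Feed one observation; return True if drift is detected."""
        self.count += 1
        self.mean += (value - self.mean) / self.count
        self.cum += value - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold
```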
Future-Guided Learning leverages Knowledge Distillation, which is primarily controlled by two hyperparameters:
- **Alpha (`α`):** A float between 0 and 1 that balances the standard cross-entropy loss and the KL-divergence distillation loss. A higher `α` relies more on the ground-truth labels.
- **Temperature (`T`):** A positive value that softens the probability distributions from the teacher and student models before the KL divergence is computed. Higher temperatures create softer distributions, encouraging the student to learn generalized patterns.
Optimal values for these parameters can be found via hyperparameter sweeps, as detailed in the paper's supplementary materials.
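As an illustration, a simple grid sweep could drive the CHB-MIT script directly. The flag names are taken from the commands above, while the grid values here are arbitrary placeholders rather than the values swept in the paper.

```python
import itertools
import subprocess

# Hypothetical grid; see the paper's supplementary materials for the actual sweeps.
alphas = [0.3, 0.5, 0.7]
temperatures = [1, 2, 4, 8]

for alpha, temp in itertools.product(alphas, temperatures):
    subprocess.run([
        "python", "FGL_CHBMIT.py",
        "--target", "1",
        "--epochs", "30",
        "--trials", "3",
        "--optimizer", "Adam",
        "--alpha", str(alpha),
        "--temperature", str(temp),
    ], check=True)
```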
If you find this work useful in your research, please consider citing our paper:
```bibtex
@article{Gunasekaran2025,
  author  = {Gunasekaran, Skye and Kembay, Assel and Ladret, Hugo and Zhu, Rui-Jie and Perrinet, Laurent and Kavehei, Omid and Eshraghian, Jason},
  title   = {A predictive approach to enhance time-series forecasting},
  journal = {Nature Communications},
  year    = {2025},
  volume  = {16},
  number  = {8645},
  pages   = {1--7},
  doi     = {10.1038/s41467-025-63786-4}
}
```
