This repository contains the complete implementation of a novel approach to non-invasive glucose monitoring that uses electrocardiogram (ECG) signals and machine learning. The research demonstrates the feasibility of predicting blood glucose levels from ECG-derived features, with the aim of reducing the finger-stick testing burden in diabetes management.
1st Md Basit Azam
Department of Computer Science & Engineering
Tezpur University
Napaam - 784 028, Tezpur, Assam, INDIA
Email: mdbasit@tezu.ernet.in
2nd Sarangthem Ibotombi Singh
Department of Computer Science & Engineering
Tezpur University
Napaam - 784 028, Tezpur, Assam, INDIA
Email: sis@tezu.ernet.in
- Clinical Significance: Non-invasive glucose monitoring to reduce finger-stick testing burden
- Advanced Signal Processing: Multi-domain ECG feature extraction including HRV, wavelet analysis, and frequency domain features
- Machine Learning Excellence: Ensemble methods with comprehensive hyperparameter optimization
- Clinical Validation: Evaluated using standard clinical metrics (MARD, Clarke Error Grid, Parkes Error Grid)
- Physiological Insights: Considers temporal lag between ECG changes and glucose variations
This implementation uses the D1NAMO dataset [1], a comprehensive collection of ECG and glucose measurements from both diabetic and healthy subjects. The dataset includes:
- Subjects: 26 (8 Type 1 Diabetic, 18 Healthy)
- Measurements: Synchronized ECG recordings and continuous glucose monitoring
- Duration: Multi-session recordings with temporal alignment
- Sampling Rate: 250 Hz ECG signals
- Glucose Range: 39.6-304.2 mg/dL with clinical categorization
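The "clinical categorization" mentioned above is not spelled out in this README. As a minimal sketch, assuming the commonly used cut-offs of 70 mg/dL (hypoglycemia) and 180 mg/dL (hyperglycemia), it could look like this. The function name and both thresholds are illustrative assumptions, not values taken from the Phase 1 script.

```python
def categorize_glucose(mg_dl, hypo_below=70.0, hyper_above=180.0):
    """Label a glucose reading (mg/dL) as hypoglycemia, normal, or hyperglycemia.

    The default thresholds are assumed, not taken from this repository.
    """
    if mg_dl < hypo_below:
        return "hypoglycemia"
    if mg_dl > hyper_above:
        return "hyperglycemia"
    return "normal"


# The two extremes of the reported range fall in the outer categories.
labels = [categorize_glucose(v) for v in (39.6, 110.0, 304.2)]
```

Keeping the thresholds as parameters makes it easy to switch to a different clinical definition if the study uses one.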
dataset/
├── diabetes_subset_ecg_data/
│   └── [subject_id]/
│       └── sensor_data/
│           └── [timestamp]_ECG.csv
├── diabetes_subset_pictures-glucose-food-insulin/
│   └── [subject_id]/
│       └── glucose.csv
├── healthy_subset_ecg_data/
│   └── [subject_id]/
│       └── sensor_data/
│           └── [timestamp]_ECG.csv
└── healthy_subset_pictures-glucose-food/
    └── [subject_id]/
        └── glucose.csv
Installation
1. Clone the repository:
    git clone https://github.com/mdbasit897/ECG-Glucose-Prediction.git
    cd ECG-Glucose-Prediction
2. Create and activate a virtual environment:
    python -m venv ecg_glucose_env
    source ecg_glucose_env/bin/activate # On Windows: ecg_glucose_env\Scripts\activate
3. Install the dependencies:
    pip install -r requirements.txt
Phase 1: Data Preprocessing and Feature Extraction
    python Phase1_ML_T1D_Healthy_5_min.py
What Phase 1 does:
- Loads and preprocesses ECG signals from D1NAMO dataset
- Applies advanced signal filtering (bandpass, baseline correction, noise reduction)
- Performs temporal alignment between ECG and glucose readings
- Extracts 50+ features including:
  - Statistical features (mean, std, skewness, kurtosis)
  - Frequency domain features (FFT, power spectral density)
  - Wavelet decomposition features
  - Heart Rate Variability (HRV) metrics
  - Clinical timing features
- Generates data quality assessment reports
- Outputs: preprocessed_data.pkl
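The filtering and feature-extraction steps listed above can be illustrated with a short sketch. It is not the Phase 1 script itself: the Butterworth design, the filter order, the Welch settings, and all function names are assumptions. The 250 Hz sampling rate and the 0.5-40 Hz band come from this README.

```python
import numpy as np
from scipy import signal, stats

FS = 250  # Hz, the ECG sampling rate stated for the dataset


def bandpass_ecg(ecg, fs=FS, low=0.5, high=40.0, order=4):
    """Zero-phase Butterworth bandpass (design and order are assumptions)."""
    sos = signal.butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, ecg)


def window_features(ecg, fs=FS):
    """Return a few statistical and spectral features for one ECG window."""
    freqs, psd = signal.welch(ecg, fs=fs, nperseg=min(len(ecg), 4 * fs))
    return {
        "mean": float(np.mean(ecg)),
        "std": float(np.std(ecg)),
        "skewness": float(stats.skew(ecg)),
        "kurtosis": float(stats.kurtosis(ecg)),
        "total_power": float(np.sum(psd) * (freqs[1] - freqs[0])),
        "peak_freq_hz": float(freqs[np.argmax(psd)]),
    }


# Synthetic 10 s test signal: a 5 Hz component, slow drift, and 60 Hz noise.
t = np.arange(0, 10, 1 / FS)
raw = (
    np.sin(2 * np.pi * 5 * t)
    + 2.0 * np.sin(2 * np.pi * 0.05 * t)
    + 0.5 * np.sin(2 * np.pi * 60 * t)
)
features = window_features(bandpass_ecg(raw))
```

Using second-order sections with `sosfiltfilt` avoids phase distortion and is numerically safer than a transfer-function design when the lower cut-off is this close to zero.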
Phase 2: Machine Learning Training and Evaluation
    python Phase2_ML_T1D_Healthy_5_min.py
What Phase 2 does:
- Trains multiple ML models (Random Forest, XGBoost, LightGBM, Ensemble)
- Performs automated hyperparameter optimization
- Implements Leave-One-Subject-Out (LOSO) cross-validation
- Evaluates using clinical glucose prediction metrics
- Generates 17+ publication-quality visualizations
- Outputs: Trained models and comprehensive performance reports
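Leave-One-Subject-Out validation holds out every window from one subject per fold, so a model is never tested on a person it was trained on. The sketch below shows the idea with scikit-learn's `LeaveOneGroupOut` on synthetic data. The data, the feature count, and the model settings are placeholders, not the configuration used in Phase 2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)

# Synthetic stand-in data: 4 hypothetical subjects, 30 windows each, 5 features.
n_subjects, n_windows, n_features = 4, 30, 5
X = rng.normal(size=(n_subjects * n_windows, n_features))
y = 100 + 20 * X[:, 0] + rng.normal(scale=5, size=len(X))  # glucose-like target
groups = np.repeat(np.arange(n_subjects), n_windows)  # subject ID per window

fold_mae = {}
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    held_out = int(groups[test_idx][0])  # the single subject in this test fold
    fold_mae[held_out] = mean_absolute_error(y[test_idx], model.predict(X[test_idx]))
```

Reporting the error per held-out subject, rather than only the average, shows how much performance varies between individuals.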
ECG-Glucose-Prediction/
├── Example Notebook
├── plots
├── subject_001_analysis
│   ├── Output
│   └── plots
├── subject_002_analysis
│   ├── Output
│   └── plots
├── subject_004_analysis
│   ├── Output
│   └── plots
├── subject_005_analysis
│   ├── Output
│   └── plots
├── subject_006_analysis
│   ├── Output
│   └── plots
├── subject_007_analysis
│   ├── Output
│   └── plots
├── subject_008_analysis
│   ├── Output
│   └── plots
├── subject_009_analysis
│   ├── Output
│   └── plots
├── CONTRIBUTING.md                   # Contribution guidelines
├── LICENSE                           # MIT License
├── Phase1_ML_T1D_Healthy_5_min.py    # Data preprocessing and feature extraction
├── Phase2_ML_T1D_Healthy_5_min.py    # ML training and evaluation
├── requirements.txt                  # Python dependencies
├── README.md                         # This file
├── model_output/                     # Generated models and metadata (created during execution)
├── plots/                            # Generated visualizations (created during execution)
└── preprocessed_data.pkl             # Processed dataset (created during Phase 1)
1. ECG Preprocessing
- Bandpass filtering (0.5-40 Hz)
- Baseline wander removal
- Noise reduction using wavelet denoising
- R-peak detection and validation
2. Temporal Alignment
- Physiological lag consideration (5-15 minutes)
- Quality-based alignment scoring
- Missing data handling
3. Feature Engineering
- Multi-domain feature extraction
- Feature selection using RFECV
- Cross-subject normalization
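The temporal alignment step can be sketched with `pandas.merge_asof`: ECG feature timestamps are shifted forward by the assumed lag and each glucose reading is paired with the nearest shifted window. The 10-minute lag (the midpoint of the 5-15 minute range above), the 5-minute tolerance, and the column names `time`, `mean_hr`, and `glucose_mg_dl` are illustrative assumptions, not the repository's actual settings.

```python
import pandas as pd


def _sorted_ns(df):
    """Sort by time and force a common datetime resolution on the key column."""
    out = df.sort_values("time").copy()
    out["time"] = out["time"].astype("datetime64[ns]")
    return out


def align_features_to_glucose(features, glucose, lag_minutes=10, tolerance_minutes=5):
    """Pair each glucose reading with the ECG window recorded lag_minutes earlier.

    Glucose readings with no ECG window within the tolerance are dropped.
    """
    shifted = _sorted_ns(features)
    # Shift ECG timestamps forward so they line up with the later glucose reading.
    shifted["time"] = shifted["time"] + pd.Timedelta(minutes=lag_minutes)
    merged = pd.merge_asof(
        _sorted_ns(glucose),
        shifted,
        on="time",
        direction="nearest",
        tolerance=pd.Timedelta(minutes=tolerance_minutes),
    )
    feature_cols = [c for c in features.columns if c != "time"]
    return merged.dropna(subset=feature_cols)


ecg_features = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 08:05", "2024-01-01 08:10",
        "2024-01-01 08:15", "2024-01-01 08:20", "2024-01-01 08:25",
    ]),
    "mean_hr": [62.0, 64.0, 66.0, 70.0, 68.0, 65.0],
})
glucose = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 08:10", "2024-01-01 08:20", "2024-01-01 10:00"]),
    "glucose_mg_dl": [95.0, 110.0, 130.0],
})
aligned = align_features_to_glucose(ecg_features, glucose)
```

In this toy example the 10:00 glucose reading has no ECG window nearby, so it is discarded rather than matched to a stale window, which is one simple way to handle missing data.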
- Base Models: Random Forest, XGBoost, LightGBM, HistGradientBoosting
- Ensemble: Stacking with XGBoost meta-learner (5-fold model evaluation, 3-fold feature selection)
Our implementation evaluates models using clinically relevant metrics:
- MARD (Mean Absolute Relative Difference): Clinical accuracy standard
- Clarke Error Grid: Clinical risk assessment (Zones A-E)
- Parkes Error Grid: Enhanced clinical significance analysis
- RMSE/MAE: Standard regression metrics
- R²: Coefficient of determination
- Bland-Altman Analysis: Clinical agreement assessment
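Three of these metrics are simple enough to sketch directly in NumPy. The functions below use the standard definitions (MARD as the mean of |predicted − reference| / reference, Clarke zone A as "within 20% of the reference, or both values below 70 mg/dL", and Bland-Altman limits as bias ± 1.96 SD). The function names are hypothetical, and the full Clarke and Parkes grids need additional zone boundaries that are omitted here.

```python
import numpy as np


def mard(reference, predicted):
    """Mean Absolute Relative Difference, in percent."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - reference) / reference) * 100.0)


def clarke_zone_a_fraction(reference, predicted):
    """Fraction of points that fall in Clarke Error Grid zone A."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    within_20 = np.abs(predicted - reference) <= 0.2 * reference
    both_hypo = (reference < 70) & (predicted < 70)
    return float(np.mean(within_20 | both_hypo))


def bland_altman(reference, predicted):
    """Return the bias and the lower and upper 95% limits of agreement."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    bias, sd = float(np.mean(diff)), float(np.std(diff, ddof=1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


reference = np.array([60.0, 100.0, 150.0, 200.0])
predicted = np.array([66.0, 90.0, 165.0, 260.0])
mard_pct = mard(reference, predicted)
zone_a = clarke_zone_a_fraction(reference, predicted)
bias, lower, upper = bland_altman(reference, predicted)
```

In this toy example three of the four points are within 20% of the reference and the last one is off by 30%, which is the kind of case where MARD and the zone A fraction tell complementary stories.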
Visualizations
Main Analysis Figures:
- Dataset summary
- Feature correlation
- Cross-validation results
- Hyperparameter tuning
- Feature importance
- Actual vs predicted
- Clarke Error Grid
- Model comparison
- Sample entropy analysis
- Parkes Error Grid
- MARD by glycemic range
- Bland-Altman analysis
- Ensemble architecture
- preprocessed_data.pkl: Complete processed dataset
- model_output/: Trained models with metadata
- dataset_statistics.json: Comprehensive data analytics
- quality_assessment.json: Data quality metrics
This research addresses critical challenges in diabetes management:
- Continuous Monitoring: Enables non-invasive glucose tracking
- Patient Comfort: Reduces the need for painful finger-stick testing
- Cost Effectiveness: Reduces disposable sensor costs
- Integration Capability: Compatible with existing ECG monitoring systems
- Accessibility: Potential for widespread deployment
Computational Resources
- CPU: Multi-core processor (4+ cores recommended)
- RAM: 8GB minimum, 16GB recommended
- Storage: 0.5TB free space for datasets and outputs
- GPU: Optional, CUDA-compatible for faster training
Software Dependencies
- Python 3.8+
- Scientific computing: NumPy, SciPy, Pandas
- Machine learning: Scikit-learn, XGBoost, LightGBM
- Signal processing: PyWavelets, NeuroKit2
- Visualization: Matplotlib, Seaborn
- See requirements.txt for complete list
This implementation ensures reproducible research:
- Fixed random seeds: Consistent results across runs
- Version pinning: Exact dependency versions specified
- Cross-validation: Robust performance estimation
- Comprehensive logging: Detailed execution tracking
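Fixing random seeds usually means seeding every generator the pipeline touches before any data splitting or model training. A minimal sketch, assuming a seed value of 42 and a hypothetical helper name (neither is taken from the scripts):

```python
import random

import numpy as np


def set_global_seed(seed=42):
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)


# Re-seeding reproduces the same random draws.
set_global_seed(42)
first = np.random.rand(3)
set_global_seed(42)
second = np.random.rand(3)
```

Estimators from scikit-learn, XGBoost, and LightGBM also accept their own seed argument (for example `random_state`), which needs to be set in addition to the global seeds.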
If you use this code in your research, please cite:
This project is licensed under the MIT License - see the LICENSE file for details.
The authors acknowledge support from the Google Cloud Research Credits program under Award GCP19980904 and partial compute resources from Google's TPU Research Cloud (TRC), both of which provided critical infrastructure for this research.
Related Work
- NeuroKit2: Physiological signal processing
- Scikit-learn: Machine learning in Python
- D1NAMO Dataset: Original data source
[1] Fabien Dubosson, et al. The Open D1NAMO Dataset: A Multi-modal Dataset for Research on Non-invasive Type 1 Diabetes Management. 1.2.0, Zenodo, 19 Oct. 2018, doi:10.5281/zenodo.5651217.