Non-invasive glucose monitoring using ECG signal analysis and machine learning for diabetes management research

mdbasit897/ECG-Glucose-Prediction


Non-Invasive Glucose Prediction Using Machine Learning on ECG Signals

This repository contains the implementation of an approach to non-invasive glucose monitoring that combines electrocardiogram (ECG) signals with machine learning. The research demonstrates the feasibility of predicting blood glucose levels from ECG-derived features, which could reduce the reliance on invasive measurements in diabetes management.

1st Md Basit Azam
Department of Computer Science & Engineering
Tezpur University
Napaam - 784 028, Tezpur, Assam, INDIA
📧 mdbasit@tezu.ernet.in

2nd Sarangthem Ibotombi Singh
Department of Computer Science & Engineering
Tezpur University
Napaam - 784 028, Tezpur, Assam, INDIA
📧 sis@tezu.ernet.in


Research Highlights

  • Clinical Significance: Non-invasive glucose monitoring to reduce finger-stick testing burden
  • Advanced Signal Processing: Multi-domain ECG feature extraction including HRV, wavelet analysis, and frequency domain features
  • Machine Learning Excellence: Ensemble methods with comprehensive hyperparameter optimization
  • Clinical Validation: Evaluated using standard clinical metrics (MARD, Clarke Error Grid, Parkes Error Grid)
  • Physiological Insights: Considers temporal lag between ECG changes and glucose variations

Dataset

This implementation uses the D1NAMO dataset [1], a comprehensive collection of ECG and glucose measurements from both diabetic and healthy subjects. The dataset includes:

  • Subjects: 26 (8 Type 1 Diabetic, 18 Healthy)
  • Measurements: Synchronized ECG recordings and continuous glucose monitoring
  • Duration: Multi-session recordings with temporal alignment
  • Sampling Rate: 250 Hz ECG signals
  • Glucose Range: 39.6-304.2 mg/dL with clinical categorization

Data Structure Expected

dataset/
├── diabetes_subset_ecg_data/
│   └── [subject_id]/
│       └── sensor_data/
│           └── [timestamp]_ECG.csv
├── diabetes_subset_pictures-glucose-food-insulin/
│   └── [subject_id]/
│       └── glucose.csv
├── healthy_subset_ecg_data/
│   └── [subject_id]/
│       └── sensor_data/
│           └── [timestamp]_ECG.csv
└── healthy_subset_pictures-glucose-food/
    └── [subject_id]/
        └── glucose.csv
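
Before running the pipeline, it can help to check that the dataset folder matches the expected layout. The following is a minimal sketch that assumes only the folder and file names listed above. The helper `list_recordings` and the shape of its return value are illustrative and are not taken from the repository's scripts.

```python
from pathlib import Path

# Folder names from the expected layout: (ECG folder, glucose folder) per cohort.
COHORTS = {
    "diabetes": ("diabetes_subset_ecg_data",
                 "diabetes_subset_pictures-glucose-food-insulin"),
    "healthy": ("healthy_subset_ecg_data",
                "healthy_subset_pictures-glucose-food"),
}


def list_recordings(root):
    """Return {(cohort, subject_id): {"ecg": [paths], "glucose": Path or None}}.

    Hypothetical helper: it only checks which files exist, it does not parse them.
    """
    root = Path(root)
    found = {}
    for cohort, (ecg_dir, glucose_dir) in COHORTS.items():
        ecg_root = root / ecg_dir
        if not ecg_root.is_dir():
            continue  # cohort folder missing, nothing to list
        for subject in sorted(p for p in ecg_root.iterdir() if p.is_dir()):
            ecg_files = sorted((subject / "sensor_data").glob("*_ECG.csv"))
            glucose = root / glucose_dir / subject.name / "glucose.csv"
            found[(cohort, subject.name)] = {
                "ecg": ecg_files,
                "glucose": glucose if glucose.is_file() else None,
            }
    return found
```

Calling `list_recordings("dataset")` and looking for entries with an empty `"ecg"` list or a `None` glucose path is a quick way to spot subjects that Phase 1 would not be able to align.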

Installation

1. Clone the repository

bash
git clone https://github.com/mdbasit897/ECG-Glucose-Prediction.git
cd ECG-Glucose-Prediction

2. Create virtual environment

bash
python -m venv ecg_glucose_env
source ecg_glucose_env/bin/activate  # On Windows: ecg_glucose_env\Scripts\activate

3. Install dependencies

bash
pip install -r requirements.txt

Usage

Phase 1: Data Preprocessing and Feature Extraction

bash
python Phase1_ML_T1D_Healthy_5_min.py

What Phase 1 does:

  • Loads and preprocesses ECG signals from D1NAMO dataset
  • Applies advanced signal filtering (bandpass, baseline correction, noise reduction)
  • Performs temporal alignment between ECG and glucose readings
  • Extracts 50+ features including:
    • Statistical features (mean, std, skewness, kurtosis)
    • Frequency domain features (FFT, power spectral density)
    • Wavelet decomposition features
    • Heart Rate Variability (HRV) metrics
    • Clinical timing features
  • Generates data quality assessment reports
  • Outputs: preprocessed_data.pkl
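
To make the HRV entry in the feature list concrete, here is a minimal sketch of common time-domain HRV metrics computed from R-peak timestamps. It assumes the R peaks have already been detected and are given in seconds. The function name `hrv_time_domain` and the choice of metrics (SDNN, RMSSD, pNN50) are illustrative, and the repository's own feature set may differ.

```python
import math
import statistics


def hrv_time_domain(r_peak_times_s):
    """Time-domain HRV metrics from R-peak timestamps given in seconds."""
    # RR intervals in milliseconds between consecutive R peaks.
    rr_ms = [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]
    if len(rr_ms) < 2:
        raise ValueError("need at least three R peaks")
    # Successive differences between neighbouring RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "mean_hr_bpm": 60000.0 / statistics.fmean(rr_ms),
        "sdnn_ms": statistics.stdev(rr_ms),  # sample standard deviation of RR
        "rmssd_ms": math.sqrt(statistics.fmean(d * d for d in diffs)),
        "pnn50_pct": 100.0 * sum(abs(d) > 50.0 for d in diffs) / len(diffs),
    }
```

In practice these metrics would be computed per analysis window (for example, one 5-minute ECG segment) so that each window yields one feature row.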

Phase 2: Machine Learning Training and Evaluation

bash
python Phase2_ML_T1D_Healthy_5_min.py

What Phase 2 does:

  • Trains multiple ML models (Random Forest, XGBoost, LightGBM, Ensemble)
  • Performs automated hyperparameter optimization
  • Implements Leave-One-Subject-Out (LOSO) cross-validation
  • Evaluates using clinical glucose prediction metrics
  • Generates 17+ publication-quality visualizations
  • Outputs: Trained models and comprehensive performance reports
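
Leave-One-Subject-Out cross-validation holds out all samples of one subject per fold, so the model is always tested on a person it has never seen. The sketch below shows the splitting logic only. The function name `leave_one_subject_out` is illustrative and is not taken from the Phase 2 script.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    `subject_ids` holds one subject label per sample, in sample order.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test
```

With scikit-learn, `sklearn.model_selection.LeaveOneGroupOut` produces the same kind of split when the subject labels are passed as `groups`.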

📁 File Structure

ECG-Glucose-Prediction/
├───Example Notebook
├───plots
├───subject_001_analysis
│   └───Output
│       └───plots
├───subject_002_analysis
│   └───Output
│       └───plots
├───subject_004_analysis
│   └───Output
│       └───plots
├───subject_005_analysis
│   └───Output
│       └───plots
├───subject_006_analysis
│   └───Output
│       └───plots
├───subject_007_analysis
│   └───Output
│       └───plots
├───subject_008_analysis
│   └───Output
│       └───plots
├───subject_009_analysis
│   └───Output
│       └───plots
├── CONTRIBUTING.md                        # Contribution guidelines
├── LICENSE                                # MIT License
├── Phase1_ML_T1D_Healthy_5_min.py         # Data preprocessing and feature extraction
├── Phase2_ML_T1D_Healthy_5_min.py         # ML training and evaluation
├── requirements.txt                       # Python dependencies
├── README.md                              # This file
├── model_output/                          # Generated models and metadata (created during execution)
├── plots/                                 # Generated visualizations (created during execution)
└── preprocessed_data.pkl                  # Processed dataset (created during Phase 1)

Methodology

Signal Processing Pipeline

1. ECG Preprocessing

  • Bandpass filtering (0.5-40 Hz)
  • Baseline wander removal
  • Noise reduction using wavelet denoising
  • R-peak detection and validation

2. Temporal Alignment

  • Physiological lag consideration (5-15 minutes)
  • Quality-based alignment scoring
  • Missing data handling
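
The alignment step can be sketched as a nearest-neighbour match with a fixed lag. The example below assumes that the ECG window precedes the glucose reading by `lag_s` seconds, and that readings without a nearby window are dropped. Both choices, along with the function name `align_with_lag` and its default values, are assumptions for illustration. The repository's quality-based scoring is not reproduced here.

```python
from bisect import bisect_left


def align_with_lag(ecg_window_starts_s, glucose_times_s, lag_s=600.0, tolerance_s=150.0):
    """Pair each glucose reading with the ECG window nearest to (reading time - lag).

    Returns a list of (glucose_index, ecg_window_index) pairs. Readings with no
    window within `tolerance_s` of the target time are skipped (missing data).
    `ecg_window_starts_s` must be sorted in ascending order.
    """
    pairs = []
    for g_idx, t_glucose in enumerate(glucose_times_s):
        target = t_glucose - lag_s
        pos = bisect_left(ecg_window_starts_s, target)
        # Candidates: the window just before and just after the target time.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(ecg_window_starts_s)]
        if not candidates:
            continue
        best = min(candidates, key=lambda i: abs(ecg_window_starts_s[i] - target))
        if abs(ecg_window_starts_s[best] - target) <= tolerance_s:
            pairs.append((g_idx, best))
    return pairs
```

A lag of 600 seconds sits in the middle of the 5-15 minute range given above. Sweeping `lag_s` over that range and comparing validation error is one way to choose it.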

3. Feature Engineering

  • Multi-domain feature extraction
  • Feature selection using RFECV
  • Cross-subject normalization
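
One common way to normalize across subjects is to standardize each feature within each subject, so that differences in baseline (such as resting heart rate) do not dominate the model. The sketch below assumes this per-subject z-scoring. The README does not specify the exact scheme used, and the function name `zscore_per_subject` is illustrative.

```python
import statistics
from collections import defaultdict


def zscore_per_subject(values, subject_ids):
    """Standardize one feature column separately within each subject."""
    by_subject = defaultdict(list)
    for v, s in zip(values, subject_ids):
        by_subject[s].append(v)
    stats = {}
    for s, vals in by_subject.items():
        mean = statistics.fmean(vals)
        std = statistics.pstdev(vals)  # population standard deviation
        stats[s] = (mean, std if std > 0 else 1.0)  # avoid division by zero
    return [(v - stats[s][0]) / stats[s][1] for v, s in zip(values, subject_ids)]
```

Because each subject is scaled using only that subject's own values, this scheme does not leak information from training subjects into a held-out subject during Leave-One-Subject-Out evaluation.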

Machine Learning Architecture

  • Base Models: Random Forest, XGBoost, LightGBM, HistGradientBoosting
  • Ensemble: Stacking with XGBoost meta-learner (5-fold model evaluation, 3-fold feature selection)

Performance Metrics

Our implementation evaluates models using clinically relevant metrics:

  • MARD (Mean Absolute Relative Difference): Clinical accuracy standard
  • Clarke Error Grid: Clinical risk assessment (Zones A-E)
  • Parkes Error Grid: Enhanced clinical significance analysis
  • RMSE/MAE: Standard regression metrics
  • R²: Coefficient of determination
  • Bland-Altman Analysis: Clinical agreement assessment
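
Several of these metrics can be written in a few lines. The sketch below uses standard textbook definitions and plain Python. The function name `glucose_metrics` is illustrative, and only Zone A of the Clarke Error Grid is computed (prediction within 20% of the reference, or both values below 70 mg/dL). The remaining Clarke zones and the Parkes grid need their full boundary definitions and are not covered here.

```python
import math
import statistics


def glucose_metrics(reference, predicted):
    """Agreement metrics between reference and predicted glucose (mg/dL)."""
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    errors = [p - r for r, p in zip(reference, predicted)]
    mean_ref = statistics.fmean(reference)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    bias = statistics.fmean(errors)   # Bland-Altman mean difference
    sd = statistics.stdev(errors)     # sample SD of the differences
    # Clarke Zone A only: within 20% of the reference, or both below 70 mg/dL.
    zone_a = sum(
        abs(p - r) <= 0.2 * r or (r < 70 and p < 70)
        for r, p in zip(reference, predicted)
    )
    return {
        "mard_pct": 100.0 * statistics.fmean(abs(e) / r for e, r in zip(errors, reference)),
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": statistics.fmean(abs(e) for e in errors),
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "bland_altman_bias": bias,
        "bland_altman_loa": (bias - 1.96 * sd, bias + 1.96 * sd),
        "clarke_zone_a_pct": 100.0 * zone_a / len(errors),
    }
```

Under Leave-One-Subject-Out evaluation, these values would typically be computed per held-out subject and then summarized across subjects.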

Generated Outputs

Visualizations

Main Analysis Figures:

  • Dataset summary
  • Feature correlation
  • Cross-validation results
  • Hyperparameter tuning
  • Feature importance
  • Actual vs predicted
  • Clarke Error Grid
  • Model comparison
  • Sample entropy analysis
  • Parkes Error Grid
  • MARD by glycemic range
  • Bland-Altman analysis
  • Ensemble architecture

Data Files

  • preprocessed_data.pkl: Complete processed dataset
  • model_output/: Trained models with metadata
  • dataset_statistics.json: Comprehensive data analytics
  • quality_assessment.json: Data quality metrics

Clinical Significance

This research addresses critical challenges in diabetes management:

  • Continuous Monitoring: Enables non-invasive glucose tracking
  • Patient Comfort: Eliminates painful finger-stick testing
  • Cost Effectiveness: Reduces disposable sensor costs
  • Integration Capability: Compatible with existing ECG monitoring systems
  • Accessibility: Potential for widespread deployment

Technical Requirements

Computational Resources

  • CPU: Multi-core processor (4+ cores recommended)
  • RAM: 8GB minimum, 16GB recommended
  • Storage: 0.5TB free space for datasets and outputs
  • GPU: Optional, CUDA-compatible for faster training

Software Dependencies

  • Python 3.8+
  • Scientific computing: NumPy, SciPy, Pandas
  • Machine learning: Scikit-learn, XGBoost, LightGBM
  • Signal processing: PyWavelets, NeuroKit2
  • Visualization: Matplotlib, Seaborn
  • See requirements.txt for complete list

Reproducibility

This implementation ensures reproducible research:

  • Fixed random seeds: Consistent results across runs
  • Version pinning: Exact dependency versions specified
  • Cross-validation: Robust performance estimation
  • Comprehensive logging: Detailed execution tracking

Citation

If you use this code in your research, please cite:

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

The authors acknowledge support from the Google Cloud Research Credits program under Award GCP19980904 and partial compute resources from Google's TPU Research Cloud (TRC), both of which provided critical infrastructure for this research.

🔗 Related Work

References

[1] Fabien Dubosson, et al. The Open D1NAMO Dataset: A Multi-modal Dataset for Research on Non-invasive Type 1 Diabetes Management. 1.2.0, Zenodo, 19 Oct. 2018, doi:10.5281/zenodo.5651217.
