[Phase 3] Feature Engineering #4
Open
Phase 3: Feature Engineering
Objectives
Extract clinically meaningful features from ECG beats for classical ML models.
Tasks
- Implement time-domain feature extraction (10+ features)
- Implement frequency-domain feature extraction (8+ features)
- Implement wavelet-based feature extraction (8+ features)
- Create unified feature extraction pipeline
- Document clinical meaning of each feature
- Validate features against published literature
- Handle edge cases (NaN, division by zero)
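The edge-case task above could be approached with two small helpers. This is a minimal sketch, not the project's implementation: `safe_ratio` and `sanitize_features` are hypothetical names, and mapping non-finite values to 0.0 is an assumed policy that may not suit every feature.

```python
import numpy as np

EPS = 1e-10  # same epsilon the Technical Notes recommend for denominators


def safe_ratio(numerator: float, denominator: float, eps: float = EPS) -> float:
    """Divide while guarding against a zero denominator.

    Assumes the denominator is non-negative (a power or an energy),
    so adding eps can never cancel it out.
    """
    return float(numerator) / (float(denominator) + eps)


def sanitize_features(features: dict) -> dict:
    """Replace NaN and +/-inf with 0.0 so every feature is finite (assumed policy)."""
    return {
        name: float(np.nan_to_num(value, nan=0.0, posinf=0.0, neginf=0.0))
        for name, value in features.items()
    }
```

A flat (constant) beat is a useful test input here, because its skewness and kurtosis are undefined and band-power ratios divide by zero.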
Files to Create/Modify
| File | Action | Description |
|---|---|---|
| `src/feature_extraction.py` | Create | Feature extraction module |
| `tests/test_features.py` | Create | Unit tests |
Features to Extract
Time Domain (10 features):
- Mean, std, variance
- Skewness, kurtosis
- RMS (root mean square)
- Peak amplitude, peak-to-peak
- QRS duration estimate
- RR interval ratio
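The last two items in this list are not covered by the Code Reference, so here is one possible sketch. It rests on assumptions: the QRS estimate is only a half-amplitude width around the largest deflection (a crude proxy, not a clinical delineation), and the RR ratio takes the neighbouring RR intervals as arguments because a single beat array does not carry them. Both function names are hypothetical.

```python
import numpy as np


def qrs_duration_estimate(beat: np.ndarray, fs: int = 360,
                          threshold_fraction: float = 0.5) -> float:
    """Width in seconds of the contiguous region around the largest
    deflection where |beat - median| stays above a fraction of its peak."""
    centered = np.abs(beat - np.median(beat))
    peak_idx = int(np.argmax(centered))
    peak_value = centered[peak_idx]
    if peak_value == 0:  # flat beat: no deflection to measure
        return 0.0
    threshold = threshold_fraction * peak_value
    left = peak_idx
    while left > 0 and centered[left - 1] >= threshold:
        left -= 1
    right = peak_idx
    while right < len(centered) - 1 and centered[right + 1] >= threshold:
        right += 1
    return (right - left + 1) / fs


def rr_interval_ratio(rr_previous: float, rr_next: float,
                      eps: float = 1e-10) -> float:
    """Ratio of the preceding RR interval to the following one (same units)."""
    return rr_previous / (rr_next + eps)
```

A ratio well below 1 would indicate a beat that arrived early relative to the one that follows it, which is one reason this kind of feature is used alongside the morphology features.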
Frequency Domain (8 features):
- Spectral centroid, spectral spread
- Spectral entropy
- Band powers: VLF, LF, HF
- LF/HF ratio
- Dominant frequency
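Several of these features can be derived from a single Welch PSD. The sketch below is one way to do it, under assumptions: the function names are hypothetical, the entropy is Shannon entropy of the normalized PSD in bits, and band edges are left to the caller because the issue does not define them. With `nperseg=256` at 360 Hz the frequency bins are about 1.4 Hz apart, so bands narrower than that cannot be resolved from one beat.

```python
import numpy as np
from scipy.signal import welch


def spectral_shape_features(beat: np.ndarray, fs: int = 360,
                            eps: float = 1e-10) -> dict:
    """Centroid, spread, entropy and dominant frequency from a Welch PSD."""
    freqs, psd = welch(beat, fs=fs, nperseg=min(256, len(beat)))
    p = psd / (np.sum(psd) + eps)  # normalized PSD, sums to about 1
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))
    entropy = -np.sum(p * np.log2(p + eps))  # Shannon entropy in bits
    return {
        "spectral_centroid": float(centroid),
        "spectral_spread": float(spread),
        "spectral_entropy": float(entropy),
        "dominant_frequency": float(freqs[np.argmax(psd)]),
    }


def band_power(freqs: np.ndarray, psd: np.ndarray,
               low: float, high: float) -> float:
    """Sum of PSD bins with low <= f < high (band edges chosen by the caller)."""
    mask = (freqs >= low) & (freqs < high)
    return float(np.sum(psd[mask]))
```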
Wavelet Features (8+ features):
- Energy at scales 4, 8, 16, 32 (db4 wavelet)
- Approximation coefficient statistics
- Detail coefficient statistics
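To relate decomposition levels to frequency content, the nominal passband of each level can be computed from the sampling rate. This is a small sketch assuming ideal half-band filters, which real db4 filters only approximate; `dwt_band_edges` is a hypothetical helper, not a PyWavelets function.

```python
def dwt_band_edges(fs: float, levels: int) -> dict:
    """Nominal (low, high) frequency band in Hz for each DWT sub-band.

    Detail level j covers roughly fs / 2**(j + 1) to fs / 2**j; the final
    approximation covers everything below the deepest detail band.
    """
    bands = {}
    for j in range(1, levels + 1):
        bands[f"D{j}"] = (fs / 2 ** (j + 1), fs / 2 ** j)
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands


if __name__ == "__main__":
    for name, (low, high) in dwt_band_edges(360, 4).items():
        print(f"{name}: {low:g} to {high:g} Hz")
```

`pywt.wavedec(beat, 'db4', level=4)` returns coefficients ordered `[cA4, cD4, cD3, cD2, cD1]`, so index 0 in the Code Reference loop is the approximation band and the last index is the finest detail band.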
Code Reference
```python
from scipy.stats import skew, kurtosis
from scipy.signal import welch
import pywt
import numpy as np


class FeatureExtractor:
    def __init__(self, fs: int = 360):
        self.fs = fs  # sampling rate in Hz

    def time_domain_features(self, beat: np.ndarray) -> dict:
        return {
            'mean': np.mean(beat),
            'std': np.std(beat),
            'variance': np.var(beat),
            'rms': np.sqrt(np.mean(beat**2)),
            'peak': np.max(np.abs(beat)),
            'peak_to_peak': np.ptp(beat),
            'skewness': skew(beat),
            'kurtosis': kurtosis(beat),  # excess kurtosis (Fisher definition)
        }

    def frequency_domain_features(self, beat: np.ndarray) -> dict:
        # Welch PSD; nperseg is capped so short beats still work
        freqs, psd = welch(beat, fs=self.fs, nperseg=min(256, len(beat)))
        total_power = np.sum(psd)
        # Epsilon guards against division by zero on a flat beat
        spectral_centroid = np.sum(freqs * psd) / (total_power + 1e-10)
        return {
            'spectral_centroid': spectral_centroid,
            'total_power': total_power,
            # ... more features
        }

    def wavelet_features(self, beat: np.ndarray, wavelet: str = 'db4') -> dict:
        # coeffs is ordered [cA4, cD4, cD3, cD2, cD1]
        coeffs = pywt.wavedec(beat, wavelet, level=4)
        features = {}
        for i, c in enumerate(coeffs):
            features[f'wavelet_energy_{i}'] = np.sum(c**2)
            features[f'wavelet_std_{i}'] = np.std(c)
        return features

    def extract_all(self, beat: np.ndarray) -> np.ndarray:
        """Extract all features and return as array."""
        all_features = {}
        all_features.update(self.time_domain_features(beat))
        all_features.update(self.frequency_domain_features(beat))
        all_features.update(self.wavelet_features(beat))
        return np.array(list(all_features.values()))
```

Definition of Done
- 30-40 features extracted per beat
- All features have valid ranges (no NaN, inf)
- Feature names documented with clinical interpretation
- Unit tests verify calculations against known values
- Feature extraction runs <10ms per beat
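The count, finiteness, and timing criteria above lend themselves to automated checks. The following is a rough sketch with hypothetical helper names; it takes the extractor as a plain callable so it does not assume anything about the final `FeatureExtractor` API, and wall-clock timing like this will vary by machine.

```python
import time
from typing import Callable, Sequence

import numpy as np


def feature_vector_ok(features: np.ndarray,
                      min_count: int = 30, max_count: int = 40) -> bool:
    """True if the vector has 30-40 entries and all of them are finite."""
    features = np.asarray(features, dtype=float)
    in_range = min_count <= features.size <= max_count
    return bool(in_range and np.all(np.isfinite(features)))


def mean_ms_per_beat(extract: Callable[[np.ndarray], np.ndarray],
                     beats: Sequence[np.ndarray]) -> float:
    """Average wall-clock time per beat in milliseconds."""
    start = time.perf_counter()
    for beat in beats:
        extract(beat)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / len(beats)
```

In a unit test this might be used as `assert mean_ms_per_beat(FeatureExtractor().extract_all, beats) < 10`, with `beats` drawn from real or synthetic data.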
Technical Notes
For junior developers:
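- Kurtosis is high for impulsive, sharply peaked waveforms, such as a beat dominated by a narrow, tall QRS spike. `scipy.stats.kurtosis` returns excess kurtosis by default, so a Gaussian scores 0.
- The LF/HF ratio is a heart rate variability measure, often read as an indicator of autonomic (sympathetic vs. parasympathetic) balance. It is conventionally computed from a series of RR intervals rather than from a single beat waveform.
- Wavelet decomposition captures multi-scale information: each level isolates a different frequency band of the beat
- Always add a small epsilon (1e-10) to denominators to avoid division by zero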
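The kurtosis note can be checked numerically. This is an illustrative sketch on synthetic signals, not ECG data: the narrow Gaussian pulse is only an assumed stand-in for an impulsive beat, and the variable names are arbitrary.

```python
import numpy as np
from scipy.stats import kurtosis

fs = 360
t = np.arange(fs) / fs  # one second of samples

# Smooth signal: five full cycles of a sine wave
smooth = np.sin(2 * np.pi * 5 * t)

# Impulsive signal: a narrow Gaussian pulse on a flat baseline
impulsive = np.exp(-0.5 * ((t - 0.5) / 0.01) ** 2)

smooth_kurt = float(kurtosis(smooth))        # about -1.5 for a sine wave
impulsive_kurt = float(kurtosis(impulsive))  # large and positive

print(f"sine: {smooth_kurt:.2f}, pulse: {impulsive_kurt:.2f}")
```

The sine spends most of its time near its extremes and scores below zero, while the pulse concentrates its energy in a few samples and scores far above zero.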