HTFAPreprocessor Implementation #69

@jeremymanning

Task 004: HTFAPreprocessor Implementation

Overview

Implement the HTFAPreprocessor class, a configurable preprocessing pipeline for neuroimaging data. The pipeline covers brain masking, spatial smoothing, temporal detrending, standardization, and quality control. It is tuned for HTFA and built on nilearn's preprocessing functions.

Problem Statement

HTFA requires properly preprocessed fMRI data to produce meaningful results:

  • Brain masking to remove non-brain voxels
  • Spatial smoothing to improve signal-to-noise ratio
  • Temporal preprocessing (detrending, standardization)
  • Quality control and outlier detection
  • Consistent preprocessing across subjects and sessions

Technical Requirements

Core Preprocessing Pipeline

  • Brain Masking: Automatic brain extraction or custom mask application
  • Spatial Smoothing: Configurable FWHM with nilearn smoothing functions
  • Temporal Detrending: Linear/polynomial detrending with high-pass filtering
  • Standardization: Z-score normalization of each voxel's time series within each run
  • Quality Control: Motion parameter extraction and outlier detection

Configuration and Flexibility

  • Sensible Defaults: Automatic parameter selection based on data characteristics
  • Full Customization: Override any preprocessing step with custom parameters
  • Pipeline Validation: Ensure preprocessing parameters are compatible
  • Memory Efficiency: Process data in chunks to handle large datasets
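
The pipeline validation item above could be sketched as follows. This is a minimal illustration, not the planned implementation: the PreprocConfig container, the validate_config helper, and the t_r field are hypothetical names introduced here, and the specific compatibility rules (a high-pass cutoff needs a repetition time and must sit below the Nyquist frequency) are assumptions about what "compatible" should mean.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class PreprocConfig:
    """Hypothetical container for the preprocessing parameters."""
    mask_strategy: Optional[str] = "auto"
    smoothing_fwhm: Optional[float] = 6.0   # mm
    detrend: Union[bool, int] = True
    standardize: bool = True
    high_pass: Optional[float] = 1 / 128    # Hz
    t_r: Optional[float] = None             # repetition time in seconds


def validate_config(cfg: PreprocConfig) -> None:
    """Raise ValueError when parameters are invalid or mutually incompatible."""
    if cfg.mask_strategy not in ("auto", "custom", None):
        raise ValueError(f"unknown mask_strategy: {cfg.mask_strategy!r}")
    if cfg.smoothing_fwhm is not None and cfg.smoothing_fwhm <= 0:
        raise ValueError("smoothing_fwhm must be positive or None")
    if cfg.high_pass is not None:
        if cfg.high_pass <= 0:
            raise ValueError("high_pass must be positive or None")
        if cfg.t_r is None:
            raise ValueError("high_pass filtering requires t_r")
        nyquist = 0.5 / cfg.t_r  # highest frequency representable at this TR
        if cfg.high_pass >= nyquist:
            raise ValueError("high_pass must be below the Nyquist frequency")
```

Running the checks once at construction time means incompatible settings fail before any data is loaded.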

BIDS Compliance and Integration

  • Metadata Preservation: Maintain BIDS metadata throughout processing
  • Confound Integration: Handle BIDS-standard confound regressors
  • Derivative Generation: Create BIDS derivatives-compliant preprocessed data
  • Provenance Tracking: Record all preprocessing steps and parameters
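
As an illustration of the confound integration item, the sketch below parses selected columns from a confounds TSV using only the standard library. The load_confounds helper is hypothetical, the column names in the example follow fMRIPrep conventions and are assumptions, and replacing BIDS "n/a" entries with 0.0 is one possible policy rather than a requirement of this task.

```python
import csv
import io
from typing import Dict, List


def load_confounds(tsv_text: str, columns: List[str]) -> Dict[str, List[float]]:
    """Parse selected confound columns, mapping BIDS 'n/a' entries to 0.0."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"confound columns not found: {missing}")
    out: Dict[str, List[float]] = {c: [] for c in columns}
    for row in reader:
        for c in columns:
            value = row[c]
            out[c].append(0.0 if value == "n/a" else float(value))
    return out


# Example input: the first row of a derivative-style column is often 'n/a'
example = "trans_x\tframewise_displacement\n0.0\tn/a\n0.1\t0.12\n"
confounds = load_confounds(example, ["framewise_displacement"])
```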

Implementation Details

HTFAPreprocessor Class Design

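class HTFAPreprocessor:
    """
    Comprehensive preprocessing pipeline for HTFA analysis.

    Parameters
    ----------
    mask_strategy : {'auto', 'custom', None}
        Brain masking approach
    smoothing_fwhm : float or None
        FWHM of the Gaussian smoothing kernel in mm (None disables smoothing)
    detrend : bool or int
        Temporal detrending (True=linear, int=polynomial order)
    standardize : bool
        Apply temporal standardization
    high_pass : float or None
        High-pass filter cutoff in Hz (None disables filtering)
    """

    def __init__(self, mask_strategy='auto', smoothing_fwhm=6.0,
                 detrend=True, standardize=True, high_pass=1 / 128):
        # Defaults mirror the values listed under "Default Parameter Selection"
        self.mask_strategy = mask_strategy
        self.smoothing_fwhm = smoothing_fwhm
        self.detrend = detrend
        self.standardize = standardize
        self.high_pass = high_pass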

Brain Masking Implementation

  • Automatic Masking: Use nilearn's compute_epi_mask for EPI data (compute_brain_mask resamples the MNI152 template mask and suits data already in MNI space)
  • Custom Mask Support: Accept user-provided brain masks
  • Mask Validation: Ensure mask dimensions match functional data
  • Multi-subject Consistency: Option to use common mask across subjects
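
The mask validation step can be illustrated on plain arrays. This is a sketch under simplifying assumptions: it works on NumPy arrays rather than NIfTI images, the apply_mask helper is hypothetical, and a real implementation would likely also compare affines, which is not shown here.

```python
import numpy as np


def apply_mask(data: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return a (n_timepoints, n_voxels) matrix of in-mask voxel time series.

    data: 4D array (x, y, z, t); mask: 3D boolean-like array (x, y, z).
    """
    if data.ndim != 4 or mask.ndim != 3:
        raise ValueError("expected 4D data and a 3D mask")
    if data.shape[:3] != mask.shape:
        raise ValueError(
            f"mask shape {mask.shape} does not match data shape {data.shape[:3]}"
        )
    mask = mask.astype(bool)
    if not mask.any():
        raise ValueError("mask is empty")
    # Boolean indexing over the three spatial axes gives (n_voxels, t)
    return data[mask].T
```

Checking the shape before indexing turns a confusing NumPy indexing error into a message that names both shapes.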

Spatial Preprocessing Pipeline

  • Smoothing Strategy: Gaussian kernel smoothing with configurable FWHM
  • Resolution Preservation: Maintain original voxel resolution after smoothing
  • Edge Handling: Proper boundary conditions for smoothing operations
  • Memory Management: Process volumes sequentially for large datasets
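
A configurable FWHM has to be converted to a Gaussian standard deviation before filtering, using sigma = FWHM / (2 * sqrt(2 * ln 2)), and then to voxel units by dividing by the voxel size along each axis. The helper below is a hypothetical sketch of that conversion; nilearn's smoothing functions handle this internally.

```python
import math
from typing import Sequence, Tuple

# sigma = FWHM / (2 * sqrt(2 * ln 2)), roughly FWHM / 2.355
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def fwhm_to_sigma_voxels(
    fwhm_mm: float, voxel_size_mm: Sequence[float]
) -> Tuple[float, ...]:
    """Convert an FWHM in mm to a per-axis Gaussian sigma in voxel units."""
    if fwhm_mm <= 0:
        raise ValueError("fwhm_mm must be positive")
    if any(v <= 0 for v in voxel_size_mm):
        raise ValueError("voxel sizes must be positive")
    return tuple(fwhm_mm * FWHM_TO_SIGMA / v for v in voxel_size_mm)


# 6 mm FWHM on 2 x 2 x 3 mm voxels: the kernel is narrower (in voxels)
# along the axis with the larger voxel size.
sigmas = fwhm_to_sigma_voxels(6.0, (2.0, 2.0, 3.0))
```

Because smoothing only filters intensities on the existing grid, the voxel resolution of the image is unchanged.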

Temporal Preprocessing Components

  • Detrending Options: Linear, polynomial, or custom detrending functions
  • High-pass Filtering: Butterworth or FIR filters for low-frequency removal
  • Standardization Methods: Z-score, robust scaling, or custom normalization
  • Confound Regression: Integration with BIDS confound files
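
Polynomial detrending followed by z-scoring can be sketched with a least-squares fit. This is a NumPy illustration of the technique under stated assumptions, not the nilearn-based implementation the task calls for: the function name and the (n_timepoints, n_voxels) layout are choices made here.

```python
import numpy as np


def detrend_and_standardize(signals: np.ndarray, order: int = 1) -> np.ndarray:
    """Remove a polynomial trend from each column, then z-score it.

    signals has shape (n_timepoints, n_voxels); order=1 is linear detrending.
    """
    n_t = signals.shape[0]
    t = np.linspace(-1.0, 1.0, n_t)        # rescaled time keeps the fit well conditioned
    design = np.vander(t, order + 1)       # columns t**order, ..., t**0
    beta, *_ = np.linalg.lstsq(design, signals, rcond=None)
    residuals = signals - design @ beta    # zero mean, since the design has an intercept
    std = residuals.std(axis=0, ddof=1)
    std = np.where(std > 1e-12, std, 1.0)  # leave (near-)constant voxels unscaled
    return residuals / std
```

Fitting all voxels in one lstsq call avoids a Python loop over columns.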

Dependencies

External Dependencies

  • nilearn: Core neuroimaging preprocessing functions
  • nibabel: NIfTI file I/O and manipulation
  • scipy: Signal processing for filtering and detrending
  • scikit-learn (imported as sklearn): Preprocessing utilities and validation functions

Internal Dependencies

  • Task 003: Input detection and BIDS parsing for data loading
  • htfa.bids: BIDS integration utilities and metadata handling
  • htfa.validation: Input validation framework

Success Criteria

Functional Requirements

  • Complete preprocessing pipeline with all standard neuroimaging steps
  • Configurable parameters with sensible defaults for HTFA analysis
  • Integration with BIDS datasets and metadata preservation
  • Support for both single-subject and multi-subject preprocessing
  • Comprehensive quality control and outlier detection

Performance Requirements

  • Preprocessing time <25% of total analysis time for typical datasets
  • Memory usage scales linearly with data size (no memory leaks)
  • Support for datasets with >100 subjects without performance degradation
  • Chunked processing for datasets exceeding available RAM

Code Quality Requirements

  • Full type hints and mypy compliance
  • Comprehensive docstrings with parameter descriptions and examples
  • >90% test coverage including edge cases and error scenarios
  • Integration tests with real and synthetic neuroimaging data

Test Plan

Unit Tests

  • Masking Functions: Test automatic and custom brain masking
  • Smoothing Operations: Validate spatial smoothing with different kernels
  • Temporal Processing: Test detrending, filtering, and standardization
  • Parameter Validation: Ensure invalid parameters raise appropriate errors

Integration Tests

  • BIDS Pipeline: Test complete preprocessing of BIDS datasets
  • Memory Management: Verify efficient processing of large datasets
  • Quality Control: Test outlier detection and quality metrics
  • Multi-subject Consistency: Validate consistent preprocessing across subjects

Performance Tests

  • Memory Profiling: Monitor memory usage during preprocessing
  • Speed Benchmarks: Measure preprocessing time for various dataset sizes
  • Scalability Testing: Verify performance with increasing subject counts
  • Resource Utilization: Monitor CPU and memory usage patterns

Implementation Notes

Default Parameter Selection

# Sensible defaults for HTFA preprocessing
defaults = {
    'smoothing_fwhm': 6.0,  # 6mm FWHM for good spatial regularization
    'detrend': True,        # Linear detrending
    'standardize': True,    # Z-score standardization
    'high_pass': 1/128,     # Cutoff in Hz (about 0.0078 Hz, i.e. a 128 s period)
    'mask_strategy': 'auto' # Automatic brain masking
}
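
One way to combine these defaults with user overrides is a small resolver that rejects unknown keys, so that a typo in a parameter name does not silently fall back to the default. The DEFAULTS name and resolve_params helper are hypothetical.

```python
from typing import Any, Dict

# Same values as the defaults listed above
DEFAULTS: Dict[str, Any] = {
    "smoothing_fwhm": 6.0,
    "detrend": True,
    "standardize": True,
    "high_pass": 1 / 128,
    "mask_strategy": "auto",
}


def resolve_params(**overrides: Any) -> Dict[str, Any]:
    """Return a new dict of defaults updated with the given overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown preprocessing parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}  # DEFAULTS itself is never mutated


# Disable smoothing but keep every other default
params = resolve_params(smoothing_fwhm=None)
```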

Memory Management Strategy

  • Lazy Loading: Load data only when needed for processing
  • Chunked Processing: Process timepoints in chunks for large datasets
  • Memory Monitoring: Track memory usage and warn about resource constraints
  • Cleanup: Explicit memory cleanup after processing steps
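
Chunked processing over timepoints reduces to iterating over index ranges. The generator below is a minimal sketch with a hypothetical name. Note one assumption about its use: chunking along time suits per-volume operations such as smoothing, whereas detrending and filtering need each voxel's full time series, so those steps would be chunked over voxels instead.

```python
from typing import Iterator, Tuple


def iter_chunks(n_timepoints: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover range(n_timepoints)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_timepoints, chunk_size):
        # The last chunk is shorter when chunk_size does not divide n_timepoints
        yield start, min(start + chunk_size, n_timepoints)


chunks = list(iter_chunks(10, 4))
```

Each (start, stop) pair can be used to slice a lazily loaded array so that only one chunk is held in memory at a time.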

Quality Control Metrics

  • Motion Assessment: Extract framewise displacement metrics
  • Signal Quality: Compute temporal SNR and variance metrics
  • Outlier Detection: Identify volumes with excessive motion or artifacts
  • Coverage Assessment: Evaluate brain mask coverage and quality
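
The motion and signal quality metrics above might be computed along these lines. This is a sketch under assumptions: framewise displacement follows the common definition (sum of absolute frame-to-frame changes in the six motion parameters, with rotations converted to arc length on a 50 mm sphere), the column order of the motion array and the outlier threshold are choices made for illustration, and the function names are hypothetical.

```python
import numpy as np


def framewise_displacement(motion: np.ndarray, radius_mm: float = 50.0) -> np.ndarray:
    """motion: (n_timepoints, 6) = 3 translations (mm), then 3 rotations (radians)."""
    motion = np.asarray(motion, dtype=float)
    if motion.ndim != 2 or motion.shape[1] != 6:
        raise ValueError("motion must have shape (n_timepoints, 6)")
    scaled = motion.copy()
    scaled[:, 3:] *= radius_mm                      # rotations -> arc length in mm
    fd = np.abs(np.diff(scaled, axis=0)).sum(axis=1)
    return np.concatenate([[0.0], fd])              # first volume has no predecessor


def temporal_snr(signals: np.ndarray) -> np.ndarray:
    """Temporal mean divided by temporal std, per voxel; 0 where std is 0."""
    mean = signals.mean(axis=0)
    std = signals.std(axis=0, ddof=1)
    return np.divide(mean, std, out=np.zeros_like(mean, dtype=float), where=std > 0)


def flag_outliers(fd: np.ndarray, threshold_mm: float = 0.5) -> np.ndarray:
    """Boolean mask of volumes whose displacement exceeds the threshold."""
    return fd > threshold_mm
```

Temporal SNR has to be computed before standardization, because z-scored time series have zero mean.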

BIDS Derivatives Compliance

Output Structure

derivatives/htfa/
  sub-{subject}/
    ses-{session}/
      func/
        sub-{subject}_ses-{session}_task-{task}_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz
        sub-{subject}_ses-{session}_task-{task}_desc-confounds_timeseries.tsv
        sub-{subject}_ses-{session}_task-{task}_desc-preprocessing_params.json
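
File names in this layout can be assembled from entity key-value pairs. The helper below is a hypothetical sketch that covers only the entities used in the listing above. It relies on two BIDS conventions: entities appear in a fixed order, and labels are alphanumeric.

```python
from typing import Optional


def derivative_name(
    subject: str,
    task: str,
    suffix: str,
    extension: str,
    session: Optional[str] = None,
    space: Optional[str] = None,
    desc: Optional[str] = None,
) -> str:
    """Build a BIDS-style file name; entities set to None are omitted."""
    entities = [
        ("sub", subject),
        ("ses", session),
        ("task", task),
        ("space", space),
        ("desc", desc),
    ]
    parts = []
    for key, value in entities:
        if value is None:
            continue
        if not value.isalnum():
            raise ValueError(f"{key} label must be alphanumeric: {value!r}")
        parts.append(f"{key}-{value}")
    return "_".join(parts + [suffix]) + extension


bold_name = derivative_name(
    "01", "rest", "bold", ".nii.gz",
    session="1", space="MNI152NLin2009cAsym", desc="preproc",
)
```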

Metadata Generation

  • Processing Parameters: JSON sidecar with all preprocessing parameters
  • Software Provenance: Record htfa version, nilearn version, processing date
  • Quality Metrics: Include motion parameters, SNR, and outlier information
  • Transformation Records: Document any spatial transformations applied
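
A JSON sidecar carrying the parameters and provenance could be generated as sketched below. The key names and the build_sidecar helper are hypothetical, not taken from the BIDS specification or from this task, and the caller is assumed to supply the version strings.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_sidecar(
    params: Dict[str, Any],
    versions: Dict[str, str],
    quality: Optional[Dict[str, Any]] = None,
) -> str:
    """Serialize parameters, software versions, and optional QC metrics to JSON."""
    sidecar: Dict[str, Any] = {
        "PreprocessingParameters": params,
        "SoftwareVersions": versions,
        # Timezone-aware UTC timestamp in ISO 8601 format
        "ProcessingDate": datetime.now(timezone.utc).isoformat(),
    }
    if quality is not None:
        sidecar["QualityMetrics"] = quality
    return json.dumps(sidecar, indent=2, sort_keys=True)
```

Sorting the keys keeps the output stable across runs, which makes sidecars easier to diff.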

Deliverables

  • htfa/preprocessing.py: HTFAPreprocessor class implementation
  • htfa/quality_control.py: Quality assessment and outlier detection utilities
  • htfa/derivatives.py: BIDS derivatives output formatting
  • tests/test_preprocessing.py: Comprehensive preprocessing test suite
  • tests/test_quality_control.py: Quality control validation tests
  • Documentation with preprocessing parameter guide and best practices

Acceptance Criteria

Must-Have Features

  • Complete HTFAPreprocessor class with all neuroimaging preprocessing steps
  • Configurable pipeline with sensible defaults optimized for HTFA
  • BIDS derivatives-compliant output with proper metadata
  • Quality control metrics and outlier detection capabilities
  • Memory-efficient processing suitable for large datasets

Quality Gates

  • All preprocessing steps validated against established neuroimaging standards
  • Performance benchmarks met (preprocessing <25% of total analysis time)
  • Memory usage scales linearly with data size, with no leaks or excessive consumption
  • BIDS compliance verified with standard validation tools
  • Comprehensive test coverage including edge cases and error conditions

Definition of Ready for Next Task

  • Preprocessing pipeline complete and fully tested
  • BIDS derivatives output properly formatted and validated
  • Quality control framework operational and documented
  • Ready for HTFAResults implementation and visualization components
  • Integration with core algorithms (TFA/HTFA) verified and working
