
Input Detection and BIDS Parsing #76

@jeremymanning

Description

Task 003: Input Detection and BIDS Parsing

Overview

Implement the high-level htfa.fit() function with automatic input detection, comprehensive BIDS dataset parsing using pybids, and robust input validation. This task creates the primary user interface: it automatically determines whether the input is a BIDS directory or NumPy arrays, and it handles the complete BIDS parsing pipeline.

Problem Statement

Users need a simple, intuitive interface that supports:

  • BIDS directory paths for neuroimaging datasets
  • Raw NumPy arrays with coordinate information
  • Flexible parameter specification for filtering and customization
  • Comprehensive validation with clear error messages

Technical Requirements

Core Input Detection

  • Automatic Type Detection: htfa.fit() determines the input type (BIDS directory path vs. NumPy arrays)
  • BIDS Directory Validation: Verify BIDS compliance using pybids validation
  • Array Input Handling: Support raw NumPy arrays with coordinate specifications
  • Parameter Inference: Automatically estimate the number of factors K from the data dimensions and structure

BIDS Integration Implementation

  • Dataset Parsing: Use pybids.BIDSLayout for comprehensive dataset parsing
  • Filtering Support: Filter by subject, session, task, and run using pybids entity queries
  • Metadata Extraction: Parse and preserve all BIDS sidecar metadata (e.g. RepetitionTime for the TR, TaskName)
  • Derivative Detection: Identify and load preprocessed derivatives when available
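
pybids performs filename parsing itself, so the project should not need its own parser. The stdlib-only sketch below is included only to show how the subject/session/task/run filters correspond to the key-value entities in a BIDS filename. The function name parse_bids_filename is hypothetical, and it uses the short filename keys (sub, ses) rather than the long entity names (subject, session) that pybids queries use.

```python
import os
import re
from typing import Dict

# A BIDS entity is a "key-value" pair such as "sub-01" or "task-rest";
# labels are alphanumeric.
_ENTITY_RE = re.compile(r"([a-zA-Z]+)-([a-zA-Z0-9]+)")


def parse_bids_filename(filename: str) -> Dict[str, str]:
    """Split a BIDS filename into its entities, suffix, and extension (sketch)."""
    name = os.path.basename(filename)
    # Split at the first dot so that compound extensions like ".nii.gz" stay intact.
    stem, dot, ext = name.partition(".")
    parts = stem.split("_")
    entities: Dict[str, str] = {}
    # Every underscore-separated part except the last must be an entity.
    for part in parts[:-1]:
        match = _ENTITY_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"Not a BIDS key-value pair: {part!r} in {filename!r}")
        entities[match.group(1)] = match.group(2)
    # The last part is the suffix (e.g. "bold").
    entities["suffix"] = parts[-1]
    entities["extension"] = dot + ext
    return entities
```

For example, "sub-01_ses-pre_task-rest_run-1_bold.nii.gz" yields sub=01, ses=pre, task=rest, run=1, suffix=bold, and extension=.nii.gz.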

Input Validation Framework

  • Path Validation: Comprehensive checks for BIDS directory structure
  • Array Validation: Shape, data type, and coordinate consistency checks
  • Parameter Validation: Valid ranges, types, and compatibility verification
  • Error Reporting: Clear, actionable error messages with suggested fixes
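
As one example of path validation, a few inexpensive structural checks can run before a BIDSLayout is built, so that common mistakes produce a targeted message. This is a sketch: the helper name check_bids_root is hypothetical. The checks themselves follow the BIDS specification, which requires a top-level dataset_description.json with "Name" and "BIDSVersion" fields. Full compliance checking is still left to pybids.

```python
import json
import os
from pathlib import Path
from typing import Union


def check_bids_root(path: Union[str, "os.PathLike[str]"]) -> Path:
    """Cheap structural checks before handing a directory to pybids (sketch)."""
    root = Path(path)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a directory; pass the BIDS dataset root.")
    desc = root / "dataset_description.json"
    if not desc.is_file():
        raise ValueError(f"{root} has no dataset_description.json; is this the dataset root?")
    try:
        info = json.loads(desc.read_text())
    except json.JSONDecodeError as exc:
        raise ValueError(f"{desc} is not valid JSON: {exc}") from exc
    # Name and BIDSVersion are the required fields of dataset_description.json.
    missing = [key for key in ("Name", "BIDSVersion") if key not in info]
    if missing:
        raise ValueError(f"{desc} is missing required field(s): {', '.join(missing)}")
    # A dataset with no subject folders cannot yield any fMRI runs.
    if not any(p.is_dir() for p in root.glob("sub-*")):
        raise ValueError(f"{root} contains no sub-* directories.")
    return root
```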

Implementation Details

Input Detection Logic

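```python
import os
from pathlib import Path
from typing import Any, Optional, Sequence, Union

import numpy as np


def fit(
    data: Union[str, "os.PathLike[str]", np.ndarray, Sequence[np.ndarray]],
    coords: Optional[Union[np.ndarray, Sequence[np.ndarray]]] = None,
    **kwargs: Any,
) -> Any:
    """Primary HTFA interface with automatic input detection.

    Args:
        data: BIDS directory path, or fMRI data as a single array or a list
            of per-subject arrays of shape (n_voxels, n_timepoints).
        coords: Voxel coordinates for array inputs.
        **kwargs: Analysis parameters and BIDS filtering options.

    Raises:
        FileNotFoundError: If ``data`` is a path that is not an existing directory.
        TypeError: If ``data`` is neither a path nor array-like.
    """
    # Path-like input: check for a directory explicitly so that a mistyped path
    # produces a "not found" error rather than a misleading type error.
    if isinstance(data, (str, os.PathLike)):
        path = Path(data)
        if not path.is_dir():
            raise FileNotFoundError(f"BIDS directory not found: {path}")
        return _fit_bids_dataset(path, **kwargs)
    # Array input: one array, or a list/tuple of per-subject arrays.
    if isinstance(data, (np.ndarray, list, tuple)):
        return _fit_arrays(data, coords, **kwargs)
    raise TypeError(
        f"Unsupported input type {type(data).__name__}: expected a BIDS "
        "directory path or NumPy array(s)"
    )
```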

BIDS Dataset Processing Pipeline

  • Layout Creation: BIDSLayout(dataset_path, validate=True)
  • Query Construction: Build BIDS queries from user parameters
  • File Discovery: Locate all relevant fMRI files and metadata
  • Metadata Aggregation: Collect TR, task parameters, subject info
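
The metadata aggregation step must eventually settle on one TR per analysis. The sketch below assumes that runs with differing TRs should be rejected rather than resampled; that policy is not stated in this task. It takes a mapping from file path to sidecar metadata, such as dicts gathered with pybids' BIDSLayout.get_metadata(path). The name aggregate_tr and the tolerance parameter are hypothetical.

```python
import math
from typing import Any, Dict, Mapping


def aggregate_tr(
    run_metadata: Mapping[str, Mapping[str, Any]], rel_tol: float = 1e-6
) -> float:
    """Return the RepetitionTime (seconds) shared by all runs, or raise (sketch)."""
    trs: Dict[str, float] = {}
    for run, meta in run_metadata.items():
        if "RepetitionTime" not in meta:
            raise ValueError(f"{run}: sidecar metadata has no RepetitionTime")
        trs[run] = float(meta["RepetitionTime"])
    if not trs:
        raise ValueError("No runs matched the query; cannot determine the TR")
    # Compare every run against the first one, allowing tiny float differences.
    reference = next(iter(trs.values()))
    mismatched = {
        run: tr for run, tr in trs.items()
        if not math.isclose(tr, reference, rel_tol=rel_tol)
    }
    if mismatched:
        raise ValueError(f"Inconsistent TR across runs (expected {reference} s): {mismatched}")
    return reference
```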

Array Input Processing

  • Shape Validation: Ensure each data array is 2-D with shape (n_voxels, n_timepoints)
  • Coordinate Validation: Verify that the coordinate array has shape (n_voxels, 3), with one row per voxel in the data
  • Multi-subject Handling: Process lists of arrays with coordinate alignment
  • Parameter Consistency: Validate consistent shapes across subjects
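
The array checks above can be sketched in one normalizing function. It makes two assumptions that this task does not spell out. First, a single coordinate array passed alongside a list of data arrays is treated as shared by all subjects. Second, only per-subject consistency is checked; any cross-subject constraints imposed by the HTFA core are not assumed here. The name validate_arrays is hypothetical.

```python
from typing import List, Sequence, Tuple, Union

import numpy as np


def validate_arrays(
    data: Union[np.ndarray, Sequence[np.ndarray]],
    coords: Union[np.ndarray, Sequence[np.ndarray], None],
) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """Normalize single- or multi-subject arrays to lists and check shapes (sketch)."""
    if coords is None:
        raise ValueError("coords is required when data is given as arrays")
    if isinstance(data, np.ndarray):
        # A single array is treated as one subject.
        data_list, coord_list = [data], [coords]
    else:
        data_list = list(data)
        # Assumption: a list/tuple holds one coordinate array per subject;
        # anything else is one coordinate array shared by every subject.
        coord_list = list(coords) if isinstance(coords, (list, tuple)) else [coords] * len(data_list)
    if len(data_list) != len(coord_list):
        raise ValueError(f"Got {len(data_list)} data arrays but {len(coord_list)} coordinate arrays")
    out_data: List[np.ndarray] = []
    out_coords: List[np.ndarray] = []
    for i, (X_raw, R_raw) in enumerate(zip(data_list, coord_list)):
        X = np.asarray(X_raw, dtype=float)
        R = np.asarray(R_raw, dtype=float)
        if X.ndim != 2:
            raise ValueError(f"Subject {i}: data must be 2-D (n_voxels, n_timepoints), got shape {X.shape}")
        if R.ndim != 2 or R.shape[1] != 3:
            raise ValueError(f"Subject {i}: coords must have shape (n_voxels, 3), got {R.shape}")
        if R.shape[0] != X.shape[0]:
            raise ValueError(f"Subject {i}: data has {X.shape[0]} voxels but coords has {R.shape[0]} rows")
        if not np.all(np.isfinite(X)):
            raise ValueError(f"Subject {i}: data contains NaN or Inf values")
        out_data.append(X)
        out_coords.append(R)
    return out_data, out_coords
```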

Dependencies

External Dependencies

  • pybids: BIDS dataset parsing and validation
  • nibabel: NIfTI file loading and header parsing
  • numpy: Array operations and validation
  • pandas: Metadata handling and tabular operations

Internal Dependencies

  • Task 001: Core TFA implementation for single-subject processing
  • Task 002: HTFA implementation for multi-subject analysis
  • htfa.core: Access to TFA and HTFA classes for algorithm execution

Success Criteria

Functional Requirements

  • htfa.fit('/path/to/bids') successfully processes any valid BIDS dataset
  • htfa.fit(arrays, coords) handles raw NumPy array inputs correctly
  • Automatic K estimation provides reasonable defaults (8 to 15 factors under the default rule given in Implementation Notes)
  • BIDS filtering works: htfa.fit(path, subjects=['sub-01'], task='rest')
  • Comprehensive error messages for all invalid input scenarios

Performance Requirements

  • BIDS parsing completes in <30 seconds for datasets with <100 subjects
  • Input validation adds <5% overhead to total analysis time
  • Memory usage scales linearly with dataset size during parsing
  • Invalid inputs are rejected before any expensive work (file loading, model fitting) begins

Code Quality Requirements

  • 100% type hints with mypy compliance
  • Comprehensive docstrings following Google style
  • >95% test coverage including edge cases and error conditions
  • Integration tests with synthetic BIDS datasets

Test Plan

Unit Tests

  • Input Detection: Test automatic type detection for all input scenarios
  • BIDS Validation: Verify proper handling of valid/invalid BIDS datasets
  • Array Processing: Test array input validation and preprocessing
  • Parameter Inference: Validate automatic K estimation algorithms

Integration Tests

  • Synthetic BIDS: Test with generated BIDS-compliant datasets
  • Error Handling: Verify appropriate error messages and graceful failures
  • Multi-format Support: Test mixed input scenarios and edge cases
  • Performance: Benchmark parsing speed with various dataset sizes
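
For the synthetic-BIDS integration tests, a small fixture can generate a dataset tree on the fly. This sketch writes empty placeholder image files. It assumes the parsing tests only exercise pybids' filename- and sidecar-level indexing; tests that load data would need real NIfTI files, written for example with nibabel. The TR sidecar is placed at the dataset root, where the BIDS inheritance principle applies it to every matching bold run. The name make_synthetic_bids and its defaults are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable


def make_synthetic_bids(
    root: Path, subjects: Iterable[str] = ("01", "02"), task: str = "rest", tr: float = 2.0
) -> Path:
    """Write a minimal BIDS-style tree with placeholder images (sketch)."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "dataset_description.json").write_text(
        json.dumps({"Name": "synthetic", "BIDSVersion": "1.8.0"})
    )
    # Top-level sidecar: by inheritance it applies to all task-<task> bold runs.
    (root / f"task-{task}_bold.json").write_text(
        json.dumps({"RepetitionTime": tr, "TaskName": task})
    )
    for sub in subjects:
        func = root / f"sub-{sub}" / "func"
        func.mkdir(parents=True, exist_ok=True)
        # Empty placeholder, not a readable NIfTI image.
        (func / f"sub-{sub}_task-{task}_bold.nii.gz").touch()
    return root
```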

Edge Cases

  • Malformed BIDS: Test behavior with incomplete or invalid BIDS datasets
  • Missing Files: Handle missing fMRI files or metadata gracefully
  • Large Datasets: Verify memory efficiency with >50 subject datasets
  • Network Paths: Test with remote/mounted filesystem paths

Implementation Notes

BIDS Query Strategy

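```python
from bids import BIDSLayout

# Example BIDS filtering implementation. Only entities the user specified are
# added to the query; omitting an entity matches all of its values.
layout = BIDSLayout(bids_path, validate=validate_bids)
queries = {'datatype': 'func', 'suffix': 'bold', 'extension': ['.nii', '.nii.gz']}
if subjects:
    # pybids expects bare labels ('01'), so strip a leading 'sub-' (Python 3.9+)
    queries['subject'] = [s.removeprefix('sub-') for s in subjects]
if task:
    queries['task'] = task
if session:
    queries['session'] = session
files = layout.get(**queries)
```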

Parameter Inference Logic

  • Factor Estimation: K = max(8, min(15, n_timepoints // 10))
  • Preprocessing Defaults: Based on detected TR and analysis type
  • Validation Thresholds: Minimum timepoints, voxel counts, subject numbers
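
The factor-estimation rule above is a clamp, which a direct transcription makes explicit. With the default bounds, the rule returns 8 for any run shorter than 90 timepoints and 15 for any run of 150 or more. For example, estimate_k(50) gives 8, estimate_k(120) gives 12, and estimate_k(400) gives 15. The function name and the k_min/k_max parameters are hypothetical; the formula is the one stated above.

```python
def estimate_k(n_timepoints: int, k_min: int = 8, k_max: int = 15) -> int:
    """Default factor count: K = max(k_min, min(k_max, n_timepoints // 10))."""
    if n_timepoints < 1:
        raise ValueError(f"n_timepoints must be positive, got {n_timepoints}")
    # One factor per 10 timepoints, clamped to the [k_min, k_max] range.
    return max(k_min, min(k_max, n_timepoints // 10))
```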

Error Message Standards

  • Descriptive: Clearly explain what went wrong and why
  • Actionable: Provide specific steps to fix the problem
  • Contextual: Include relevant file paths, parameters, and data info
  • Consistent: Use standard formatting and terminology throughout
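
One way to enforce these standards is to route every input error through a single exception type. Each error then states what went wrong, how to fix it, and the relevant context in the same layout. This is a sketch: the class name HTFAInputError and its fields are hypothetical. Subclassing ValueError keeps existing "except ValueError" handlers working.

```python
from typing import Mapping, Optional


class HTFAInputError(ValueError):
    """Input error that carries the problem, a suggested fix, and context (sketch)."""

    def __init__(
        self, problem: str, fix: str, context: Optional[Mapping[str, object]] = None
    ) -> None:
        self.problem = problem
        self.fix = fix
        self.context = dict(context or {})
        # Consistent layout: problem first, then the fix, then one line per context item.
        lines = [problem, f"  How to fix: {fix}"]
        lines += [f"  {key}: {value}" for key, value in self.context.items()]
        super().__init__("\n".join(lines))
```

Example: HTFAInputError("No BOLD runs matched the query", "Check the task name against layout.get_tasks()", {"task": "rest"}).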

Deliverables

  • htfa/fit.py: Main interface function with input detection
  • htfa/bids.py: BIDS dataset parsing and validation utilities
  • htfa/validation.py: Input validation framework
  • tests/test_input_detection.py: Comprehensive test suite
  • tests/test_bids_integration.py: BIDS-specific integration tests
  • Documentation updates with usage examples and API reference

Acceptance Criteria

Must-Have Features

  • Single function htfa.fit() handles both BIDS and array inputs automatically
  • Complete BIDS parsing with subject/task/session filtering capabilities
  • Robust error handling with clear, actionable error messages
  • Automatic parameter inference with sensible defaults
  • Full integration with existing TFA/HTFA core algorithms

Quality Gates

  • All unit and integration tests pass with >95% coverage
  • BIDS compliance verified with pybids validation
  • Performance benchmarks meet requirements (BIDS parsing <30 s for datasets with <100 subjects)
  • Documentation includes complete API reference and usage examples
  • Mypy type checking passes without errors or "# type: ignore" suppressions

Definition of Ready for Next Task

  • Input detection and parsing infrastructure complete
  • BIDS integration fully functional and tested
  • Error handling comprehensive and user-friendly
  • Ready for HTFAPreprocessor implementation (Task 004)
  • All validation framework components in place
