Task 003: Input Detection and BIDS Parsing
Overview
Implement the high-level htfa.fit() function with automatic input detection, comprehensive BIDS dataset parsing using pybids, and robust input validation. This task creates the primary user interface: it automatically determines whether the input is a BIDS directory or NumPy arrays and handles the complete BIDS parsing pipeline.
Problem Statement
Users need a simple, intuitive interface that automatically handles different input types:
- BIDS directory paths for neuroimaging datasets
- Raw NumPy arrays with coordinate information
- Flexible parameter specification for filtering and customization
- Comprehensive validation with clear error messages
Technical Requirements
Core Input Detection
- Automatic Type Detection: htfa.fit() determines the input type (BIDS directory vs. arrays)
- BIDS Directory Validation: Verify BIDS compliance using pybids validation
- Array Input Handling: Support raw NumPy arrays with coordinate specifications
- Parameter Inference: Automatic K estimation from data dimensions and structure
BIDS Integration Implementation
- Dataset Parsing: Use pybids' BIDSLayout (imported as bids.BIDSLayout) for comprehensive dataset parsing
- Filtering Support: Subject, session, task, run filtering with BIDS query syntax
- Metadata Extraction: Parse and preserve all BIDS metadata (TR, task info, etc.)
- Derivative Detection: Identify and load preprocessed derivatives when available
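A minimal sketch of derivative detection, assuming the layout is created with BIDSLayout(bids_path, derivatives=True) so that pybids also indexes the derivatives/ folder, and assuming preprocessed files carry the desc-preproc entity (as fMRIPrep outputs do). The helper names bold_query and find_bold_files are hypothetical. The layout is passed in rather than built inside the helper, so the selection logic can be tested without a real dataset.

```python
from typing import Any, Dict, List, Optional


def bold_query(task: Optional[str] = None, derivatives: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for BIDSLayout.get() (hypothetical helper)."""
    query: Dict[str, Any] = {"suffix": "bold", "extension": [".nii", ".nii.gz"]}
    if task:
        query["task"] = task
    if derivatives:
        # Preprocessed outputs, e.g. fMRIPrep's *_desc-preproc_bold.nii.gz
        query["scope"] = "derivatives"
        query["desc"] = "preproc"
    else:
        # Restrict to the raw dataset so derivative files are not mixed in
        query["scope"] = "raw"
    return query


def find_bold_files(layout: Any, task: Optional[str] = None) -> List[Any]:
    """Prefer preprocessed derivatives; fall back to raw BOLD files."""
    files = layout.get(**bold_query(task, derivatives=True))
    if files:
        return list(files)
    return list(layout.get(**bold_query(task, derivatives=False)))
```

With pybids installed, usage would look like layout = BIDSLayout(bids_path, derivatives=True) followed by find_bold_files(layout, task='rest'). If a pipeline writes several output spaces, an additional space filter is likely needed to avoid loading the same run twice.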
Input Validation Framework
- Path Validation: Comprehensive checks for BIDS directory structure
- Array Validation: Shape, data type, and coordinate consistency checks
- Parameter Validation: Valid ranges, types, and compatibility verification
- Error Reporting: Clear, actionable error messages with suggested fixes
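Path validation can fail fast before pybids is invoked. The sketch below is a hypothetical pre-check (the name check_bids_root is illustrative) that relies only on the BIDS rule that the dataset root contains a dataset_description.json with the required Name and BIDSVersion fields, plus a heuristic check for sub-* folders. It returns actionable messages instead of raising, and it does not replace pybids or the official BIDS validator.

```python
import json
from pathlib import Path
from typing import List, Union


def check_bids_root(path: Union[str, Path]) -> List[str]:
    """Return actionable problems; an empty list means the basic checks passed."""
    root = Path(path)
    if not root.is_dir():
        return [f"{root} is not a directory. Pass the dataset root folder."]
    problems: List[str] = []
    desc = root / "dataset_description.json"
    if not desc.is_file():
        problems.append(
            f"Missing {desc.name} in {root}. Add one with 'Name' and 'BIDSVersion' fields."
        )
    else:
        try:
            info = json.loads(desc.read_text())
        except json.JSONDecodeError as err:
            problems.append(f"{desc} is not valid JSON ({err}).")
        else:
            # Both fields are REQUIRED by the BIDS specification
            for field in ("Name", "BIDSVersion"):
                if field not in info:
                    problems.append(f"{desc} lacks the required '{field}' field.")
    # Heuristic: a BIDS root normally holds at least one sub-<label> folder
    if not any(p.is_dir() and p.name.startswith("sub-") for p in root.iterdir()):
        problems.append(f"No sub-* folders found in {root}. Is this the dataset root?")
    return problems
```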
Implementation Details
Input Detection Logic
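import os
from typing import Any

import numpy as np


def fit(data: Any, coords: Any = None, **kwargs: Any) -> Any:
    """Primary HTFA interface with automatic input detection.

    Args:
        data: BIDS directory path (str or os.PathLike), a single fMRI array
            of shape (n_voxels, n_timepoints), or a list of such arrays.
        coords: Voxel coordinates for array inputs.
        **kwargs: Analysis parameters and BIDS filtering options.
    """
    # _fit_bids_dataset and _fit_arrays are the private helpers implemented alongside fit().
    # Paths (str or pathlib.Path) are treated as BIDS datasets.
    if isinstance(data, (str, os.PathLike)):
        if not os.path.isdir(data):
            # Report a missing directory rather than an "invalid type" error.
            raise FileNotFoundError(f"BIDS directory not found: {os.fspath(data)!r}")
        return _fit_bids_dataset(os.fspath(data), **kwargs)
    # A single array (one subject) or a list of arrays (multiple subjects).
    if isinstance(data, (np.ndarray, list)):
        return _fit_arrays(data, coords, **kwargs)
    raise TypeError(
        f"Invalid input type: {type(data).__name__}; expected a BIDS "
        "directory path or NumPy array(s)"
    )
BIDS Dataset Processing Pipeline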
- Layout Creation: BIDSLayout(dataset_path, validate=True)
- Query Construction: Build BIDS queries from user parameters
- File Discovery: Locate all relevant fMRI files and metadata
- Metadata Aggregation: Collect TR, task parameters, subject info
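Metadata aggregation can be sketched as a reduction over (entities, metadata) pairs. With pybids these pairs would come from each matched BIDSFile, e.g. [(f.get_entities(), f.get_metadata()) for f in files]. The helper name aggregate_metadata and the requirement that all runs share a single TR are assumptions for illustration; the function takes plain mappings so it can be checked without a dataset.

```python
from typing import Any, Dict, List, Mapping, Set, Tuple

Run = Tuple[Mapping[str, Any], Mapping[str, Any]]


def aggregate_metadata(runs: List[Run]) -> Dict[str, Any]:
    """Summarize (entities, metadata) pairs collected from BOLD files.

    Hypothetical helper; assumes every run should share a single TR.
    """
    if not runs:
        raise ValueError("No BOLD runs matched the query; check subject/task/session filters.")
    trs: Set[float] = set()
    for entities, meta in runs:
        if "RepetitionTime" not in meta:
            raise ValueError(
                f"RepetitionTime missing for run {dict(entities)}; add it to the JSON sidecar."
            )
        trs.add(float(meta["RepetitionTime"]))
    if len(trs) > 1:
        raise ValueError(f"Runs have different TRs {sorted(trs)}; filter to one acquisition.")
    return {
        "repetition_time": trs.pop(),
        "subjects": sorted({e["subject"] for e, _ in runs}),
        "tasks": sorted({e["task"] for e, _ in runs if "task" in e}),
        "n_runs": len(runs),
    }
```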
Array Input Processing
- Shape Validation: Ensure proper dimensionality (n_voxels, n_timepoints)
- Coordinate Validation: Verify coordinate array matches data dimensions
- Multi-subject Handling: Process lists of arrays with coordinate alignment
- Parameter Consistency: Validate consistent shapes across subjects
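The array checks above could be sketched as follows, assuming each data array has shape (n_voxels, n_timepoints) and each coordinate array has shape (n_voxels, 3). The function name validate_arrays and the require_equal_timepoints flag are hypothetical; whether HTFA actually needs equal run lengths across subjects is not stated above, so that cross-subject check is off by default.

```python
from typing import List, Optional, Sequence, Tuple, Union

import numpy as np

ArrayInput = Union[np.ndarray, Sequence[np.ndarray]]


def validate_arrays(
    data: ArrayInput,
    coords: Optional[ArrayInput],
    require_equal_timepoints: bool = False,
) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """Normalize array input to per-subject lists and check shapes (hypothetical)."""
    subjects = [data] if isinstance(data, np.ndarray) else list(data)
    if not subjects:
        raise ValueError("Expected at least one data array.")
    if coords is None:
        raise ValueError("coords is required for array input: pass an (n_voxels, 3) array.")
    # A single coords array is reused for every subject
    coord_list = [coords] * len(subjects) if isinstance(coords, np.ndarray) else list(coords)
    if len(coord_list) != len(subjects):
        raise ValueError(
            f"Got {len(subjects)} data arrays but {len(coord_list)} coordinate arrays."
        )
    out_data: List[np.ndarray] = []
    out_coords: List[np.ndarray] = []
    for i, (x, r) in enumerate(zip(subjects, coord_list)):
        x = np.asarray(x, dtype=float)
        r = np.asarray(r, dtype=float)
        if x.ndim != 2:
            raise ValueError(f"Subject {i}: data must be 2-D (n_voxels, n_timepoints), got {x.shape}.")
        if r.ndim != 2 or r.shape[1] != 3:
            raise ValueError(f"Subject {i}: coords must be (n_voxels, 3), got {r.shape}.")
        if r.shape[0] != x.shape[0]:
            raise ValueError(
                f"Subject {i}: {x.shape[0]} voxels in data but {r.shape[0]} coordinate rows."
            )
        if not np.isfinite(x).all():
            raise ValueError(f"Subject {i}: data contains NaN or infinite values.")
        out_data.append(x)
        out_coords.append(r)
    # Optional cross-subject check (assumption, disabled by default)
    n_t = {x.shape[1] for x in out_data}
    if require_equal_timepoints and len(n_t) > 1:
        raise ValueError(f"Subjects have different numbers of timepoints: {sorted(n_t)}.")
    return out_data, out_coords
```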
Dependencies
External Dependencies
- pybids: BIDS dataset parsing and validation
- nibabel: NIfTI file loading and header parsing
- numpy: Array operations and validation
- pandas: Metadata handling and tabular operations
Internal Dependencies
- Task 001: Core TFA implementation for single-subject processing
- Task 002: HTFA implementation for multi-subject analysis
- htfa.core: Access to TFA and HTFA classes for algorithm execution
Success Criteria
Functional Requirements
- htfa.fit('/path/to/bids') successfully processes any valid BIDS dataset
- htfa.fit(arrays, coords) handles raw NumPy array inputs correctly
- Automatic K estimation provides reasonable defaults (typically 8-15 factors)
- BIDS filtering works: htfa.fit(path, subjects=['sub-01'], task='rest')
- Comprehensive error messages for all invalid input scenarios
Performance Requirements
- BIDS parsing completes in <30 seconds for datasets with <100 subjects
- Input validation adds <5% overhead to total analysis time
- Memory usage scales linearly with dataset size during parsing
- Error detection is immediate without unnecessary computation
Code Quality Requirements
- 100% type hints with mypy compliance
- Comprehensive docstrings following Google style
- >95% test coverage including edge cases and error conditions
- Integration tests with synthetic BIDS datasets
Test Plan
Unit Tests
- Input Detection: Test automatic type detection for all input scenarios
- BIDS Validation: Verify proper handling of valid/invalid BIDS datasets
- Array Processing: Test array input validation and preprocessing
- Parameter Inference: Validate automatic K estimation algorithms
Integration Tests
- Synthetic BIDS: Test with generated BIDS-compliant datasets
- Error Handling: Verify appropriate error messages and graceful failures
- Multi-format Support: Test mixed input scenarios and edge cases
- Performance: Benchmark parsing speed with various dataset sizes
Edge Cases
- Malformed BIDS: Test behavior with incomplete or invalid BIDS datasets
- Missing Files: Handle missing fMRI files or metadata gracefully
- Large Datasets: Verify memory efficiency with datasets of >50 subjects
- Network Paths: Test with remote/mounted filesystem paths
Implementation Notes
BIDS Query Strategy
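# Example BIDS filtering implementation (query_bold_files is an illustrative name)
from bids import BIDSLayout


def query_bold_files(bids_path, subjects=None, task=None, session=None, validate_bids=True):
    layout = BIDSLayout(bids_path, validate=validate_bids)
    # Always restrict the query to BOLD NIfTI images
    queries = {'suffix': 'bold', 'extension': ['.nii', '.nii.gz']}
    # Add only the filters the user supplied; an omitted entity matches every
    # value, which also covers datasets that have no sessions
    if subjects:
        # pybids stores subject labels without the 'sub-' prefix
        queries['subject'] = [s.removeprefix('sub-') for s in subjects]
    if task:
        queries['task'] = task
    if session:
        queries['session'] = session
    return layout.get(**queries)
Parameter Inference Logic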
- Factor Estimation: K = max(8, min(15, n_timepoints // 10))
- Preprocessing Defaults: Based on detected TR and analysis type
- Validation Thresholds: Minimum timepoints, voxel counts, subject numbers
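The factor-estimation rule can be written directly. Working through the clamp, it returns 8 for any scan shorter than 90 timepoints, 15 for 150 or more, and n_timepoints // 10 in between. The function name estimate_n_factors and the extra check against n_voxels are assumptions added for illustration; the numeric minimum thresholds listed above are left unspecified here because the task does not give values.

```python
def estimate_n_factors(n_timepoints: int, n_voxels: int, lower: int = 8, upper: int = 15) -> int:
    """Default number of factors: K = max(8, min(15, n_timepoints // 10)).

    Hypothetical helper. The n_voxels check is an added sanity assumption,
    not part of the rule itself.
    """
    if n_timepoints < 1 or n_voxels < 1:
        raise ValueError(
            f"Need at least one timepoint and voxel, got n_timepoints={n_timepoints}, "
            f"n_voxels={n_voxels}."
        )
    # Clamp the data-driven guess (one factor per 10 timepoints) to [lower, upper]
    k = max(lower, min(upper, n_timepoints // 10))
    if k > n_voxels:
        raise ValueError(f"Estimated K={k} exceeds n_voxels={n_voxels}; pass K explicitly.")
    return k
```

For example, estimate_n_factors(120, 5000) gives 12, while 50 timepoints give the floor of 8 and 300 timepoints give the cap of 15.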
Error Message Standards
- Descriptive: Clearly explain what went wrong and why
- Actionable: Provide specific steps to fix the problem
- Contextual: Include relevant file paths, parameters, and data info
- Consistent: Use standard formatting and terminology throughout
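One way to enforce these standards is a single exception type whose message always has the same layout: what went wrong, the relevant context, and a fix. The class HTFAInputError is hypothetical; it subclasses ValueError so that callers catching ValueError keep working.

```python
from typing import Mapping, Optional


class HTFAInputError(ValueError):
    """Hypothetical error type: descriptive, contextual, actionable, consistent."""

    def __init__(self, problem: str, fix: str, context: Optional[Mapping[str, object]] = None):
        self.problem = problem
        self.fix = fix
        self.context = dict(context or {})
        lines = [f"Problem: {problem}"]
        # Contextual: list the paths/parameters involved, one per indented line
        lines += [f"  {key}: {value!r}" for key, value in self.context.items()]
        # Actionable: always end with a concrete fix
        lines.append(f"Fix: {fix}")
        super().__init__("\n".join(lines))
```

A shape mismatch would then be reported as, for example, HTFAInputError("coords has 49 rows but data has 50 voxels", "Pass one (x, y, z) row per voxel.", {"coords.shape": (49, 3)}).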
Deliverables
- htfa/fit.py: Main interface function with input detection
- htfa/bids.py: BIDS dataset parsing and validation utilities
- htfa/validation.py: Input validation framework
- tests/test_input_detection.py: Comprehensive test suite
- tests/test_bids_integration.py: BIDS-specific integration tests
- Documentation updates with usage examples and API reference
Acceptance Criteria
Must-Have Features
- Single function htfa.fit() handles both BIDS and array inputs automatically
- Complete BIDS parsing with subject/task/session filtering capabilities
- Robust error handling with clear, actionable error messages
- Automatic parameter inference with sensible defaults
- Full integration with existing TFA/HTFA core algorithms
Quality Gates
- All unit and integration tests pass with >95% coverage
- BIDS compliance verified with pybids validation
- Performance benchmarks meet requirements (BIDS parsing <30 s for datasets with <100 subjects)
- Documentation includes complete API reference and usage examples
- Mypy type checking passes without errors or ignores
Definition of Ready for Next Task
- Input detection and parsing infrastructure complete
- BIDS integration fully functional and tested
- Error handling comprehensive and user-friendly
- Ready for HTFAPreprocessor implementation (Task 004)
- All validation framework components in place