Input Detection and BIDS Parsing


# Task 003: Input Detection and BIDS Parsing

## Overview

Implement the high-level `htfa.fit()` function with automatic input detection, comprehensive BIDS dataset parsing using pybids, and robust input validation. This task creates the primary user interface that automatically determines whether input is a BIDS directory or NumPy arrays and handles the complete BIDS parsing pipeline.

## Problem Statement

Users need a simple, intuitive interface that accepts different input types, supports customization, and validates what it receives:
- BIDS directory paths for neuroimaging datasets
- Raw NumPy arrays with coordinate information
- Flexible parameter specification for filtering and customization
- Comprehensive validation with clear error messages

## Technical Requirements

### Core Input Detection
- **Automatic Type Detection**: The `htfa.fit()` function determines the input type (BIDS directory vs. arrays)
- **BIDS Directory Validation**: Verify BIDS compliance using pybids validation
- **Array Input Handling**: Support raw NumPy arrays with coordinate specifications
- **Parameter Inference**: Automatic K estimation from data dimensions and structure

### BIDS Integration Implementation
- **Dataset Parsing**: Use pybids' `BIDSLayout` (imported as `from bids import BIDSLayout`) for comprehensive dataset parsing
- **Filtering Support**: Subject, session, task, run filtering with BIDS query syntax
- **Metadata Extraction**: Parse and preserve all BIDS metadata (TR, task info, etc.)
- **Derivative Detection**: Identify and load preprocessed derivatives when available

### Input Validation Framework
- **Path Validation**: Comprehensive checks for BIDS directory structure
- **Array Validation**: Shape, data type, and coordinate consistency checks
- **Parameter Validation**: Valid ranges, types, and compatibility verification
- **Error Reporting**: Clear, actionable error messages with suggested fixes

## Implementation Details

### Input Detection Logic
```python
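import os

import numpy as np


def fit(data, coords=None, **kwargs):
    """Primary HTFA interface with automatic input detection.

    Args:
        data: BIDS directory path (str or os.PathLike), or fMRI data as an
            array or a list of arrays.
        coords: Voxel coordinates for array inputs. Optional.
        **kwargs: Analysis parameters and BIDS filtering options.

    Raises:
        ValueError: If a path is given but is not an existing directory.
        TypeError: If data is neither a path nor array input.
    """
    # Path-like input (str or pathlib.Path) is treated as a BIDS directory.
    if isinstance(data, (str, os.PathLike)):
        if not os.path.isdir(data):
            # Report the real problem instead of "invalid type: str".
            raise ValueError(f"BIDS path is not an existing directory: {data!r}")
        return _fit_bids_dataset(os.fspath(data), **kwargs)
    # Array input: a single array or a list of per-subject arrays.
    if isinstance(data, (np.ndarray, list)):
        return _fit_arrays(data, coords, **kwargs)
    raise TypeError(f"Invalid input type: {type(data)}")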
```

### BIDS Dataset Processing Pipeline
- **Layout Creation**: `BIDSLayout(dataset_path, validate=True)`
- **Query Construction**: Build BIDS queries from user parameters
- **File Discovery**: Locate all relevant fMRI files and metadata
- **Metadata Aggregation**: Collect TR, task parameters, subject info
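
The metadata aggregation step could look roughly like the sketch below, which collects `RepetitionTime` (the BIDS sidecar key for TR) from per-run metadata dictionaries and checks that all runs agree. The helper name `aggregate_tr`, its signature, and the tolerance are assumptions made for illustration; in the real pipeline the dictionaries would come from pybids rather than being built by hand.

```python
import math


def aggregate_tr(run_metadata, rel_tol=1e-6):
    """Return the TR (in seconds) shared by all runs, or raise if runs disagree.

    run_metadata maps a file path to that file's BIDS sidecar metadata dict.
    Hypothetical helper: the name and signature are illustrative only.
    """
    if not run_metadata:
        raise ValueError("No runs found: cannot determine TR.")

    # Every run needs a RepetitionTime entry before values can be compared.
    missing = [p for p, meta in run_metadata.items() if "RepetitionTime" not in meta]
    if missing:
        raise ValueError(
            f"RepetitionTime missing for {len(missing)} file(s), e.g. {missing[0]}. "
            "Add it to the JSON sidecar."
        )

    trs = {p: float(meta["RepetitionTime"]) for p, meta in run_metadata.items()}
    reference = next(iter(trs.values()))

    # Compare with a tolerance, since TR values are floats read from JSON.
    mismatched = [p for p, tr in trs.items()
                  if not math.isclose(tr, reference, rel_tol=rel_tol)]
    if mismatched:
        raise ValueError(
            f"Inconsistent TR: expected {reference} s but {mismatched[0]} "
            f"has {trs[mismatched[0]]} s."
        )
    return reference
```

Failing on inconsistent TR is one possible policy; a different design might group runs by TR instead.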

### Array Input Processing
- **Shape Validation**: Ensure proper dimensionality (n_voxels, n_timepoints)
- **Coordinate Validation**: Verify coordinate array matches data dimensions
- **Multi-subject Handling**: Process lists of arrays with coordinate alignment
- **Parameter Consistency**: Validate consistent shapes across subjects
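
A minimal sketch of these array checks follows. It assumes that every subject shares one coordinate array shaped `(n_voxels, n_dims)` and that the number of timepoints may differ between subjects; both are assumptions, as is the helper name `validate_arrays`.

```python
import numpy as np


def validate_arrays(data, coords):
    """Validate data arrays against a shared coordinate array.

    Assumes data is one array, or a list of arrays, each shaped
    (n_voxels, n_timepoints), and coords is shaped (n_voxels, n_dims).
    Returns the data as a list of float arrays plus the coords array.
    Hypothetical helper: the name and return value are illustrative only.
    """
    # Normalize single-subject input to a one-element list.
    subjects = [data] if isinstance(data, np.ndarray) else list(data)
    if not subjects:
        raise ValueError("No data arrays provided.")
    if coords is None:
        raise ValueError("coords is required for array inputs.")

    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2:
        raise ValueError(
            f"coords must be 2-D (n_voxels, n_dims); got shape {coords.shape}."
        )

    validated = []
    for i, arr in enumerate(subjects):
        arr = np.asarray(arr, dtype=float)
        if arr.ndim != 2:
            raise ValueError(
                f"Subject {i}: expected 2-D (n_voxels, n_timepoints); "
                f"got shape {arr.shape}."
            )
        if arr.shape[0] != coords.shape[0]:
            raise ValueError(
                f"Subject {i}: data has {arr.shape[0]} voxels but coords "
                f"has {coords.shape[0]} rows."
            )
        if not np.isfinite(arr).all():
            raise ValueError(f"Subject {i}: data contains NaN or infinite values.")
        validated.append(arr)
    return validated, coords
```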

## Dependencies

### External Dependencies
- **pybids**: BIDS dataset parsing and validation
- **nibabel**: NIfTI file loading and header parsing
- **numpy**: Array operations and validation
- **pandas**: Metadata handling and tabular operations

### Internal Dependencies
- **Task 001**: Core TFA implementation for single-subject processing
- **Task 002**: HTFA implementation for multi-subject analysis
- **htfa.core**: Access to TFA and HTFA classes for algorithm execution

## Success Criteria

### Functional Requirements
- [ ] `htfa.fit('/path/to/bids')` successfully processes any valid BIDS dataset
- [ ] `htfa.fit(arrays, coords)` handles raw NumPy array inputs correctly
- [ ] Automatic K estimation provides reasonable defaults (8-15 factors typically)
- [ ] BIDS filtering works: `htfa.fit(path, subjects=['sub-01'], task='rest')`
- [ ] Comprehensive error messages for all invalid input scenarios

### Performance Requirements
- [ ] BIDS parsing completes in <30 seconds for datasets with <100 subjects
- [ ] Input validation adds <5% overhead to total analysis time
- [ ] Memory usage scales linearly with dataset size during parsing
- [ ] Error detection is immediate without unnecessary computation

### Code Quality Requirements
- [ ] 100% type hints with mypy compliance
- [ ] Comprehensive docstrings following Google style
- [ ] >95% test coverage including edge cases and error conditions
- [ ] Integration tests with synthetic BIDS datasets

## Test Plan

### Unit Tests
- **Input Detection**: Test automatic type detection for all input scenarios
- **BIDS Validation**: Verify proper handling of valid/invalid BIDS datasets
- **Array Processing**: Test array input validation and preprocessing
- **Parameter Inference**: Validate automatic K estimation algorithms

### Integration Tests
- **Synthetic BIDS**: Test with generated BIDS-compliant datasets
- **Error Handling**: Verify appropriate error messages and graceful failures
- **Multi-format Support**: Test mixed input scenarios and edge cases
- **Performance**: Benchmark parsing speed with various dataset sizes

### Edge Cases
- **Malformed BIDS**: Test behavior with incomplete or invalid BIDS datasets
- **Missing Files**: Handle missing fMRI files or metadata gracefully
- **Large Datasets**: Verify memory efficiency with datasets of more than 50 subjects
- **Network Paths**: Test with remote/mounted filesystem paths
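
For the synthetic and malformed dataset cases, a test fixture along the following lines may be enough. It is a sketch using only the standard library: the helper name `make_synthetic_bids` and its defaults are assumptions, and the image files are empty placeholders, so the tree is suitable for exercising file discovery and metadata handling but not data loading. Whether such a tree passes a particular validator is not verified here.

```python
import json
import tempfile
from pathlib import Path


def make_synthetic_bids(root, subjects=("01", "02"), task="rest", tr=2.0):
    """Create a minimal BIDS-style directory tree and return the BOLD paths.

    Hypothetical test fixture: name and defaults are illustrative only.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)

    # dataset_description.json requires at least Name and BIDSVersion.
    (root / "dataset_description.json").write_text(
        json.dumps({"Name": "synthetic", "BIDSVersion": "1.8.0"})
    )
    # A root-level sidecar applies to all matching runs (BIDS inheritance).
    (root / f"task-{task}_bold.json").write_text(
        json.dumps({"RepetitionTime": tr, "TaskName": task})
    )

    created = []
    for label in subjects:
        func_dir = root / f"sub-{label}" / "func"
        func_dir.mkdir(parents=True, exist_ok=True)
        bold = func_dir / f"sub-{label}_task-{task}_bold.nii.gz"
        bold.touch()  # empty placeholder, not a readable NIfTI image
        created.append(bold)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_synthetic_bids(tmp):
            print(path.relative_to(tmp))
```

Malformed variants can then be derived by deleting files from the generated tree, for example removing `dataset_description.json` or one subject's `func` directory.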

## Implementation Notes

### BIDS Query Strategy
```python
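# Example BIDS filtering implementation
from bids import BIDSLayout

layout = BIDSLayout(bids_path, validate=validate_bids)
# Fixed filters: BOLD images only.
queries = {'suffix': 'bold', 'extension': ['.nii', '.nii.gz']}
# Add entity filters only when the caller supplied them. An omitted entity
# is left unconstrained, which also covers datasets without sessions.
if subjects:
    # pybids stores subject labels without the 'sub-' prefix,
    # so 'sub-01' and '01' are both accepted here (Python 3.9+).
    queries['subject'] = [s.removeprefix('sub-') for s in subjects]
if task:
    queries['task'] = task
if session:
    queries['session'] = session
files = layout.get(**queries)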
```

### Parameter Inference Logic
- **Factor Estimation**: `K = max(8, min(15, n_timepoints // 10))`
- **Preprocessing Defaults**: Based on detected TR and analysis type
- **Validation Thresholds**: Minimum timepoints, voxel counts, subject numbers
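
The factor estimation rule above translates directly into code. In this sketch the function name `estimate_k` and the keyword defaults are assumptions; the formula itself is the one stated in the list.

```python
def estimate_k(n_timepoints, k_min=8, k_max=15):
    """Estimate the number of factors K from the number of timepoints.

    Implements K = max(k_min, min(k_max, n_timepoints // 10)).
    Hypothetical helper: the name and keyword arguments are illustrative only.
    """
    if n_timepoints < 1:
        raise ValueError(f"n_timepoints must be positive; got {n_timepoints}.")
    # One factor per ten timepoints, clamped to the range [k_min, k_max].
    return max(k_min, min(k_max, n_timepoints // 10))
```

With the default bounds, 50 timepoints gives K = 8 (the lower bound applies), 120 timepoints gives K = 12, and 1000 timepoints gives K = 15 (the upper bound applies).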

### Error Message Standards
- **Descriptive**: Clearly explain what went wrong and why
- **Actionable**: Provide specific steps to fix the problem
- **Contextual**: Include relevant file paths, parameters, and data info
- **Consistent**: Use standard formatting and terminology throughout
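
One way to enforce these standards is a single exception type that always carries the problem, its context, and a suggested fix. The class name `HTFAInputError` and its message layout are assumptions for illustration, not an existing part of the package.

```python
class HTFAInputError(ValueError):
    """Input error carrying a problem description, context, and suggested fix.

    Hypothetical class: the name and message layout are illustrative only.
    """

    def __init__(self, problem, fix, **context):
        self.problem = problem
        self.fix = fix
        self.context = context

        lines = [problem]
        if context:
            # Sort keys so messages are stable and easy to test.
            details = ", ".join(f"{k}={v!r}" for k, v in sorted(context.items()))
            lines.append(f"Context: {details}")
        lines.append(f"Suggested fix: {fix}")
        super().__init__("\n".join(lines))
```

Because it subclasses `ValueError`, callers that already catch `ValueError` keep working, while the fixed three-part layout keeps messages consistent across validators.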

## Deliverables

- [ ] `htfa/fit.py`: Main interface function with input detection
- [ ] `htfa/bids.py`: BIDS dataset parsing and validation utilities  
- [ ] `htfa/validation.py`: Input validation framework
- [ ] `tests/test_input_detection.py`: Comprehensive test suite
- [ ] `tests/test_bids_integration.py`: BIDS-specific integration tests
- [ ] Documentation updates with usage examples and API reference

## Acceptance Criteria

### Must-Have Features
- [x] Single function `htfa.fit()` handles both BIDS and array inputs automatically
- [x] Complete BIDS parsing with subject/task/session filtering capabilities
- [x] Robust error handling with clear, actionable error messages
- [x] Automatic parameter inference with sensible defaults
- [x] Full integration with existing TFA/HTFA core algorithms

### Quality Gates
- [x] All unit and integration tests pass with >95% coverage
- [x] BIDS compliance verified with pybids validation
- [x] Performance benchmarks meet requirements (parsing <30 seconds for datasets with <100 subjects)
- [x] Documentation includes complete API reference and usage examples
- [x] Mypy type checking passes without errors or ignores

### Definition of Ready for Next Task
- [x] Input detection and parsing infrastructure complete
- [x] BIDS integration fully functional and tested
- [x] Error handling comprehensive and user-friendly
- [x] Ready for HTFAPreprocessor implementation (Task 004)
- [x] All validation framework components in place

Input Detection and BIDS Parsing #76
