Epic: Initial HTFA Implementation
Overview
Create a complete, production-ready HTFA toolbox with automatic BIDS dataset processing, core TFA/HTFA algorithms, rich visualization capabilities, and comprehensive validation through synthetic data testing. The implementation follows scikit-learn patterns and provides both high-level (htfa.fit()) and low-level (TFA.fit(), HTFA.fit()) APIs.
Architecture Decisions
Core Algorithm Design
- Optimization Framework: SciPy's non-linear least squares for factor estimation, ridge regression for weights
- Initialization Strategy: K-means clustering of spatial coordinates for robust starting points
- Convergence Detection: Parameter change monitoring with configurable tolerance
- Multi-subject Handling: Hierarchical optimization with global template and factor matching
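A minimal sketch of the initialization and weight-estimation steps described above, under the assumptions in this epic (K-means over voxel coordinates for starting RBF centers, ridge regression for temporal weights). The function names (`init_centers`, `rbf_factors`, `fit_weights`) and the Gaussian RBF parameterization are illustrative, not the final API:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

def init_centers(coords, k, seed=0):
    """K-means over voxel coordinates gives robust starting RBF centers."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(coords)
    return km.cluster_centers_

def rbf_factors(coords, centers, widths):
    """Evaluate K radial-basis spatial factors at every voxel (K x V)."""
    d2 = ((coords[None, :, :] - centers[:, None, :]) ** 2).sum(-1)
    return np.exp(-d2 / widths[:, None])

def fit_weights(data, factors, alpha=1.0):
    """Ridge regression for per-timepoint factor weights (T x K)."""
    return Ridge(alpha=alpha, fit_intercept=False).fit(factors.T, data.T).coef_

# Toy run: 200 voxels on a 3-D grid, 50 timepoints, 4 factors.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 3))
centers = init_centers(coords, k=4)
F = rbf_factors(coords, centers, widths=np.full(4, 4.0))
data = rng.standard_normal((50, 200))
W = fit_weights(data, F)  # shape (50, 4)
```

In the full algorithm these two steps would alternate with SciPy's non-linear least squares updating centers and widths, with the convergence check described below deciding when to stop.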
API Design Strategy
- Input Detection: Automatic detection of BIDS directories vs NumPy arrays in the .fit() method
- Interface Pattern: Scikit-learn BaseEstimator for consistency with the ML ecosystem
- Results Container: Rich HTFAResults class with built-in visualization and export
- Error Handling: Comprehensive validation with clear, actionable error messages
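The input-detection and error-handling bullets could be sketched as below. The helper name `detect_input_kind` and the exact error wording are hypothetical; the BIDS check relies only on the fact that a valid BIDS root must contain `dataset_description.json`:

```python
import os
import numpy as np

def detect_input_kind(X):
    """Classify fit() input as a BIDS directory or an array, with clear errors.

    A directory counts as BIDS only if it contains the mandatory
    dataset_description.json; anything array-like falls back to NumPy.
    """
    if isinstance(X, (str, os.PathLike)) and os.path.isdir(X):
        if os.path.isfile(os.path.join(X, "dataset_description.json")):
            return "bids"
        raise ValueError(
            f"{X!r} is a directory but contains no dataset_description.json; "
            "expected the root of a valid BIDS dataset."
        )
    if isinstance(X, np.ndarray) or hasattr(X, "__array__"):
        return "array"
    raise TypeError(
        f"Unsupported input type {type(X).__name__}; "
        "pass a BIDS directory path or a NumPy array."
    )

print(detect_input_kind(np.zeros((10, 5))))  # → array
```

Centralizing this check in one place keeps the high-level `htfa.fit()` entry point thin and makes the error messages uniform across both input paths.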
Data Pipeline Architecture
- BIDS Integration: pybids for dataset parsing, nilearn for preprocessing
- Preprocessing: Configurable pipeline with sensible defaults
- Output Format: BIDS derivatives specification compliance
- Validation Strategy: Synthetic data generation using HTFA generative process
Technical Approach
Core Algorithm Components
- TFA Class: Single-subject spatial factor analysis with iterative optimization
- HTFA Class: Multi-subject hierarchical analysis with global template computation
- Initialization Module: K-means clustering and parameter initialization utilities
- Optimization Engine: Robust numerical optimization with convergence monitoring
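The convergence-monitoring part of the optimization engine can be as simple as tracking relative parameter change against a tolerance. This is a sketch under that assumption; the damped update toward a fixed target stands in for a real optimizer step that would re-estimate factors and weights:

```python
import numpy as np

def converged(prev, curr, tol=1e-4):
    """Declare convergence when the relative parameter change drops below tol."""
    change = np.linalg.norm(curr - prev)
    scale = max(np.linalg.norm(prev), 1e-12)  # guard against zero-norm start
    return change / scale < tol

# Outer-loop sketch: replace the damped update with the actual
# factor/weight re-estimation step.
target = np.ones(8)
params = np.zeros(8)
for iteration in range(1, 501):
    new_params = params + 0.5 * (target - params)
    if converged(params, new_params):
        params = new_params
        break
    params = new_params
```

Exposing `tol` (and a max-iteration cap) as estimator parameters keeps this consistent with the configurable-tolerance decision above.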
BIDS Integration Layer
- Input Parser: Automatic detection and validation of BIDS vs array inputs
- Preprocessing Pipeline: HTFAPreprocessor class with configurable steps
- Metadata Handling: Preserve and propagate BIDS metadata through analysis
- Output Writer: BIDS derivatives-compliant results export
Visualization and Results
- HTFAResults Container: Comprehensive results storage with metadata
- Brain Plotting: Leverage nilearn for professional brain visualizations
- Time Series Plots: matplotlib/seaborn for temporal weight visualization
- Export Functions: NIfTI reconstruction and BIDS derivatives output
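The export path rests on the model's reconstruction identity, data ≈ weights @ factors, so a fitted model can be written back into voxel space (e.g. as NIfTI via nilearn, omitted here). A self-contained sketch of that reconstruction and a variance-explained summary, using synthetic arrays in place of a fitted model:

```python
import numpy as np

rng = np.random.default_rng(1)
T, K, V = 40, 3, 500
W = rng.standard_normal((T, K))          # temporal weights (T x K)
F = rng.standard_normal((K, V))          # spatial factors (K x V)
noise = 0.1 * rng.standard_normal((T, V))
data = W @ F + noise

recon = W @ F                            # model reconstruction of the data
var_explained = 1 - ((data - recon) ** 2).sum() / ((data - data.mean()) ** 2).sum()
print(f"variance explained: {var_explained:.3f}")
```

Reporting variance explained alongside the exported images gives users an immediate sanity check on whether the chosen number of factors is adequate.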
Testing and Validation Framework
- Synthetic Data Generator: HTFA generative process implementation
- Parameter Recovery Tests: Validate algorithm accuracy on known ground truth
- BIDS Test Datasets: Synthetic BIDS-formatted data for integration testing
- Performance Benchmarking: Runtime and memory usage measurement
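A sketch of the synthetic data generator, assuming the HTFA generative process outlined in the technical design: each subject's RBF centers and widths perturb a shared global template, and data are weights times factors plus Gaussian noise. The function name `simulate_htfa` and the perturbation scales are illustrative:

```python
import numpy as np

def simulate_htfa(n_subjects=3, n_factors=4, n_voxels=300, n_times=60,
                  noise_sd=0.1, seed=0):
    """Draw multi-subject data from an HTFA-style generative process."""
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0, 10, size=(n_voxels, 3))
    template_centers = rng.uniform(2, 8, size=(n_factors, 3))
    datasets, truth = [], []
    for _ in range(n_subjects):
        # Subject-specific parameters perturb the global template.
        centers = template_centers + rng.normal(0, 0.3, template_centers.shape)
        widths = rng.uniform(2.0, 4.0, n_factors)
        d2 = ((coords[None] - centers[:, None]) ** 2).sum(-1)
        F = np.exp(-d2 / widths[:, None])              # K x V spatial factors
        W = rng.standard_normal((n_times, n_factors))  # T x K weights
        datasets.append(W @ F + noise_sd * rng.standard_normal((n_times, n_voxels)))
        truth.append({"centers": centers, "widths": widths, "W": W, "F": F})
    return coords, datasets, truth

coords, datasets, truth = simulate_htfa()
```

Returning the ground-truth parameters alongside the data is what makes the parameter-recovery tests below possible.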
Implementation Strategy
Development Approach
- Core-First: Implement and validate algorithms before integration layers
- Test-Driven: Synthetic data validation drives correctness verification
- Incremental Integration: Add BIDS support after core algorithms are stable
- Modular Design: Clear separation between algorithms, preprocessing, and visualization
Risk Mitigation Strategy
- Algorithm Validation: Extensive testing with synthetic data before real data
- Performance Monitoring: Early profiling to identify optimization needs
- API Evolution: Design for extensibility without breaking changes
- Error Recovery: Comprehensive input validation and graceful failure handling
Quality Assurance
- Continuous Testing: >90% coverage with synthetic and edge case testing
- Type Safety: Full mypy compliance for algorithm correctness
- Performance Baselines: Establish benchmarks for future optimization
- Documentation Standards: Google-style docstrings with usage examples
Task Breakdown Preview
High-level task categories that will be created:
- Core TFA Algorithm: K-means initialization, non-linear optimization, convergence detection
- Hierarchical HTFA Algorithm: Multi-subject optimization, global template, factor matching
- Input Detection and BIDS Integration: Automatic parsing, validation, preprocessing pipeline
- HTFAResults and Visualization: Results container, brain plotting, export functionality
- Synthetic Data Generation: HTFA generative process, BIDS formatting, parameter recovery
- API Integration and Polish: High-level API, error handling, documentation
- Performance Validation: Benchmarking, memory profiling, optimization identification
- Testing Infrastructure: Comprehensive test suite, CI integration, quality gates
Dependencies
External Package Dependencies
- NumPy/SciPy: Core numerical computation and optimization
- scikit-learn: BaseEstimator interface and clustering algorithms
- nilearn: Neuroimaging preprocessing, visualization, and NIfTI handling
- pybids: BIDS dataset parsing and validation
- matplotlib/seaborn: Plotting and visualization
- pandas: Tabular data handling for metadata
Internal Codebase Dependencies
- Existing Package Structure: Build upon current htfa/ directory layout
- Poetry Configuration: Extend current dependency management
- Testing Framework: Enhance existing pytest infrastructure
- Linting Setup: Use configured black, mypy, and quality tools
Research and Validation Dependencies
- Technical Design Document: Algorithm specifications and implementation guidance
- HTFA Mathematical Foundation: Generative process for synthetic data creation
- Scikit-learn Patterns: API consistency and interface design
- BIDS Specification: Output format compliance and metadata handling
Success Criteria (Technical)
Algorithm Correctness
- Parameter Recovery: >95% accuracy on synthetic datasets across noise levels
- Convergence Reliability: Stable optimization for >99% of valid input datasets
- Numerical Stability: Robust performance with ill-conditioned data matrices
- Multi-subject Consistency: Factor alignment and global template accuracy
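The parameter-recovery criterion can be made concrete with a test of the form below: regenerate data from known weights and factors, re-estimate the weights, and require the recovered values to correlate with ground truth above the threshold. The 0.95 cutoff mirrors the accuracy target above; the ridge re-estimation mirrors the weight-update step of the algorithm:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)
T, K, V = 80, 4, 400
F = rng.standard_normal((K, V))                    # known spatial factors
W_true = rng.standard_normal((T, K))               # known temporal weights
data = W_true @ F + 0.05 * rng.standard_normal((T, V))

# Re-estimate weights from data and factors, then score recovery.
W_hat = Ridge(alpha=1e-3, fit_intercept=False).fit(F.T, data.T).coef_
r = np.corrcoef(W_true.ravel(), W_hat.ravel())[0, 1]
assert r > 0.95, f"weight recovery correlation {r:.3f} below threshold"
```

The full test suite would sweep noise levels and factor counts rather than a single setting, per the "across noise levels" requirement.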
Performance Benchmarks
- Runtime Efficiency: Analysis time within 2x of BrainIAK baseline
- Memory Efficiency: Linear scaling with dataset size, handle 100+ subjects
- Preprocessing Speed: BIDS dataset loading and preprocessing < 25% of total runtime
- Visualization Response: Plot generation < 5 seconds for typical factors
Code Quality Metrics
- Test Coverage: >90% line coverage across all modules
- Type Safety: 100% mypy compliance with no type ignores
- Linting Compliance: Pass all black, isort, and style checks
- Documentation Coverage: Google-style docstrings for all public APIs
User Experience Benchmarks
- Installation Time: Complete setup in < 2 minutes on clean environment
- API Learning Curve: First analysis runnable as a single line of code, without prior knowledge of the internals
- Error Clarity: Self-explanatory error messages with actionable solutions
- Result Accessibility: Publication-ready visualizations without configuration
Estimated Effort
Overall Timeline
- Core Development: 3-4 weeks of focused implementation
- Integration and Testing: 1-2 weeks of comprehensive validation
- Polish and Documentation: 1 week of API refinement and docs
Resource Requirements
- Primary Developer: 1 full-time equivalent for algorithm implementation
- Testing Support: 0.5 FTE for synthetic data generation and validation
- Integration Expertise: 0.25 FTE for BIDS specification compliance
Critical Path Items
- TFA Algorithm Implementation: Foundation for all other components
- Synthetic Data Generation: Required for comprehensive testing
- HTFA Hierarchical Optimization: Most complex algorithmic component
- BIDS Integration: Essential for user adoption and workflow integration
Stats
Total tasks: 8
Parallel tasks: 3 (can be worked on simultaneously)
Sequential tasks: 5 (have dependencies)
Estimated total effort: 14-19 days (275 hours)