@pauladkisson pauladkisson commented Nov 20, 2025

Recording Extractors Architecture

Fixes #170

Overview

This refactor replaces monolithic format detection and reading logic with a modular extractor architecture. Each data format (TDT, Doric, CSV, NPM) now has its own dedicated class implementing a common interface.

Benefits: Modularity, extensibility for new formats, consistent API, and isolated testability.

Architecture

classDiagram
    class BaseRecordingExtractor {
        <<abstract>>
        +discover_events_and_flags()* tuple~list, list~
        +read(events, outputPath)* list~dict~
        +save(output_dicts, outputPath)* None
        #_write_hdf5(data, storename, output_path, key) None
    }
    
    class TdtRecordingExtractor
    class DoricRecordingExtractor
    class CsvRecordingExtractor
    class NpmRecordingExtractor
    
    BaseRecordingExtractor <|-- TdtRecordingExtractor
    BaseRecordingExtractor <|-- DoricRecordingExtractor
    BaseRecordingExtractor <|-- CsvRecordingExtractor
    CsvRecordingExtractor <|-- NpmRecordingExtractor
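The hierarchy in the diagram could be sketched in Python roughly as follows. This is a minimal illustration, not the code from this PR: the `folder_path` argument, the returned values, and the in-memory `saved` attribute are assumptions made so the example runs on its own.

```python
from abc import ABC, abstractmethod


class BaseRecordingExtractor(ABC):
    """Common interface. Bodies are placeholders, not the PR's implementation."""

    @classmethod
    @abstractmethod
    def discover_events_and_flags(cls, folder_path):
        """Return (events, flags) found in the data files (folder_path is assumed)."""

    @abstractmethod
    def read(self, *, events, outputPath):
        """Return one dict per requested event."""

    @abstractmethod
    def save(self, *, output_dicts, outputPath):
        """Persist the dicts produced by read()."""


class CsvRecordingExtractor(BaseRecordingExtractor):
    @classmethod
    def discover_events_and_flags(cls, folder_path):
        # Hypothetical result; a real extractor would inspect the files.
        return ["signal"], ["data_csv"]

    def read(self, *, events, outputPath):
        # Hypothetical payload standing in for data parsed from a CSV file.
        return [{"storename": event, "data": [0.0, 1.0]} for event in events]

    def save(self, *, output_dicts, outputPath):
        # Stand-in for the HDF5 write: keep the dicts in memory.
        self.saved = list(output_dicts)


class NpmRecordingExtractor(CsvRecordingExtractor):
    """Derives from the CSV extractor, as in the diagram, and reuses read/save."""
```

Because the base class declares its three methods abstract, instantiating `BaseRecordingExtractor` directly raises `TypeError`, while a new format only needs to subclass it and fill in the three methods.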

API Contract

All extractors implement three methods:

| Method | Purpose |
| --- | --- |
| `discover_events_and_flags()` | Class method to find available events in data files |
| `read(*, events, outputPath)` | Extract data for specified events → returns list of dicts |
| `save(*, output_dicts, outputPath)` | Write extracted data to HDF5 |

Note: The signature of discover_events_and_flags() varies by format. NPM requires additional num_ch and inputParameters arguments for channel configuration.
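As a rough illustration of the contract, the sketch below shows keyword-only `read` arguments and a per-format `discover_events_and_flags` signature. The `Toy*` classes, the `folder_path` argument, the `discover` helper, and the generated event names are all hypothetical; only `num_ch` and `inputParameters` come from the description above.

```python
class ToyCsvExtractor:
    """Hypothetical stand-in for a format with the plain signature."""

    @classmethod
    def discover_events_and_flags(cls, folder_path):
        return ["signal"], ["csv"]

    def read(self, *, events, outputPath):
        # The bare * makes both arguments keyword-only.
        return [{"storename": event} for event in events]


class ToyNpmExtractor(ToyCsvExtractor):
    """Hypothetical stand-in for a format that needs extra configuration."""

    @classmethod
    def discover_events_and_flags(cls, folder_path, num_ch, inputParameters):
        # Invented naming scheme: one event per configured channel.
        events = [f"channel_{i}" for i in range(1, num_ch + 1)]
        return events, ["npm"]


def discover(extractor_cls, folder_path, **extra):
    """Forward format-specific arguments only to the extractors that take them."""
    return extractor_cls.discover_events_and_flags(folder_path, **extra)
```

Under these assumptions, `discover(ToyCsvExtractor, "session")` needs no extras, while `discover(ToyNpmExtractor, "session", num_ch=2, inputParameters={})` supplies the NPM-only arguments. Calling `read(["signal"], "out")` positionally raises `TypeError`.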

NPM Configuration Pattern: Tkinter GUI code has been moved out of the extractor and into saveStoresList.py. The extractor provides helper methods (has_multiple_event_ttls(), needs_ts_unit()) to determine what configuration is needed, while the GUI layer collects user input and passes it to discover_events_and_flags() via inputParameters. This keeps the extractor free of GUI dependencies.
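One way to picture this split is shown below. The helper names `has_multiple_event_ttls()` and `needs_ts_unit()` come from the description above, but everything else is an assumption made for illustration: the `ToyNpmExtractor` class, the column-name heuristics, the `ask` callable that stands in for the Tkinter prompts, and the `split_events` / `ts_unit` keys.

```python
class ToyNpmExtractor:
    """Hypothetical stand-in that only inspects column names."""

    def __init__(self, column_names):
        self.column_names = column_names

    def has_multiple_event_ttls(self):
        # Assumption: more than one event column means several TTLs in one file.
        return sum(name.startswith("event") for name in self.column_names) > 1

    def needs_ts_unit(self):
        # Assumption: a bare "timestamp" column does not state its unit.
        return "timestamp" in self.column_names


def collect_input_parameters(extractor, ask):
    """GUI-layer side: ask only the questions the extractor says it needs.

    `ask` is any callable taking a prompt string and returning the answer,
    so the extractor itself never imports a GUI toolkit.
    """
    input_parameters = {}
    if extractor.has_multiple_event_ttls():
        input_parameters["split_events"] = ask("Split event TTLs? (y/n)") == "y"
    if extractor.needs_ts_unit():
        input_parameters["ts_unit"] = ask("Timestamp unit?")
    return input_parameters
```

In a test, `ask` can be a plain function returning canned answers, which is what makes the extractor testable without a display.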

Pipeline Integration

  1. Step 2 (saveStoresList.py): Calls discover_events_and_flags() to find events, then presents a GUI in which the user maps each event to a friendly name → outputs storesList.csv

  2. Step 3 (readTevTsq.py): Creates the appropriate extractor, reads the event list from storesList.csv, and processes all events in parallel via read_and_save_all_events() → outputs HDF5 files
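The parallel part of Step 3 might look something like the sketch below. It is not the PR's `read_and_save_all_events()`: the thread pool, the `max_workers` parameter, the `ToyExtractor` class, and the in-memory `store` that replaces HDF5 output are all assumptions chosen to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyExtractor:
    """Hypothetical extractor that 'saves' into a dict instead of HDF5."""

    def __init__(self):
        self.store = {}

    def read(self, *, events, outputPath):
        return [{"storename": event, "data": [len(event)]} for event in events]

    def save(self, *, output_dicts, outputPath):
        for output in output_dicts:
            self.store[(outputPath, output["storename"])] = output["data"]


def read_and_save_event(extractor, event, outputPath):
    """Read one event and immediately save it."""
    output_dicts = extractor.read(events=[event], outputPath=outputPath)
    extractor.save(output_dicts=output_dicts, outputPath=outputPath)
    return event


def read_and_save_all_events_sketch(extractor, events, outputPath, max_workers=4):
    """Process every event concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda event: read_and_save_event(extractor, event, outputPath), events)
        )
```

Handling one event per task keeps each worker's memory footprint small. Whether threads or processes are the better fit depends on how much of the real read path is I/O-bound.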

Doric note: The Doric extractor uses storesList.csv to build the event_name_to_event_type mapping it requires.
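A rough sketch of how such a mapping could be derived follows. It rests on several assumptions that the description does not confirm: that storesList.csv has two rows (raw event names, then friendly names), that friendly names beginning with `control_` or `signal_` denote continuous streams, and that the type labels are `"stream"` and `"ttl"`.

```python
import csv
import io


def load_stores_list(csv_text):
    """Parse an assumed two-row layout: raw names on row 1, friendly names on row 2."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    raw_names, friendly_names = rows[0], rows[1]
    return dict(zip(raw_names, friendly_names))


def build_event_name_to_event_type(stores):
    """Classify each raw event by its friendly name (hypothetical rule and labels)."""
    mapping = {}
    for raw_name, friendly_name in stores.items():
        is_stream = friendly_name.startswith(("control_", "signal_"))
        mapping[raw_name] = "stream" if is_stream else "ttl"
    return mapping
```

For example, with the text `"AIn1,AIn2,DIO1\ncontrol_A,signal_A,lever"`, the first two events would be classified as streams and the third as a TTL under this rule.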

Data Flow

flowchart TB
    A[Raw Data Files] --> B[Step 2: saveStoresList.py]
    B --> C[discover_events_and_flags]
    C --> D[GUI: User Maps Events]
    D --> E[storesList.csv]
    E --> F[Step 3: readTevTsq.py]
    A --> F
    F --> G[Create Extractor]
    G --> H[read_and_save_all_events]
    H --> I[HDF5 Files]
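The first half of the flow (discovery, user mapping, storesList.csv) can be sketched without any GUI. In this illustration the `discover` and `name_for` callables stand in for the extractor call and the user's choices, and the two-row CSV layout is an assumption rather than something the description specifies.

```python
import csv
import io


def write_stores_list(raw_to_friendly, stream):
    """Write raw names on the first row and friendly names on the second (assumed layout)."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(list(raw_to_friendly.keys()))
    writer.writerow(list(raw_to_friendly.values()))


def run_step2_sketch(discover, name_for, stream):
    """Discover events, apply the user's naming, and emit the stores list."""
    events, _flags = discover()
    mapping = {event: name_for(event) for event in events}
    write_stores_list(mapping, stream)
    return mapping


# Usage with canned stand-ins for the extractor and the GUI:
buffer = io.StringIO()
choices = {"Dv1A": "control_A", "Dv2A": "signal_A"}
run_step2_sketch(lambda: (["Dv1A", "Dv2A"], []), choices.get, buffer)
```

Step 3 then needs only the raw data and this file, which is why the flowchart feeds both into readTevTsq.py.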

pauladkisson and others added 23 commits November 21, 2025 12:37
…s into the base_recording_extractor and removed all duplicates.
@pauladkisson pauladkisson changed the base branch from modularization to dev December 4, 2025 02:04
@pauladkisson pauladkisson marked this pull request as ready for review December 4, 2025 02:33
@pauladkisson pauladkisson mentioned this pull request Dec 15, 2025
6 tasks
@pauladkisson pauladkisson merged commit b58994d into dev Dec 17, 2025
17 checks passed
@pauladkisson pauladkisson deleted the extractor branch December 17, 2025 00:12