
ThoughtLink Architecture

Design Decisions

1. Hierarchical Classification (Binary then Direction)

Instead of a flat 5-class classifier, we use a two-stage approach (sketched in code below):

  • Stage 1 (Binary): Rest vs Active -- catches the "no command" state with high confidence (79.4%)
  • Stage 2 (Direction): Only runs when Stage 1 predicts Active -- 4-class (FORWARD/BACKWARD/LEFT/RIGHT)

This design matters for robot control because:

  • False triggers (accidentally sending FORWARD when the user is resting) are worse than missed commands
  • The binary gate prevents the direction classifier from running on rest data
  • Stage 2 sees a cleaner training signal since it trains only on active samples
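
A minimal sketch of the two-stage decode, assuming scikit-learn-style classifiers and that class 1 of Stage 1 means "active" (the names, threshold, and class ordering are illustrative, not the repo's actual API):

```python
# Illustrative two-stage decode. `stage1` / `stage2` stand in for the
# trained Rest-vs-Active and 4-class direction models.
DIRECTIONS = ["FORWARD", "BACKWARD", "LEFT", "RIGHT"]

def decode_window(features, stage1, stage2, rest_threshold=0.5):
    """Return an action string for one 1-second feature window."""
    p_active = stage1.predict_proba([features])[0][1]
    if p_active < rest_threshold:
        return "STOP"                    # binary gate: default to rest
    direction_idx = stage2.predict([features])[0]
    return DIRECTIONS[direction_idx]
```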

2. 4-Class Direction (Including BACKWARD)

The challenge asks for "left, right, forward, and backward." We map:

  • Both Fists -> FORWARD
  • Left Fist -> LEFT
  • Right Fist -> RIGHT
  • Tongue Tapping -> BACKWARD

4-class accuracy is 27.9% (random = 25%). This is modest but honest:

  • Cross-subject generalization with 6 frontal channels is inherently difficult
  • The two-stage design compensates: Stage 1 (79.4%) handles the critical rest-vs-active gate
  • Temporal smoothing (92.9% flicker reduction) stabilizes noisy direction estimates
  • We keep 4-class rather than merging to 3-class because the challenge explicitly requires backward

3. Phase-Aware Detection

Each decoded action is tagged with a phase (tagging logic sketched below):

  • INITIATION: First window where action changes from STOP to active
  • SUSTAINED: Same action continues
  • RELEASE: Action changes back to STOP

This addresses the challenge's "phase-aware modeling" direction and enables:

  • Detecting when a human operator first engages
  • Tracking sustained intent duration
  • Detecting operator disengagement
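
A sketch of the tagging logic, assuming actions arrive as a per-window stream of strings (the function name and the handling of direct action-to-action switches are assumptions):

```python
def tag_phase(prev_action, action):
    """Tag a decoded action relative to the previous window's action."""
    if prev_action == "STOP" and action != "STOP":
        return "INITIATION"   # operator first engages
    if action != "STOP" and action == prev_action:
        return "SUSTAINED"    # same active command continues
    if prev_action != "STOP" and action == "STOP":
        return "RELEASE"      # operator disengages
    return None               # rest continues, or an active-to-active switch
```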

4. Bandpass Filter: 8-30 Hz

We filter to the mu (8-13 Hz) and beta (13-30 Hz) frequency bands (filter sketch below) because:

  • Mu rhythm (~10 Hz) is suppressed during motor imagery and execution
  • Beta rhythm (~20 Hz) shows event-related desynchronization during movement
  • Frequencies below 8 Hz (delta, theta) contain mostly eye/movement artifacts
  • Frequencies above 30 Hz are dominated by EMG noise on consumer 6-channel EEG
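
A minimal sketch with SciPy (the Butterworth order and zero-phase filtering are assumptions; only the 8-30 Hz band and 500 Hz sampling rate come from this doc):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_8_30(eeg, fs=500.0, order=4):
    """Zero-phase 8-30 Hz Butterworth bandpass over (samples, channels)."""
    nyq = fs / 2.0
    b, a = butter(order, [8.0 / nyq, 30.0 / nyq], btype="band")
    return filtfilt(b, a, eeg, axis=0)

filtered = bandpass_8_30(np.random.randn(7499, 6))  # placeholder EEG array
```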

5. Cross-Subject Validation

All evaluation uses leave-subject-out splits (sketched below):

  • No subject appears in both training and test sets
  • This tests generalization to new users without calibration
  • Cross-subject is the honest metric: the system works for anyone, not just trained users
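
In scikit-learn terms this corresponds to LeaveOneGroupOut keyed on subject ID; a sketch with placeholder data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

# Placeholder data: 69-feature windows, binary labels, per-window subject IDs.
X = np.random.randn(300, 69)
y = np.random.randint(0, 2, 300)
subjects = np.repeat(np.arange(10), 30)

for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    # The held-out subject contributes only to the test fold, so no
    # subject ever appears on both sides of the split.
    clf = RandomForestClassifier().fit(X[train_idx], y[train_idx])
    print(clf.score(X[test_idx], y[test_idx]))
```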

6. Temporal Smoothing Over Raw Accuracy

For robot control, command stability matters more than raw accuracy:

| Component | Purpose | Effect |
| --- | --- | --- |
| MajorityVote(5) | Sliding-window vote over the 5 most recent predictions | Removes transient flips |
| ConfidenceGate | Threshold-based gating on classifier probabilities | Prevents low-confidence actions |
| HysteresisFilter(3) | Requires 3 consecutive identical predictions to switch | Prevents oscillation |

Combined effect: 92.9% reduction in command flickering.
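
Hedged sketches of two of the smoothers (class names follow the table; the internals are assumptions, and ConfidenceGate reduces to a threshold on the classifier probabilities as in the decode sketch above):

```python
from collections import Counter, deque

class MajorityVote:
    """Sliding-window vote over the N most recent predictions."""
    def __init__(self, n=5):
        self.window = deque(maxlen=n)

    def update(self, action):
        self.window.append(action)
        return Counter(self.window).most_common(1)[0][0]

class HysteresisFilter:
    """Switch output only after N consecutive identical new predictions."""
    def __init__(self, n=3):
        self.n, self.current = n, "STOP"
        self.candidate, self.count = None, 0

    def update(self, action):
        if action == self.current:
            self.candidate, self.count = None, 0    # no switch pending
        elif action == self.candidate:
            self.count += 1
            if self.count >= self.n:                # N in a row: commit
                self.current, self.candidate, self.count = action, None, 0
        else:
            self.candidate, self.count = action, 1  # new switch candidate
        return self.current
```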

7. MLP vs Random Forest Comparison

We compared hand-crafted features with a Random Forest against a neural network on raw EEG:

| Model | Stage 1 | Stage 2 | Latency |
| --- | --- | --- | --- |
| RF (69 features) | 79.4% | 27.9% | 13.3 ms |
| MLP (raw 3000) | 79.1% | 24.3% | 0.3 ms |

RF wins on accuracy because domain-specific PSD features capture motor imagery patterns that a network trained on raw EEG struggles to learn cross-subject. The MLP is far faster (0.3 ms vs 13.3 ms) but sacrifices direction accuracy.

8. TD-NIRS Exploration

Each .npz file includes a feature_moments array of shape (72, 40, 3, 2, 3) containing brain blood-flow data. Our analysis (variance check sketched below) found:

  • All 1728 NIRS features have zero variance across recordings
  • Adding NIRS to EEG hurts accuracy (-1.2% Stage 1, -4.4% Stage 2)
  • Either the hemodynamic response is too slow for the task windows, or the sensors are too far from motor cortex
  • Conclusion: EEG alone is both necessary and sufficient
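
The zero-variance finding reduces to a per-feature variance check (a sketch; how feature_moments flattens to 1728 values per recording is assumed):

```python
import numpy as np

# Placeholder: flattened feature_moments vectors stacked across recordings.
nirs = np.zeros((100, 1728))

variances = nirs.var(axis=0)          # per-feature variance across recordings
n_dead = int((variances == 0).sum())  # constant features carry no signal
print(f"{n_dead} of {nirs.shape[1]} NIRS features have zero variance")
```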

9. Temporal Embeddings

PCA projection of per-window features within recordings (sketched below) shows:

  • Clear separation between rest and active phases in feature space
  • Trajectories evolve from rest cluster to intent-specific regions
  • 45.5% variance explained by PC1 (rest-vs-active separation)
  • Phase transitions are visible as trajectory direction changes
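
A sketch of the projection with scikit-learn (the data is a placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder: per-window 69-feature vectors from one recording.
windows = np.random.randn(120, 69)

pca = PCA(n_components=2)
trajectory = pca.fit_transform(windows)            # (n_windows, 2) path
print("PC1 variance explained:", pca.explained_variance_ratio_[0])
```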

10. Feature Engineering

69 features per 1-second window (extraction sketched below):

  • 24 PSD features: 4 band powers (theta, alpha, beta, alpha/beta ratio) x 6 channels
  • 42 statistical features: 7 stats (variance, MAV, RMS, peak, kurtosis, skewness, zero crossings) x 6 channels
  • 3 cross-channel features: Left-right asymmetry (2) + midline difference (1)
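
A sketch of the per-window extractor (the feature ordering, Welch parameters, and channel indices for the cross-channel terms are all assumptions):

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis, skew

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def window_features(win, fs=500.0):
    """69 features from one (samples, 6) window; layout is an assumption."""
    feats = []
    for ch in range(win.shape[1]):
        x = win[:, ch]
        f, psd = welch(x, fs=fs, nperseg=min(len(x), 256))
        power = {name: psd[(f >= lo) & (f < hi)].mean()
                 for name, (lo, hi) in BANDS.items()}
        feats += [power["theta"], power["alpha"], power["beta"],
                  power["alpha"] / (power["beta"] + 1e-12)]      # 4 PSD features
        feats += [x.var(), np.abs(x).mean(),                     # variance, MAV
                  np.sqrt((x ** 2).mean()), np.abs(x).max(),     # RMS, peak
                  kurtosis(x), skew(x),                          # shape stats
                  float((np.diff(np.sign(x)) != 0).sum())]       # zero crossings
    # 3 cross-channel features; indices assume the montage order
    # AFF6, AFp2, AFp1, AFF5, FCz, CPz from the channel layout section.
    feats += [win[:, 3].var() - win[:, 0].var(),    # left-right asymmetry 1
              win[:, 2].var() - win[:, 1].var(),    # left-right asymmetry 2
              win[:, 4].var() - win[:, 5].var()]    # midline difference
    return np.array(feats)                          # 6*(4+7) + 3 = 69

feats = window_features(np.random.randn(500, 6))    # 1 s at 500 Hz (placeholder)
```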

11. Channel Layout

The 6 EEG channels are:

  • AFF6 (right anterior frontal), AFp2 (right anterior frontopolar)
  • AFp1 (left anterior frontopolar), AFF5 (left anterior frontal)
  • FCz (midline frontocentral), CPz (midline centroparietal)

4 of 6 channels are frontal -- not over motor cortex. FCz and CPz are closest to motor areas and dominate feature importance.

12. Scalability: Hierarchical Group Command Architecture

Scaling from 1 robot to 100+ requires more than a fast decoder. We designed a three-layer command dispatch architecture for the 100-robot fleet (10 groups of 10):

Layer 1: HUMAN INTENT (BCI Pipeline)
  One human, one BCI decode per decision cycle (~26ms)
  Output: a single action (FORWARD / LEFT / RIGHT / BACKWARD / STOP)
  Or: a trigger signal for context-aware dispatch

Layer 2: GROUP ROUTER
  100 robots organized into 10 groups (G1-G10)
  Operator targets a group; system identifies stuck robots within it
  Command types:
    - Individual override: one action → one robot
    - Group direction: one action → all stuck robots in group
    - Group fix: one trigger → context-aware individualized actions
    - Fix all: one trigger → all stuck robots fleet-wide

Layer 3: CONTEXT AI (Per-Robot Diagnosis)
  Each stuck robot has a diagnosed failure reason
  System maps failure → corrective action automatically:
    - obstacle_left  → RIGHT
    - obstacle_right → LEFT
    - lost_target    → FORWARD
    - failed_task    → BACKWARD
    - unknown        → STOP
  Human provides strategic oversight; system handles tactical execution
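
The Layer 3 diagnosis table maps directly onto a lookup (a sketch; representing robots as dicts with id/reason keys is an assumption, not the demo's actual data model):

```python
FAILURE_TO_ACTION = {
    "obstacle_left":  "RIGHT",
    "obstacle_right": "LEFT",
    "lost_target":    "FORWARD",
    "failed_task":    "BACKWARD",
}

def fix_group(stuck_robots):
    """One trigger -> one individualized corrective action per robot."""
    return {robot["id"]: FAILURE_TO_ACTION.get(robot["reason"], "STOP")
            for robot in stuck_robots}

# fix_group([{"id": 22, "reason": "obstacle_left"},
#            {"id": 25, "reason": "lost_target"}])
# -> {22: "RIGHT", 25: "FORWARD"}
```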

Three command layers in the demo:

| Layer | Command Example | Robots Affected | Actions Sent |
| --- | --- | --- | --- |
| Layer 1 | Click "LEFT FIST" button | 1 stuck robot | 1 identical action |
| Layer 2 | group 3 left | All stuck in G3 | N identical actions |
| Layer 3 | group 3 fix / fix all | All stuck in G3 / fleet | N individualized actions |

Context-aware individualization (Layer 3): When the operator sends group 3 fix, the system doesn't send the same action to every robot. It diagnoses each robot's failure reason and prescribes the correct corrective action. Robot #22 has an obstacle on its left → gets RIGHT. Robot #25 lost its target → gets FORWARD. One command, multiple robots, each getting the right fix.

Efficiency tracking: The demo tracks operator leverage = total robots overridden / total commands issued. With Layer 3 commands, a single fix all can resolve 8+ robots simultaneously, yielding 8x+ leverage.
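
The metric itself is a running ratio; a sketch (the class name is illustrative, not the demo's actual implementation):

```python
class LeverageTracker:
    """Tracks operator leverage = robots overridden / commands issued."""
    def __init__(self):
        self.robots_overridden = 0
        self.commands_issued = 0

    def record(self, robots_affected):
        self.commands_issued += 1
        self.robots_overridden += robots_affected

    @property
    def leverage(self):
        return self.robots_overridden / max(self.commands_issued, 1)
```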

Impact on operator efficiency:

| Dispatch Strategy | Robots/Operator | Use Case |
| --- | --- | --- |
| Individual (Layer 1) | 1 | Precise single-robot control |
| Group direction (Layer 2) | 10-50 | Uniform group corrections |
| Context-aware fix (Layer 3) | 50-100+ | Heterogeneous failures, max leverage |
| Fix all (Layer 3) | Unlimited | Fleet-wide recovery |

Why this scales linearly: The BCI decoder runs once per decision cycle regardless of fleet size. The group router is O(N) in the number of robots but operates on pre-decoded intent -- no additional EEG processing. At 26 ms per robot, a single pipeline instance processes ~40 robots/second. An 8-core machine handles 300+ robots at 1 Hz.
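
The throughput figures are straightforward arithmetic (assuming one sequential 26 ms decode per robot, perfectly parallel across cores):

```python
decode_ms = 26                      # per-robot decode latency from the benchmark
robots_per_sec = 1000 / decode_ms   # ~38.5, i.e. "~40 robots/second"
cores = 8
print(cores * robots_per_sec)       # ~308, i.e. "300+ robots at 1 Hz"
```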

Current demo implementation: demo/full_demo.py runs 100 robots in 10 groups with all three command layers. demo/scalability_demo.py benchmarks 10/50/100 robot fleets with sequential decode timing.

Limitations

  1. 6 channels vs 64+: Research BCI systems use 64-256 channels. Our 6-channel system has limited spatial resolution.
  2. Frontal-heavy montage: Motor imagery signals are strongest at C3/C4, which are not measured.
  3. Cross-subject difficulty: Without per-user calibration, accuracy is fundamentally limited.
  4. 4-class direction is modest: 27.9% (vs 25% random) is above chance but not practically reliable for fine control.
  5. Offline processing: We process pre-recorded .npz files, not live EEG streams.

Data Flow

.npz file
  -> feature_eeg (7499, 6) at 500 Hz
  -> bandpass_filter 8-30 Hz
  -> extract active segment (samples 1500 to 1500+duration*500)
  -> normalize per channel (zero-mean, unit-variance)
  -> segment 1s windows, 0.5s overlap
  -> extract 69 features per window
  -> Stage 1: RandomForest -> P(active)
  -> if active: Stage 2: SVM -> direction class (4-class)
  -> ConfidenceGate -> raw action string
  -> MajorityVote(5) -> smoothed action
  -> Hysteresis(3) -> final action
  -> Phase detection (INITIATION / SUSTAINED / RELEASE)
  -> BRI Controller.set_action(Action.FORWARD/BACKWARD/LEFT/RIGHT/STOP)
  -> G1 humanoid moves in MuJoCo