
ThoughtLink Architecture

Design Decisions

1. Hierarchical Classification (Binary then Direction)

Instead of a flat 5-class classifier, we use a two-stage approach (sketched in code below):

  • Stage 1 (Binary): Rest vs Active -- catches the "no command" state with high confidence (79.4%)
  • Stage 2 (Direction): Only runs when Stage 1 predicts Active -- 4-class (FORWARD/BACKWARD/LEFT/RIGHT)

This design matters for robot control because:

  • False triggers (accidentally sending FORWARD when the user is resting) are worse than missed commands
  • The binary gate prevents the direction classifier from running on rest data
  • Stage 2 sees a cleaner training signal since it trains only on active samples
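
A minimal sketch of the two-stage decode, assuming scikit-learn-style classifiers and that class 1 of Stage 1 means "active" (the names, threshold, and class ordering are illustrative, not the repo's actual API):

```python
# Illustrative two-stage decode. `stage1` / `stage2` stand in for the
# trained Rest-vs-Active and 4-class direction models.
DIRECTIONS = ["FORWARD", "BACKWARD", "LEFT", "RIGHT"]

def decode_window(features, stage1, stage2, rest_threshold=0.5):
    """Return an action string for one 1-second feature window."""
    p_active = stage1.predict_proba([features])[0][1]
    if p_active < rest_threshold:
        return "STOP"                    # binary gate: default to rest
    direction_idx = stage2.predict([features])[0]
    return DIRECTIONS[direction_idx]
```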

2. 4-Class Direction (Including BACKWARD)

The challenge asks for "left, right, forward, and backward." We map:

  • Both Fists -> FORWARD
  • Left Fist -> LEFT
  • Right Fist -> RIGHT
  • Tongue Tapping -> BACKWARD

4-class accuracy is 27.9% (random = 25%). This is modest but honest:

  • Cross-subject generalization with 6 frontal channels is inherently difficult
  • The two-stage design compensates: Stage 1 (79.4%) handles the critical rest-vs-active gate
  • Temporal smoothing (92.9% flicker reduction) stabilizes noisy direction estimates
  • We keep 4-class rather than merging to 3-class because the challenge explicitly requires backward

3. Phase-Aware Detection

Each decoded action is tagged with a phase (tagging logic sketched below):

  • INITIATION: First window where action changes from STOP to active
  • SUSTAINED: Same action continues
  • RELEASE: Action changes back to STOP

This addresses the challenge's "phase-aware modeling" direction and enables:

  • Detecting when a human operator first engages
  • Tracking sustained intent duration
  • Detecting operator disengagement
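
A sketch of the tagging logic, assuming actions arrive as a per-window stream of strings (the function name and the handling of direct action-to-action switches are assumptions):

```python
def tag_phase(prev_action, action):
    """Tag a decoded action relative to the previous window's action."""
    if prev_action == "STOP" and action != "STOP":
        return "INITIATION"   # operator first engages
    if action != "STOP" and action == prev_action:
        return "SUSTAINED"    # same active command continues
    if prev_action != "STOP" and action == "STOP":
        return "RELEASE"      # operator disengages
    return None               # rest continues, or an active-to-active switch
```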

4. Bandpass Filter: 8-30 Hz

We filter to the mu (8-13 Hz) and beta (13-30 Hz) frequency bands (filter sketch below) because:

  • Mu rhythm (~10 Hz) is suppressed during motor imagery and execution
  • Beta rhythm (~20 Hz) shows event-related desynchronization during movement
  • Frequencies below 8 Hz (delta, theta) contain mostly eye/movement artifacts
  • Frequencies above 30 Hz are dominated by EMG noise on consumer 6-channel EEG
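
A minimal sketch with SciPy (the Butterworth order and zero-phase filtering are assumptions; only the 8-30 Hz band and 500 Hz sampling rate come from this doc):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_8_30(eeg, fs=500.0, order=4):
    """Zero-phase 8-30 Hz Butterworth bandpass over (samples, channels)."""
    nyq = fs / 2.0
    b, a = butter(order, [8.0 / nyq, 30.0 / nyq], btype="band")
    return filtfilt(b, a, eeg, axis=0)

filtered = bandpass_8_30(np.random.randn(7499, 6))  # placeholder EEG array
```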

5. Cross-Subject Validation

All evaluation uses leave-subject-out splits (sketched below):

  • No subject appears in both training and test sets
  • This tests generalization to new users without calibration
  • Cross-subject is the honest metric: the system works for anyone, not just trained users
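
In scikit-learn terms this corresponds to LeaveOneGroupOut keyed on subject ID; a sketch with placeholder data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

# Placeholder data: 69-feature windows, binary labels, per-window subject IDs.
X = np.random.randn(300, 69)
y = np.random.randint(0, 2, 300)
subjects = np.repeat(np.arange(10), 30)

for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    # The held-out subject contributes only to the test fold, so no
    # subject ever appears on both sides of the split.
    clf = RandomForestClassifier().fit(X[train_idx], y[train_idx])
    print(clf.score(X[test_idx], y[test_idx]))
```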

6. Temporal Smoothing Over Raw Accuracy

For robot control, command stability matters more than raw accuracy:

| Component | Purpose | Effect |
| --- | --- | --- |
| MajorityVote(5) | Sliding-window vote over the 5 most recent predictions | Removes transient flips |
| ConfidenceGate | Threshold-based gating on classifier probabilities | Prevents low-confidence actions |
| HysteresisFilter(3) | Requires 3 consecutive identical predictions to switch | Prevents oscillation |

Combined effect: 92.9% reduction in command flickering.
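
Hedged sketches of two of the smoothers (class names follow the table; the internals are assumptions, and ConfidenceGate reduces to a threshold on the classifier probabilities as in the decode sketch above):

```python
from collections import Counter, deque

class MajorityVote:
    """Sliding-window vote over the N most recent predictions."""
    def __init__(self, n=5):
        self.window = deque(maxlen=n)

    def update(self, action):
        self.window.append(action)
        return Counter(self.window).most_common(1)[0][0]

class HysteresisFilter:
    """Switch output only after N consecutive identical new predictions."""
    def __init__(self, n=3):
        self.n, self.current = n, "STOP"
        self.candidate, self.count = None, 0

    def update(self, action):
        if action == self.current:
            self.candidate, self.count = None, 0    # no switch pending
        elif action == self.candidate:
            self.count += 1
            if self.count >= self.n:                # N in a row: commit
                self.current, self.candidate, self.count = action, None, 0
        else:
            self.candidate, self.count = action, 1  # new switch candidate
        return self.current
```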

7. MLP vs Random Forest Comparison

We compared hand-crafted features with a Random Forest against a neural network on raw EEG:

| Model | Stage 1 | Stage 2 | Latency |
| --- | --- | --- | --- |
| RF (69 features) | 79.4% | 27.9% | 13.3 ms |
| MLP (raw 3000) | 79.1% | 24.3% | 0.3 ms |

RF wins on accuracy because domain-specific PSD features capture motor imagery patterns that a network trained on raw EEG struggles to learn cross-subject. The MLP is far faster (0.3 ms vs 13.3 ms) but sacrifices direction accuracy.

8. TD-NIRS Exploration

Each .npz file includes a feature_moments array of shape (72, 40, 3, 2, 3) containing brain blood-flow data. Our analysis (variance check sketched below) found:

  • All 1728 NIRS features have zero variance across recordings
  • Adding NIRS to EEG hurts accuracy (-1.2% Stage 1, -4.4% Stage 2)
  • Either the hemodynamic response is too slow for the task windows, or the sensors are too far from motor cortex
  • Conclusion: EEG alone is both necessary and sufficient
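
The zero-variance finding reduces to a per-feature variance check (a sketch; how feature_moments flattens to 1728 values per recording is assumed):

```python
import numpy as np

# Placeholder: flattened feature_moments vectors stacked across recordings.
nirs = np.zeros((100, 1728))

variances = nirs.var(axis=0)          # per-feature variance across recordings
n_dead = int((variances == 0).sum())  # constant features carry no signal
print(f"{n_dead} of {nirs.shape[1]} NIRS features have zero variance")
```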

9. Temporal Embeddings

PCA projection of per-window features within recordings (sketched below) shows:

  • Clear separation between rest and active phases in feature space
  • Trajectories evolve from rest cluster to intent-specific regions
  • 45.5% variance explained by PC1 (rest-vs-active separation)
  • Phase transitions are visible as trajectory direction changes
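
A sketch of the projection with scikit-learn (the data is a placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder: per-window 69-feature vectors from one recording.
windows = np.random.randn(120, 69)

pca = PCA(n_components=2)
trajectory = pca.fit_transform(windows)            # (n_windows, 2) path
print("PC1 variance explained:", pca.explained_variance_ratio_[0])
```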

10. Feature Engineering

69 features per 1-second window (extraction sketched below):

  • 24 PSD features: 4 band powers (theta, alpha, beta, alpha/beta ratio) x 6 channels
  • 42 statistical features: 7 stats (variance, MAV, RMS, peak, kurtosis, skewness, zero crossings) x 6 channels
  • 3 cross-channel features: Left-right asymmetry (2) + midline difference (1)
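
A sketch of the per-window extractor (the feature ordering, Welch parameters, and channel indices for the cross-channel terms are all assumptions):

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis, skew

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def window_features(win, fs=500.0):
    """69 features from one (samples, 6) window; layout is an assumption."""
    feats = []
    for ch in range(win.shape[1]):
        x = win[:, ch]
        f, psd = welch(x, fs=fs, nperseg=min(len(x), 256))
        power = {name: psd[(f >= lo) & (f < hi)].mean()
                 for name, (lo, hi) in BANDS.items()}
        feats += [power["theta"], power["alpha"], power["beta"],
                  power["alpha"] / (power["beta"] + 1e-12)]      # 4 PSD features
        feats += [x.var(), np.abs(x).mean(),                     # variance, MAV
                  np.sqrt((x ** 2).mean()), np.abs(x).max(),     # RMS, peak
                  kurtosis(x), skew(x),                          # shape stats
                  float((np.diff(np.sign(x)) != 0).sum())]       # zero crossings
    # 3 cross-channel features; indices assume the montage order
    # AFF6, AFp2, AFp1, AFF5, FCz, CPz from the channel layout section.
    feats += [win[:, 3].var() - win[:, 0].var(),    # left-right asymmetry 1
              win[:, 2].var() - win[:, 1].var(),    # left-right asymmetry 2
              win[:, 4].var() - win[:, 5].var()]    # midline difference
    return np.array(feats)                          # 6*(4+7) + 3 = 69

feats = window_features(np.random.randn(500, 6))    # 1 s at 500 Hz (placeholder)
```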

11. Channel Layout

The 6 EEG channels are:

  • AFF6 (right anterior frontal), AFp2 (right anterior frontopolar)
  • AFp1 (left anterior frontopolar), AFF5 (left anterior frontal)
  • FCz (midline frontocentral), CPz (midline centroparietal)

4 of 6 channels are frontal -- not over motor cortex. FCz and CPz are closest to motor areas and dominate feature importance.

12. Scalability: Hierarchical Group Command Architecture

Scaling from 1 robot to 100+ requires more than a fast decoder. We designed a three-layer command dispatch architecture for the 100-robot fleet (10 groups of 10):

Layer 1: HUMAN INTENT (BCI Pipeline)
  One human, one BCI decode per decision cycle (~26ms)
  Output: a single action (FORWARD / LEFT / RIGHT / BACKWARD / STOP)
  Or: a trigger signal for context-aware dispatch

Layer 2: GROUP ROUTER
  100 robots organized into 10 groups (G1-G10)
  Operator targets a group; system identifies stuck robots within it
  Command types:
    - Individual override: one action → one robot
    - Group direction: one action → all stuck robots in group
    - Group fix: one trigger → context-aware individualized actions
    - Fix all: one trigger → all stuck robots fleet-wide

Layer 3: CONTEXT AI (Per-Robot Diagnosis)
  Each stuck robot has a diagnosed failure reason
  System maps failure → corrective action automatically:
    - obstacle_left  → RIGHT
    - obstacle_right → LEFT
    - lost_target    → FORWARD
    - failed_task    → BACKWARD
    - unknown        → STOP
  Human provides strategic oversight; system handles tactical execution
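
The Layer 3 diagnosis table maps directly onto a lookup (a sketch; representing robots as dicts with id/reason keys is an assumption, not the demo's actual data model):

```python
FAILURE_TO_ACTION = {
    "obstacle_left":  "RIGHT",
    "obstacle_right": "LEFT",
    "lost_target":    "FORWARD",
    "failed_task":    "BACKWARD",
}

def fix_group(stuck_robots):
    """One trigger -> one individualized corrective action per robot."""
    return {robot["id"]: FAILURE_TO_ACTION.get(robot["reason"], "STOP")
            for robot in stuck_robots}

# fix_group([{"id": 22, "reason": "obstacle_left"},
#            {"id": 25, "reason": "lost_target"}])
# -> {22: "RIGHT", 25: "FORWARD"}
```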

Three command layers in the demo:

| Layer | Command Example | Robots Affected | Actions Sent |
| --- | --- | --- | --- |
| Layer 1 | Click "LEFT FIST" button | 1 stuck robot | 1 identical action |
| Layer 2 | group 3 left | All stuck in G3 | N identical actions |
| Layer 3 | group 3 fix / fix all | All stuck in G3 / fleet | N individualized actions |

Context-aware individualization (Layer 3): When the operator sends group 3 fix, the system doesn't send the same action to every robot. It diagnoses each robot's failure reason and prescribes the correct corrective action. Robot #22 has an obstacle on its left → gets RIGHT. Robot #25 lost its target → gets FORWARD. One command, multiple robots, each getting the right fix.

Efficiency tracking: The demo tracks operator leverage = total robots overridden / total commands issued. With Layer 3 commands, a single fix all can resolve 8+ robots simultaneously, yielding 8x+ leverage.
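
The metric itself is a running ratio; a sketch (the class name is illustrative, not the demo's actual implementation):

```python
class LeverageTracker:
    """Tracks operator leverage = robots overridden / commands issued."""
    def __init__(self):
        self.robots_overridden = 0
        self.commands_issued = 0

    def record(self, robots_affected):
        self.commands_issued += 1
        self.robots_overridden += robots_affected

    @property
    def leverage(self):
        return self.robots_overridden / max(self.commands_issued, 1)
```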

Impact on operator efficiency:

| Dispatch Strategy | Robots/Operator | Use Case |
| --- | --- | --- |
| Individual (Layer 1) | 1 | Precise single-robot control |
| Group direction (Layer 2) | 10-50 | Uniform group corrections |
| Context-aware fix (Layer 3) | 50-100+ | Heterogeneous failures, max leverage |
| Fix all (Layer 3) | Unlimited | Fleet-wide recovery |

Why this scales linearly: The BCI decoder runs once per decision cycle regardless of fleet size. The group router is O(N) in the number of robots but operates on pre-decoded intent -- no additional EEG processing. At 26 ms per robot, a single pipeline instance processes ~40 robots/second. An 8-core machine handles 300+ robots at 1 Hz.
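
The throughput figures are straightforward arithmetic (assuming one sequential 26 ms decode per robot, perfectly parallel across cores):

```python
decode_ms = 26                      # per-robot decode latency from the benchmark
robots_per_sec = 1000 / decode_ms   # ~38.5, i.e. "~40 robots/second"
cores = 8
print(cores * robots_per_sec)       # ~308, i.e. "300+ robots at 1 Hz"
```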

Current demo implementation: demo/full_demo.py runs 100 robots in 10 groups with all three command layers. demo/scalability_demo.py benchmarks 10/50/100 robot fleets with sequential decode timing.

Limitations

  1. 6 channels vs 64+: Research BCI systems use 64-256 channels. Our 6-channel system has limited spatial resolution.
  2. Frontal-heavy montage: Motor imagery signals are strongest at C3/C4, which are not measured.
  3. Cross-subject difficulty: Without per-user calibration, accuracy is fundamentally limited.
  4. 4-class direction is modest: 27.9% (vs 25% random) is above chance but not practically reliable for fine control.
  5. Offline processing: We process pre-recorded .npz files, not live EEG streams.

Data Flow

.npz file
  -> feature_eeg (7499, 6) at 500 Hz
  -> bandpass_filter 8-30 Hz
  -> extract active segment (samples 1500 to 1500+duration*500)
  -> normalize per channel (zero-mean, unit-variance)
  -> segment 1s windows, 0.5s overlap
  -> extract 69 features per window
  -> Stage 1: RandomForest -> P(active)
  -> if active: Stage 2: SVM -> direction class (4-class)
  -> ConfidenceGate -> raw action string
  -> MajorityVote(5) -> smoothed action
  -> Hysteresis(3) -> final action
  -> Phase detection (INITIATION / SUSTAINED / RELEASE)
  -> BRI Controller.set_action(Action.FORWARD/BACKWARD/LEFT/RIGHT/STOP)
  -> G1 humanoid moves in MuJoCo