**Status:** Open | **Priority:** High | **Opened:** 2026/02/21
**Description:**
We currently lack explainable AI (XAI) tooling to verify that the agent's distinct sensor cortices (Visual, Depth, Thermal) are specializing as intended. We need visual proof that the Visual Cortex focuses on static geometry while the Thermal Cortex tracks dynamic threats.
**Proposed Solution:**
Integrate the `captum` library to generate Gradient-weighted Class Activation Mapping (Grad-CAM) heatmaps. Create an `examine` command that takes a single sequence from the dataset, runs `captum.attr.LayerGradCam` on the final convolutional layers of the respective cortices, and saves the upsampled heatmaps as side-by-side `.png` files. This will allow us to visually inspect the spatial stimuli responsible for triggering specific action logits.
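The computation `captum.attr.LayerGradCam` performs can be sketched in plain PyTorch to show what the `examine` command would produce per cortex. The `TinyCortex` model, layer choice, and target index below are illustrative stand-ins, not the real cortex architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCortex(nn.Module):
    """Toy stand-in for one sensor cortex: a small CNN with an action head."""
    def __init__(self, n_actions=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, n_actions)

    def forward(self, x):
        feats = self.conv(x)             # (B, 16, H, W)
        pooled = feats.mean(dim=(2, 3))  # global average pool
        return self.head(pooled)

def grad_cam(model, layer, x, target):
    """Grad-CAM for one input: ReLU(sum_k w_k * A_k), upsampled to input size."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        logits = model(x)
        model.zero_grad()
        logits[0, target].backward()     # gradient of one action logit
    finally:
        h1.remove()
        h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # channel importance
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    return F.interpolate(cam, size=x.shape[-2:], mode="bilinear",
                         align_corners=False)

model = TinyCortex()
x = torch.rand(1, 3, 32, 32)             # one frame from a sequence
heatmap = grad_cam(model, model.conv[2], x, target=1)
```

The resulting `heatmap` tensor is what would be colormapped and written out as the side-by-side `.png` panels.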
docs/board/issues.md (2 additions, 14 deletions)
@@ -14,7 +14,7 @@ The `DoomStreamingDataset` currently applies dynamic NumPy transposition, mirror
Refactor the `DataLoader` initialization in `train.py` to offload ETL transformations to background processes. Implement `num_workers` (e.g., 4), enable `pin_memory=True` for faster Host-to-Device memory transfers, and establish a `prefetch_factor`.
**Status:** Open | **Priority:** Medium | **Opened:** 2026/02/21
@@ -26,19 +26,7 @@ In `dataset.py`, `DoomStreamingDataset` currently iterates through all `.npz` fi
Migrate the storage backend from compressed `.npz` archives to HDF5 (`h5py`) format, or utilize NumPy's `mmap_mode='r'` to memory-map the data on disk. This allows the `Dataset` to lazily stream tensor blocks directly from the NVMe/SSD without pre-loading the entire corpus into volatile memory.
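The `mmap_mode='r'` option of the migration can be sketched as follows. The episode file name is hypothetical, and note that memory-mapping applies to uncompressed `.npy` files; compressed `.npz` archives are decompressed wholesale, which is exactly the behavior being migrated away from:

```python
import os
import tempfile
import numpy as np

# Hypothetical episode file; real recordings live wherever record.py writes them.
path = os.path.join(tempfile.mkdtemp(), "episode_000.npy")
np.save(path, np.random.rand(100, 3, 32, 32).astype(np.float32))

# mmap_mode='r' maps the array on disk; only touched slices are paged into RAM.
episode = np.load(path, mmap_mode="r")
window = np.asarray(episode[10:14])  # materialize one 4-frame block
```

An HDF5 backend via `h5py` would offer the same lazy-slicing access pattern with optional chunked compression.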
docs/board/roadmap.md (30 additions, 0 deletions)
@@ -138,6 +138,36 @@ See [Phase Archive](./closed/phases.md) for the project's completed phases.
**Assessment**: Acceptable, if gated behind a configuration property that is disabled by default. Cleared to implement.
## Phase 10: Cortical Auxiliary Heads (Isolated Representation Learning)
*Goal:* Attach secondary linear heads directly to the latent output vectors of specific cortices (e.g., Thermal, Visual) prior to sensorimotor concatenation. This enables the application of targeted, isolated loss functions (e.g., BCE for enemy counting on the thermal mask) directly to the sub-networks, accelerating feature extraction without waiting for the slow, end-to-end action gradient.
### 1. Configuration Layer
- [ ] **Auxiliary Toggles:** Update `app.yaml` to include an `auxiliary_heads` configuration block under `brain` (e.g., toggling thermal enemy counting) and corresponding $\lambda$ weights under the `loss` block.
- [ ] **State Validation:** Update `config.py` to parse the new auxiliary settings and loss weights during pipeline initialization.
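One possible shape for the `app.yaml` additions (key names here are assumptions, not the final schema validated by `config.py`):

```yaml
brain:
  auxiliary_heads:
    thermal_enemy_count: true   # toggle the thermal counting head
loss:
  lambda_aux_thermal: 0.5       # weight for the thermal auxiliary loss
```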
### 2. The Brain (Architecture Redesign)
- [ ] **Secondary Linear Heads:** Modify `DoomLiquidNet` in `brain.py` to conditionally instantiate `nn.Linear` layers branching directly off the flattened cortical vectors (e.g., $T(t)$ or $V(t)$).
- [ ] **Multi-Output Forward Pass:** Update the `forward` method to return a dictionary of auxiliary predictions alongside the primary action logits and the recurrent hidden state.
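A minimal sketch of the multi-output forward pass, using a toy stand-in for `DoomLiquidNet` (layer sizes, head names, and the enemy-count framing are illustrative assumptions):

```python
import torch
import torch.nn as nn

class AuxHeadNet(nn.Module):
    """Toy stand-in for DoomLiquidNet with an optional thermal auxiliary head."""
    def __init__(self, latent_dim=32, n_actions=4, max_enemies=8, use_aux=True):
        super().__init__()
        self.thermal = nn.Linear(64, latent_dim)       # stand-in thermal cortex
        self.motor = nn.Linear(latent_dim, n_actions)  # primary action head
        # Secondary head branches off the cortical latent, before concatenation.
        self.aux_enemy_count = (
            nn.Linear(latent_dim, max_enemies + 1) if use_aux else None
        )

    def forward(self, thermal_in):
        t = torch.relu(self.thermal(thermal_in))       # T(t): cortical latent
        out = {"action_logits": self.motor(t), "aux": {}}
        if self.aux_enemy_count is not None:
            out["aux"]["enemy_count"] = self.aux_enemy_count(t)
        return out

net = AuxHeadNet()
out = net(torch.rand(2, 64))  # batch of 2 thermal feature vectors
```

Because the auxiliary predictions live in a separate `aux` dictionary, inference code can consume `action_logits` alone and never touch the extra heads.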
### 3. The Pipeline (ETL & Training)
- [ ] **Ground Truth Extraction:** Update `record.py` and `DoomStreamingDataset` to extract, store, and stream the necessary ground truth labels for the auxiliary tasks (e.g., parsing the exact number of visible enemies from ViZDoom's underlying `state` variables).
- [ ] **Composite Objective Function:** Modify the optimization loop in `train.py` to compute and sum the isolated losses against the main behavioral cloning target: $\mathcal{L}_{Total} = \mathcal{L}_{Action} + \lambda \mathcal{L}_{Aux\_Thermal} + \dots$
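The composite objective can be sketched as below. The $\lambda$ value is a hypothetical config setting, and the thermal task is reduced here, as an assumption, to a binary enemy-visible BCE target; the roadmap's exact counting target is not specified:

```python
import torch
import torch.nn.functional as F

lambda_thermal = 0.5  # hypothetical weight read from the `loss` config block

# Stand-in batch: 8 timesteps, 4 action classes, binary enemy-visible target.
action_logits = torch.randn(8, 4, requires_grad=True)
aux_logits = torch.randn(8, 1, requires_grad=True)
action_targets = torch.randint(0, 4, (8,))
enemy_visible = torch.randint(0, 2, (8, 1)).float()

loss_action = F.cross_entropy(action_logits, action_targets)  # L_Action
loss_aux = F.binary_cross_entropy_with_logits(                 # L_Aux_Thermal
    aux_logits, enemy_visible)
loss_total = loss_action + lambda_thermal * loss_aux           # L_Total
loss_total.backward()  # one backward pass covers both heads
```

A single `backward()` on the summed loss drives gradients through the shared cortex from both objectives at once.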
!!! danger "Risk Assessment"

    **Training Overhead: Moderate**
    Optimizing a composite loss function requires calculating gradients for both the primary classification head and the auxiliary heads simultaneously. However, the additional parameters (small linear layers) are computationally trivial compared to the deep CNNs. The primary overhead is I/O related: modifying the dataset to extract and stream additional ground truth labels from the engine state.
    **Runtime Overhead: Zero**
    This is a purely structural training enhancement. Because the agent only requires the output of the primary Motor Cortex to play the game, the auxiliary heads can be completely detached and bypassed during live inference, maintaining strict temporal compliance with the $35\text{Hz}$ engine loop.
    **Assessment**: High reward, zero runtime risk. Cleared to implement.