content/textbook/audits/staging/yuni-wyx-jt7347.mdx
6 additions & 3 deletions
@@ -54,7 +54,8 @@ While ViNT does not explicitly set a hard line between when to explore or when t
 Instead, this encoder encodes the difference between the current observation and the goal: simply stack the observations and the goal together, pass them through EfficientNet, then flatten to get the goal tokens, similar to the observation encoder.
 Attention forces the goal to attend to / compare with the windowed observation sequence.
-2. **Transformer**: $P_{\text{past}}$, $P_{\text{obs}}$ (i.e. current), and $P_{\text{goal}}$ tokens are combined with positional encoding, and passed into decoder-only Transformer backbone (denoted as 'f' in the paper) with 4 multi-headed attention blocks (4 heads, 4 layers), and 2048 hidden units.
+2. **Transformer**: $P_{\text{past}}$, $P_{\text{obs}}$ (i.e. current), and $P_{\text{goal}}$ tokens are combined with positional encoding.
+These are passed into a decoder-only Transformer backbone (denoted as $f$ in the paper) with 4 multi-headed attention blocks (4 heads, 4 layers) and 2048 hidden units.
 a. 6 tokens, model dimension of 512, 4 layers, 4 heads, 2048 feed-forward hidden dim, 128 dims per attention head (512 / 4).
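The backbone configuration above (6 tokens, model dimension 512, 4 layers, 4 heads, 2048 feed-forward hidden units, 128 dims per head) can be sketched in PyTorch. This is a hedged illustration, not the authors' code: the EfficientNet token extraction is stubbed with random tensors, and the 5-observation / 1-goal token split is an assumption for the example.

```python
import torch
import torch.nn as nn

# Hyperparameters as stated in the text; the token split is assumed.
d_model, n_heads, n_layers, ffn_dim, n_tokens = 512, 4, 4, 2048, 6

layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=n_heads, dim_feedforward=ffn_dim, batch_first=True
)
backbone = nn.TransformerEncoder(layer, num_layers=n_layers)

# Stand-ins for EfficientNet-derived tokens: 5 observation tokens + 1 goal token.
obs_tokens = torch.randn(1, 5, d_model)
goal_token = torch.randn(1, 1, d_model)
pos = torch.randn(1, n_tokens, d_model)  # stand-in for the positional encoding

tokens = torch.cat([obs_tokens, goal_token], dim=1) + pos
out = backbone(tokens)
print(out.shape)  # torch.Size([1, 6, 512])
```

Each of the 4 heads then attends over 512 / 4 = 128 dimensions, matching the per-head size given above.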
@@ -216,7 +217,8 @@ NoMaD was evaluated in 6 different real-world and outdoor environments using a L
 The model was compared with 6 different baselines:
 1. **VIB**: Variational Information Bottleneck, which models the distribution of actions conditioned on observations.
-2. **Masked ViNT**: Essentially ViNT but with goal masking policy. Predicts point estimates of future actions instead of modeling the entire distribution.
+2. **Masked ViNT**: Essentially ViNT, but with the goal-masking policy.
+It predicts point estimates of future actions instead of modeling the entire distribution.
 3. **Autoregressive**: Uses autoregressive prediction over a discrete distribution of actions.
 4. **Subgoal diffusion**: Essentially ViNT (diffusion generation of subgoals combined with the navigation policy).
 5. **Random subgoals**: A variation of ViNT that, instead of using diffusion, randomly samples candidate subgoals from the training data.
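The goal masking that the Masked ViNT baseline inherits can be illustrated with a minimal sketch. The mechanics here (zeroing the goal token with probability `p_mask`, the helper name `apply_goal_mask`) are assumptions for illustration; in practice the masking is typically realized as an attention mask rather than literal zeroing.

```python
import random

def apply_goal_mask(obs_tokens, goal_token, p_mask=0.5, rng=random.random):
    """Return (tokens fed to the Transformer, whether the goal was masked).

    With probability p_mask the goal token is blanked out, so the policy
    runs goal-agnostic (explore); otherwise it conditions on the goal
    (exploit). Names and mechanics are illustrative assumptions.
    """
    masked = rng() < p_mask
    goal = [0.0] * len(goal_token) if masked else goal_token
    return obs_tokens + [goal], masked

# p_mask=1.0 forces the goal-agnostic (exploration) branch:
tokens, masked = apply_goal_mask([[1.0, 2.0]], [3.0, 4.0], p_mask=1.0)
print(tokens, masked)  # [[1.0, 2.0], [0.0, 0.0]] True
```

Sampling the mask during training is what lets a single network cover both the goal-reaching and undirected-exploration behaviors.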
@@ -231,7 +233,8 @@ In comparison to ViNT, NoMaD presents a new approach to learning exploration and
 The main contribution is architecturally quite simple, but effective.
 The goal-masking effectively turns navigation from a single-task problem into a conditional behavior spectrum, allowing for a more unified end-to-end approach.
 Additionally, using diffusion only for action generation, rather than image generation, greatly reduces the computational cost of running NoMaD.
-Previously in ViNT, exploration vs. exploitation was a behavior encoded within the graph generation and subgoal ranking, but now with NoMaD, the low level collision avoidance and high level planning (exploration vs. subgoal seeking) is defined in one model architecture.
+Previously in ViNT, exploration vs. exploitation was a behavior encoded within the graph generation and subgoal ranking.
+Now with NoMaD, the low-level collision avoidance and high-level planning (exploration vs. subgoal seeking) are defined in one model architecture.
 Additionally, the probability distribution allows for a more fine-grained assignment of which actions are good and bad across the whole action space (e.g. high probability on turning left or right at a T-junction, low probability of going straight and hitting the wall).
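The idea of diffusing in action space rather than image space can be sketched with a simplified DDPM-style denoising loop. Everything here (10 steps, the linear beta schedule, 8 two-dimensional waypoints, the stub noise predictor `eps_theta`) is an illustrative assumption, not NoMaD's actual configuration; the point is that the object being denoised is a tiny waypoint array rather than a full image, which is where the cost saving comes from.

```python
import numpy as np

T = 10                              # diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)  # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_theta(a_t, t, obs_ctx):
    # Stub noise-prediction network; in the real model this would be a
    # network conditioned on the (possibly goal-masked) observation context.
    return 0.1 * a_t

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 2))     # 8 future (x, y) waypoints, pure noise

# Reverse diffusion: iteratively denoise the action sequence.
for t in reversed(range(T)):
    eps = eps_theta(a, t, obs_ctx=None)
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    a = (a - coef * eps) / np.sqrt(alphas[t])
    if t > 0:  # no noise is added at the final step
        a += np.sqrt(betas[t]) * rng.standard_normal(a.shape)

print(a.shape)  # (8, 2)
```

Denoising an 8 x 2 array for a handful of steps is orders of magnitude cheaper than generating subgoal images, which is the contrast drawn with ViNT's subgoal diffusion.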
Some things which could be expanded upon, however, include: