| 1 | +# NOWAIT Keywords Reference |
| 2 | + |
| 3 | +Complete reference for reflection keywords used in the NOWAIT technique. |
| 4 | + |
| 5 | +## Primary Keywords (from paper) |
| 6 | + |
| 7 | +These keywords were identified empirically from 32 independent runs of QwQ-32B on AIME 2025: the outputs were split on `\n\n` delimiters, and the 15 most frequent monolingual transition words were collected. |
| 8 | + |
| 9 | +### Core Suppression List |
| 10 | + |
| 11 | +```python |
| 12 | +KEYWORDS = [ |
| 13 | +    "wait",           # Most common reflection trigger |
| 14 | +    "alternatively",  # Indicates exploring different approach |
| 15 | +    "hmm",            # Hesitation marker |
| 16 | +    "but",            # Contradiction/reconsideration |
| 17 | +    "however",        # Contradiction/reconsideration |
| 18 | +    "alternative",    # Exploring options |
| 19 | +    "another",        # Switching approach |
| 20 | +    "check",          # Verification trigger |
| 21 | +    "double-check",   # Re-verification |
| 22 | +    "oh",             # Realization marker |
| 23 | +    "maybe",          # Uncertainty/reconsideration |
| 24 | +    "verify",         # Verification trigger |
| 25 | +    "other",          # Exploring alternatives |
| 26 | +    "again",          # Repetition/re-check |
| 27 | +    "now",            # Transition marker |
| 28 | +    "ah",             # Realization marker |
| 29 | +    "any",            # Exploring possibilities |
| 30 | +] |
| 31 | +``` |
| 32 | + |
| 33 | +## Excluded Patterns |
| 34 | + |
| 35 | +These words contain a keyword as a substring but are not reflection markers, so tokens matching them should NOT be suppressed: |
| 36 | + |
| 37 | +```python |
| 38 | +EXCLUDED = [ |
| 39 | +    "ohio",       # Contains "oh" but is a proper noun |
| 40 | +    "butane",     # Contains "but" but is a chemical |
| 41 | +    "button",     # Contains "but" but is a UI element |
| 42 | +    "butterfly",  # Contains "but" but is a noun |
| 43 | +    "checkout",   # Contains "check" but is a noun/verb |
| 44 | +    "checksum",   # Contains "check" but is a technical term |
| 45 | +    "another's",  # Possessive form, often necessary |
| 46 | +] |
| 47 | +``` |
| 48 | + |
| 49 | +## Token Expansion |
| 50 | + |
| 51 | +For each keyword, the processor expands the match to every vocabulary token that contains it, covering case, leading-space, and punctuation variants: |
| 52 | + |
| 53 | +| Keyword | Expanded Variants | |
| 54 | +|---------|-------------------| |
| 55 | +| wait | wait, Wait, WAIT, " wait", " Wait", ".wait", ",wait", etc. | |
| 56 | +| hmm | hmm, Hmm, HMM, " hmm", "...hmm", etc. | |
| 57 | +| alternatively | alternatively, Alternatively, " Alternatively", etc. | |
| 58 | + |
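The expansion in the table can be sketched as a substring scan over the vocabulary. This is a minimal illustration rather than the processor's actual code: `expand_keyword` and the toy `vocab` are hypothetical, and real tokenizers usually mark a leading space with a symbol such as `Ġ` or `▁` instead of a literal space.

```python
def expand_keyword(vocab, keyword, excluded=()):
    """Return vocabulary tokens containing `keyword`, skipping false positives.

    Hypothetical helper for illustration; `vocab` maps token text to token id.
    """
    keyword = keyword.lower()
    matches = []
    for token_text in vocab:
        text = token_text.lower()
        if keyword not in text:
            continue
        # Skip tokens that match an excluded word such as "button".
        if any(word in text for word in excluded):
            continue
        matches.append(token_text)
    return sorted(matches)


# Toy vocabulary (an assumption; real vocabularies are far larger).
vocab = {"wait": 0, "Wait": 1, " wait": 2, ".wait": 3, "but": 4,
         " But": 5, "button": 6, "butterfly": 7, "cat": 8}

wait_variants = expand_keyword(vocab, "wait")
but_variants = expand_keyword(vocab, "but", excluded=("button", "butterfly"))
print(wait_variants)  # [' wait', '.wait', 'Wait', 'wait']
print(but_variants)   # [' But', 'but']
```

Plain substring matching also catches unrelated tokens such as `await`, so the exclusion list will likely need to grow with the vocabulary it is applied to.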
| 59 | +## Model-Specific Tuning |
| 60 | + |
| 61 | +Different models may benefit from adjusted keyword lists: |
| 62 | + |
| 63 | +### QwQ-32B / DeepSeek-R1 |
| 64 | +- Use full default list |
| 65 | +- High token-reduction potential (30%+) |
| 66 | + |
| 67 | +### Phi4-Reasoning-Plus |
| 68 | +- Use full default list |
| 69 | +- Consider adding: "let me think", "I wonder" |
| 70 | + |
| 71 | +### Kimi-VL (Multimodal) |
| 72 | +- Use full default list |
| 73 | +- Very high token reduction (40-60%) |
| 74 | +- May need domain-specific additions for visual tasks |
| 75 | + |
| 76 | +### Qwen3 Series |
| 77 | +- RL-based (32B): Use full list |
| 78 | +- Distilled (4B/8B/14B): Consider removing "but" and "however" from the list to preserve some reasoning flow |
| 79 | + |
| 80 | +## Keyword Categories |
| 81 | + |
| 82 | +### Self-Reflection Markers |
| 83 | +- `wait`, `hmm`, `oh`, `ah` |
| 84 | +- Signal: Model is pausing to reconsider |
| 85 | + |
| 86 | +### Verification Triggers |
| 87 | +- `check`, `double-check`, `verify` |
| 88 | +- Signal: Model is validating previous work |
| 89 | + |
| 90 | +### Alternative Exploration |
| 91 | +- `alternatively`, `alternative`, `another`, `other` |
| 92 | +- Signal: Model is exploring different approaches |
| 93 | + |
| 94 | +### Contradiction/Reconsideration |
| 95 | +- `but`, `however`, `maybe` |
| 96 | +- Signal: Model is reconsidering previous conclusion |
| 97 | + |
| 98 | +### Transition Markers |
| 99 | +- `now`, `again`, `any` |
| 100 | +- Signal: Model is shifting focus or repeating |
| 101 | + |
| 102 | +## Benchmark Results by Keyword Removal |
| 103 | + |
| 104 | +| Keywords Removed | AIME 2025 Accuracy | Token Reduction | |
| 105 | +|-----------------|---------------|-----------------| |
| 106 | +| None (baseline) | 66.67% | 0% | |
| 107 | +| wait only | 67.33% | 15% | |
| 108 | +| wait + hmm | 67.67% | 22% | |
| 109 | +| All 17 keywords | 68.00% | 31% | |
| 110 | + |
| 111 | +## Implementation Notes |
| 112 | + |
| 113 | +### Logit Suppression Value |
| 114 | +- Default: `-1e10` (effectively negative infinity) |
| 115 | +- Alternative: `-100` (softer suppression; the token keeps a small nonzero probability and can still occur occasionally) |
| 116 | + |
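To see how the two values differ, the sketch below overwrites one logit and compares the resulting softmax probabilities. It assumes the suppression value replaces the logit rather than being added to it; `softmax` and `suppress` are illustrative helpers, and the logits are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def suppress(logits, token_id, value):
    """Return a copy of `logits` with one entry overwritten by `value`."""
    out = list(logits)
    out[token_id] = value
    return out


logits = [2.0, 1.0, 0.5]  # made-up scores; token 0 plays the role of "wait"

p_hard = softmax(suppress(logits, 0, -1e10))[0]
p_soft = softmax(suppress(logits, 0, -100.0))[0]

print(p_hard)  # exactly 0.0, because exp() underflows
print(p_soft)  # tiny but nonzero
```

In this toy case both values make the token practically unsampleable. The softer value only leaves real room for the token when the competing logits are themselves very low.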
| 117 | +### Vocabulary Iteration |
| 118 | +```python |
| 119 | +def build_suppressed_tokens(tokenizer, keywords, excluded=()): |
| 120 | +    suppressed = set() |
| 121 | +    vocab = tokenizer.get_vocab() |
| 122 | + |
| 123 | +    for token_text, token_id in vocab.items(): |
| 124 | +        text = token_text.lower() |
| 125 | +        # Skip false positives listed in `excluded` (e.g. "button", "ohio"). |
| 126 | +        is_excluded = any(word in text for word in excluded) |
| 127 | +        if not is_excluded and any(k.lower() in text for k in keywords): |
| 128 | +            suppressed.add(token_id) |
| 129 | +    return suppressed |
| 130 | +``` |
| 131 | + |
| 132 | +### Performance Considerations |
| 133 | +- Token set is built once at initialization |
| 134 | +- Membership lookup in the set is O(1) on average per token during generation |
| 135 | +- Memory overhead: roughly a few KB for the token ID set |
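The performance notes above can be sketched as a small processor that builds the ID set once and reuses it at every step. `NoWaitProcessor` is a hypothetical class written for illustration, not a library API, and it works on plain Python lists; a real pipeline would apply the same idea to the framework's logit tensors.

```python
class NoWaitProcessor:
    """Illustrative processor (hypothetical, not a library API)."""

    def __init__(self, vocab, keywords, value=-1e10):
        # Built once at initialization and reused for every generation step.
        self.suppressed = frozenset(
            token_id
            for token_text, token_id in vocab.items()
            if any(k.lower() in token_text.lower() for k in keywords)
        )
        self.value = value

    def is_suppressed(self, token_id):
        # Set membership is O(1) on average.
        return token_id in self.suppressed

    def __call__(self, logits):
        # Overwrite the suppressed entries; cost scales with the set size.
        out = list(logits)
        for token_id in self.suppressed:
            out[token_id] = self.value
        return out


# Toy vocabulary and logits (assumptions for the example).
vocab = {"wait": 0, " Wait": 1, "so": 2, "hmm": 3, "then": 4}
processor = NoWaitProcessor(vocab, ["wait", "hmm"])
step_logits = processor([1.5, 0.2, 0.9, 0.4, 0.1])

print(sorted(processor.suppressed))  # [0, 1, 3]
print(step_logits)
```

Using a `frozenset` makes it explicit that the set is not meant to change after initialization.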