# Game Theory DP - Intuition Guide

## The Mental Model: Think Like Your Opponent

Imagine playing chess:
- You don't just think about your next move
- You think: "If I do this, what will my opponent do? And then what's my best response?"

**Game Theory DP formalizes this recursive reasoning.**

## The Core Insight: Minimax

At any game state, the current player faces a choice:

```
I want to MAXIMIZE my outcome
But I know my opponent will MINIMIZE my outcome on their turn
So I must choose the move that gives me the best result
ASSUMING my opponent plays perfectly
```

This is the **minimax principle**: maximize your minimum guaranteed outcome.

## Why "Score Difference" Works

Consider a game where we track the score difference from the current player's perspective:

```
If I'm ahead by 5 points, that's +5 for me
When my opponent plays, THEIR +3 becomes MY -3
So the scores naturally flip with each turn

dp(state) = my_gain - dp(next_state)
                         ↑
              This is the opponent's score difference
              (positive for them = negative for me)
```

The subtraction handles the perspective switch automatically!
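
To see the sign flip in actual numbers, here is the recurrence evaluated on a tiny made-up game (the `dp` helper and the forced take-the-front rule are purely illustrative):

```python
# Toy game: numbers [7, 2]; the current player must take the front number.
# dp(i) = my_gain - dp(i + 1), where i indexes the next number to take.
def dp(i, nums=(7, 2)):
    if i == len(nums):        # no numbers left: difference is 0
        return 0
    return nums[i] - dp(i + 1)

print(dp(0))  # 7 - (2 - 0) = 5: the first player ends 5 points ahead
```

The second player's +2 becomes -2 from the first player's perspective, exactly as the diagram above describes.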

## Pattern 1: Taking from Ends (Stone Game)

**Scenario**: Array of values, take from either end each turn.

```
piles = [3, 9, 1, 2]

My turn: I can take 3 (left) or 2 (right)
If I take 3:
  - I gain 3
  - Opponent faces [9, 1, 2]
  - My total outcome = 3 - (opponent's best from [9, 1, 2])

Key insight: dp[i][j] = max score DIFFERENCE for current player
             when piles[i..j] remain
```

**Why this works**:
- State is the interval [i, j] of remaining piles
- Each move shrinks the interval by 1
- Eventually the interval is empty (base case: the difference is 0)
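
A minimal memoized sketch of this interval DP (the name `stone_game_diff` is mine, not from any library):

```python
from functools import lru_cache

def stone_game_diff(piles):
    """Best score difference the current player can force on piles[i..j]."""
    @lru_cache(maxsize=None)
    def dp(i, j):
        if i > j:                             # empty interval: nothing to gain
            return 0
        return max(piles[i] - dp(i + 1, j),   # take the left end
                   piles[j] - dp(i, j - 1))   # take the right end
    return dp(0, len(piles) - 1)

print(stone_game_diff([3, 9, 1, 2]))  # 7: first player finishes 7 ahead
```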

## Pattern 2: Taking from Front (Stone Game III)

**Scenario**: Array of values, take 1-3 from the front each turn.

```
values = [1, 2, 3, -9]

My turn: I can take [1], [1, 2], or [1, 2, 3]
If I take the first 2 (gaining 1 + 2 = 3):
  - Opponent faces [3, -9]
  - My outcome = 3 - dp([3, -9])

Key insight: dp[start] = max score diff starting from index start
```

**Why the -9 matters**:
- Naive: "Take as much as possible!"
- Smart: Taking -9 might help if it forces the opponent into a worse position
- Negative values make the strategy non-trivial
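
A sketch of this front-taking DP, assuming up to 3 values per turn as above (`stone_game_iii_diff` is a hypothetical name):

```python
from functools import lru_cache

def stone_game_iii_diff(values):
    """Best score difference for the player to move, starting at index start."""
    n = len(values)
    @lru_cache(maxsize=None)
    def dp(start):
        if start >= n:                 # nothing left to take
            return 0
        best = float('-inf')
        gain = 0
        for k in range(start, min(start + 3, n)):  # take 1, 2, or 3 values
            gain += values[k]
            best = max(best, gain - dp(k + 1))
        return best
    return dp(0)

print(stone_game_iii_diff([1, 2, 3, -9]))  # 15: take 1, 2, 3 and leave -9
```

Note that the best move here is to take all three positives, sticking the opponent with the -9: the difference is 6 - (-9) = 15.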

## Pattern 3: Bitmask Games (Can I Win)

**Scenario**: Pool of numbers 1 to n, each usable once; picks accumulate into a shared running total, and whoever pushes the total to the target (or beyond) wins.

```
Numbers: {1, 2, 3, 4}, Target: 6

If I pick 4:
  - Remaining: {1, 2, 3}, Need: 2
  - Opponent can pick 2 or 3 to win!

If I pick 3:
  - Remaining: {1, 2, 4}, Need: 3
  - Opponent picks 4 (3 + 4 = 7 >= 6) and wins!

If I pick 1:
  - Remaining: {2, 3, 4}, Need: 5
  - No single pick reaches 5, and every opponent
    reply leaves me a winning pick - I win!
```

**Why bitmask**:
- State = which numbers are still available
- With n numbers, 2^n possible states
- Memoize on the bitmask integer
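
One way to code this up (a sketch: the memoized state is the bitmask of used numbers plus the amount still needed, and the upfront feasibility check rules out unreachable targets):

```python
from functools import lru_cache

def can_i_win(max_choosable, target):
    """True if the first player can force the running total to reach target."""
    if max_choosable * (max_choosable + 1) // 2 < target:
        return False                       # even using every number falls short
    @lru_cache(maxsize=None)
    def win(used, remaining):
        for num in range(1, max_choosable + 1):
            bit = 1 << num
            if used & bit:
                continue                   # num already taken
            if num >= remaining:           # this pick reaches the target
                return True
            if not win(used | bit, remaining - num):
                return True                # opponent loses from here
        return False                       # every move lets the opponent win
    return win(0, target)

print(can_i_win(4, 6))  # True: picking 1 first forces a win
```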

## The "I Win If Opponent Loses" Pattern

For win/lose games (not score games):

```python
def can_win(state):
    for move in all_moves:
        if not can_win(state_after_move):
            return True  # This move makes the opponent lose!
    return False  # All moves let the opponent win :(
```

**Insight**: I need just ONE move that leads to the opponent losing.
The opponent needs ALL my moves to lead to them winning.

## Trace Through: Predict the Winner

```
nums = [1, 5, 2]

Build up from base cases:
dp[0][0] = 1 (only pile 0, I take it)
dp[1][1] = 5 (only pile 1, I take it)
dp[2][2] = 2 (only pile 2, I take it)

dp[0][1]: piles 0 and 1 remain
  Take left (1): 1 - dp[1][1] = 1 - 5 = -4
  Take right (5): 5 - dp[0][0] = 5 - 1 = 4
  → dp[0][1] = max(-4, 4) = 4 (take the 5!)

dp[1][2]: piles 1 and 2 remain
  Take left (5): 5 - dp[2][2] = 5 - 2 = 3
  Take right (2): 2 - dp[1][1] = 2 - 5 = -3
  → dp[1][2] = max(3, -3) = 3 (take the 5!)

dp[0][2]: all piles remain (THIS IS THE ANSWER)
  Take left (1): 1 - dp[1][2] = 1 - 3 = -2
  Take right (2): 2 - dp[0][1] = 2 - 4 = -2
  → dp[0][2] = max(-2, -2) = -2

Player 1's advantage = -2 < 0 → Player 2 wins!
```
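
The same table can be built bottom-up in code (a sketch; `predict_the_winner` returns the dp[0][n-1] value traced above):

```python
def predict_the_winner(nums):
    """First player's best score difference under optimal play."""
    n = len(nums)
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = nums[i]                 # single pile: take it
    for length in range(2, n + 1):         # grow interval length
        for i in range(n - length + 1):
            j = i + length - 1
            dp[i][j] = max(nums[i] - dp[i + 1][j],   # take left
                           nums[j] - dp[i][j - 1])   # take right
    return dp[0][n - 1]

print(predict_the_winner([1, 5, 2]))  # -2: Player 2 wins
```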

## Common Pitfalls

1. **Forgetting the perspective flip**: The subtraction `- dp(next)` is crucial!

2. **Wrong win condition**:
   - `> 0` for a strict win
   - `>= 0` when a tie counts as a win

3. **Not handling edge cases**:
   - Empty array
   - Single element
   - All elements the same

4. **Inefficient state**:
   - Using (left, right, whose_turn) when (left, right) suffices
   - The turn is implicit in the recursion

## Complexity Guide

| Game Type | States | Per-State Work | Total |
|-----------|--------|----------------|-------|
| Interval [i,j] | O(n²) | O(1) | O(n²) |
| Linear (start) | O(n) | O(k) | O(nk) |
| Bitmask | O(2^n) | O(n) | O(n·2^n) |
| (start, M), as in Stone Game II | O(n²) | O(n) | O(n³) |

## The Universal Game Theory DP Template

```python
from functools import lru_cache

def game_theory_dp(initial_state):
    """
    Template for two-player optimal-play games.

    Returns: outcome for the first player
    (positive = win, negative = loss, 0 = tie)
    """
    @lru_cache(maxsize=None)
    def dp(state):
        if is_terminal(state):
            return terminal_value(state)

        best = float('-inf')
        for action in possible_actions(state):
            gain = immediate_gain(state, action)
            next_state = apply_action(state, action)

            # Key: subtract the opponent's best outcome
            value = gain - dp(next_state)
            best = max(best, value)

        return best

    return dp(initial_state)
```

Fill in:
- `is_terminal`: game over?
- `terminal_value`: score when the game ends
- `possible_actions`: what moves can I make?
- `immediate_gain`: points I get from this move
- `apply_action`: new state after the move
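
As a worked instantiation, here is the template filled in for the take-from-either-end game of Pattern 1 (the state is the (i, j) interval of remaining piles; the helpers are inlined, and `first_player_outcome` is a name of my choosing):

```python
from functools import lru_cache

def first_player_outcome(piles):
    @lru_cache(maxsize=None)
    def dp(state):
        i, j = state
        if i > j:                          # is_terminal: no piles remain
            return 0                       # terminal_value
        best = float('-inf')
        for action in ('left', 'right'):   # possible_actions
            if action == 'left':
                gain, nxt = piles[i], (i + 1, j)  # immediate_gain, apply_action
            else:
                gain, nxt = piles[j], (i, j - 1)
            best = max(best, gain - dp(nxt))      # subtract opponent's best
        return best
    return dp((0, len(piles) - 1))

print(first_player_outcome([1, 5, 2]))  # -2, matching the trace above
```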