# Assignment 1: Hopfield Networks

## Overview
In this assignment, you will explore computational memory models by implementing a Hopfield network. In the original article ([Hopfield (1982)](https://www.dropbox.com/scl/fi/iw9wtr3xjvrbqtk38obid/Hopf82.pdf?rlkey=x3my329oj9952er68sr28c7xc&dl=1)), neuronal activations were set to either 0 ("not firing") or 1 ("firing"). Modern Hopfield networks nearly always follow an updated implementation, first proposed by [Amit et al. (1985)](https://www.dropbox.com/scl/fi/3a3adwqf70afb9kmieezn/AmitEtal85.pdf?rlkey=78fckvuuvk9t3o9fbpjrmn6de&dl=1). In Amit et al.'s framing, neurons take on activation values of either -1 ("down state") or +1 ("up state"). This has three important benefits over Hopfield's original implementation:
  - It provides a cleaner way to implement the Hebbian learning rule (i.e., without subtracting means or shifting values).
  - It avoids a bias towards 0 (i.e., +1 and -1 are equally "attractive" whereas 0-valued neurons have a stronger "pull").
  - The energy function (i.e., a description of the attractor dynamics of the network) can be directly mapped onto the [Ising model](https://en.wikipedia.org/wiki/Ising_model) from statistical physics.

You should start by reading [Amit et al. (1985)](https://www.dropbox.com/scl/fi/3a3adwqf70afb9kmieezn/AmitEtal85.pdf?rlkey=78fckvuuvk9t3o9fbpjrmn6de&dl=1) closely. Then you should code up the model in a Google Colaboratory notebook. Unless otherwise noted, all references to "the paper" refer to Amit et al. (1985).

## Tasks

### 1. Implement Memory Storage and Retrieval
- **Objective:** Write functions that implement the core operations of a Hopfield network (a minimal sketch combining both operations appears at the end of this section):
  - **Memory Storage:** Implement the Hebbian learning rule to compute the weight matrix, given a set of network configurations (memories). This is described in **Equation 1.5** of the paper:

    Let \($p$\) be the number of patterns, \($N$\) the number of neurons, and \($\xi_i^\mu \in \{-1, +1\}$\) the value of neuron \($i$\) in pattern \($\mu$\).

    The synaptic coupling between neurons \($i$\) and \($j$\) is:

    $$
    J_{ij} = \frac{1}{N} \sum_{\mu=1}^p \xi_i^\mu \xi_j^\mu.
    $$

    Note that the matrix is symmetric (\($J_{ij} = J_{ji}$\)) and that no self-connections are allowed (by definition, $J_{ii} = 0$).
  - **Memory Retrieval:** Implement the retrieval rule using **Equation (1.3)** and surrounding discussion.

    At each time step, each neuron updates according to its **local field** \($h_i$\):

    $$
    h_i = \sum_{j=1}^N J_{ij} S_j
    $$

    The neuron updates its state to align with the sign of the field:

    $$
    S_i(t+1) = \text{sign}(h_i(t)) = \text{sign} \left( \sum_{j} J_{ij} S_j(t) \right)
    $$

    Note that \($S_i \in \{-1, +1\}$\) represents the state of neuron \($i$\).

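To make these two operations concrete, here is a minimal NumPy sketch (illustrative only, not a required implementation; the function names, the synchronous update scheme, and breaking sign ties toward +1 are all assumptions you are free to change):

```python
import numpy as np

def store(xi):
    """Hebbian storage: `xi` is a (p, N) array of +/-1 patterns; returns the (N, N) coupling matrix J."""
    p, N = xi.shape
    J = (xi.T @ xi) / N                    # J_ij = (1/N) * sum_mu xi_i^mu * xi_j^mu
    np.fill_diagonal(J, 0)                 # no self-connections: J_ii = 0
    return J

def retrieve(J, S, max_steps=100):
    """Repeatedly apply S_i <- sign(h_i) until the state stops changing (or max_steps is reached)."""
    S = np.array(S)
    for _ in range(max_steps):
        h = J @ S                          # local fields h_i = sum_j J_ij * S_j
        S_new = np.where(h >= 0, 1, -1)    # sign of the field, with ties broken toward +1
        if np.array_equal(S_new, S):       # fixed point reached
            break
        S = S_new
    return S
```
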
### 2. Test with a Small Network

Encode the following test memories in a small Hopfield network with \( N = 5 \) neurons:

$$
\xi^1 = [+1, -1, +1, -1, +1]
$$
$$
\xi^2 = [-1, +1, -1, +1, -1]
$$

 - Store these memories using the Hebbian rule.
 - Test memory retrieval by presenting the network with noisy versions of the stored patterns (e.g., flipping one neuron's sign, or setting one or more activation values to 0); one such test is sketched after this list.
 - Briefly discuss the results and provide insights into how the network behaves. You might find Figure 3 from [Hopfield (1984)](https://www.dropbox.com/scl/fi/7wktieqztt60b8wyhg2au/Hopf84.pdf?rlkey=yi3baegby8x6olxznsvm8lyxz&dl=1) useful! You can write a paragraph or two, sketch (or code up) a diagram or figure, or do some combination of the two. In particular:
   - Can you get a sense of how the network "works" (i.e., why it stores memories)?
   - Why do some memories interfere while others don't? Can you build up enough intuition to manually construct memories that can vs. can't be retrieved by a toy (small) network?
   - Can you build up any intuitions about what sorts of factors might affect the "capacity" of the network? (Capacity is the maximum number of memories that can be "successfully" retrieved.)
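
For example, here is one way to run this test (it reuses the `store` and `retrieve` helpers sketched in Task 1, which are assumptions rather than requirements):

```python
import numpy as np

xi = np.array([[+1, -1, +1, -1, +1],
               [-1, +1, -1, +1, -1]])

J = store(xi)                    # Hebbian weight matrix for the two stored patterns

probe = xi[0].copy()
probe[2] *= -1                   # flip one neuron's sign to make a noisy version of memory 1
recovered = retrieve(J, probe)

print(np.array_equal(recovered, xi[0]))   # does the network settle back into memory 1?
```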

### 3. Evaluate Storage Capacity
- **Objective:** Determine how the ability to recover memories degrades as you vary:
  - **Network Size:** The total number of neurons in the network.
  - **Number of Stored Memories:** The number of patterns stored in the network.

  To generate $m$ memories \($\xi^1, \dots, \xi^m$\) for a network of $N$ neurons, you can use the following Python code; each row of the resulting matrix `xi` contains a single memory:
  ```python
  import numpy as np
  xi = 2 * (np.random.rand(m, N) > 0.5) - 1
  ```

- **Method:**
  - For each configuration, run multiple trials to compute the proportion of times that at least 99% of a memory is accurately recovered.
  - **Visualization 1:** Create a heatmap where the $x$-axis represents the network size, the $y$-axis represents the number of stored memories, and the color indicates the recovery accuracy. Play around with this to decide on a range of network sizes and numbers of memories that adequately illustrates the system's behavior.
  - **Visualization 2:** For each network size ($x$-axis), plot the expected number of memories that can be retrieved accurately ($y$-axis). Let:

    - \($N$\): the number of neurons (network size)
    - \($m$\): the number of stored memories
    - \($P\left[m, N\right] \in \left[0, 1\right]$\): the empirically observed success rate (from your heatmap); i.e., the **proportion** of memories correctly retrieved with at least 99% accuracy for a network of size \($N$\) with \($m$\) stored memories

    Then the **expected number of successfully retrieved memories**, \($\mathbb{E}\left[R_N\right]$\), for each network size \($N$\) is given by:

    $$
    \mathbb{E}\left[R_N\right] = \sum_{m=1}^{M} m \cdot P[m, N],
    $$
    where \($M$\) is the maximum number of stored memories you tested. (A sketch of this computation appears after the Method list.)

  - Is there any systematic relationship between network size and capacity? Can you describe any intuitions and/or develop any "rules" that might enable you to estimate a network's capacity solely from its size?
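
For instance, if your heatmap results live in a 2-D array `P` whose rows index the number of stored memories and whose columns index the network size (all names and ranges below are placeholders), the expected-capacity curve could be computed as follows:

```python
import numpy as np
import matplotlib.pyplot as plt

network_sizes = np.arange(10, 210, 10)    # example values of N
memory_counts = np.arange(1, 51)          # example values of m (1 through M)

# P[a, b] = proportion of memories retrieved with >= 99% accuracy when storing
# memory_counts[a] memories in a network of network_sizes[b] neurons (fill in from your trials).
P = np.zeros((len(memory_counts), len(network_sizes)))

expected_retrieved = memory_counts @ P    # E[R_N] = sum_m m * P[m, N], one value per N

plt.plot(network_sizes, expected_retrieved, marker='o')
plt.xlabel('Network size ($N$)')
plt.ylabel('Expected number of retrieved memories')
plt.show()
```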

### 4. Simulate Cued Recall
**Objective:** Evaluate how well the network performs associative recall when presented with only a partial input (a cue) and must recover the corresponding response.

#### Setup: A–B Pair Structure

- Each stored memory is a concatenated pair of binary patterns:
  - The first half of the neurons represents the **cue** \($A$\)
  - The second half represents the **response** \($B$\)

- If the total number of neurons \($N$\) is odd:
  - Let the cue occupy the first $\lfloor N/2 \rfloor$ neurons
  - Let the response occupy the remaining $\lceil N/2 \rceil$ neurons

Each full pattern \($\xi^\mu \in \{-1, +1\}^N$\) is defined as:
$$
\xi^\mu = \begin{bmatrix} A^\mu \\ B^\mu \end{bmatrix}
$$

#### Simulation Procedure

For each trial (a sketch of a single trial appears after this list):

  - **Choose a stored memory** \( \xi^\mu \)

  - **Construct the initial network state \( x \)**:
    - Set the cue half to match the stored pattern:
      $$
      x_i = A^\mu_i \quad \text{for } i = 1, \dots, \lfloor N/2 \rfloor
      $$
    - Set the response half to zero (i.e., no initial information):
      $$
      x_i = 0 \quad \text{for } i = \lfloor N/2 \rfloor + 1, \dots, N
      $$

  - **Evolve the network** until it reaches a stable state using the usual update rule:
    $$
    x_i \leftarrow \text{sign} \left( \sum_j J_{ij} x_j \right)
    $$

    You may choose whether to:
    - Let the cue neurons update along with the rest of the network
    - Or **clamp** the cue (i.e., keep \($x_i = A^\mu_i$\) fixed for the cue indices)

  - **Evaluate accuracy**:
    - Extract the response portion from the final state \($x^*$\)
    - Compare it to the original response \($B^\mu$\)
    - Mark the trial as a **successful recall** if at least 99% of the bits match:
      $$
      \frac{1}{|B|} \sum_{i \in \text{response}} \mathbb{1}[x^*_i = B^\mu_i] \geq 0.99
      $$
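
Here is one possible sketch of a single (unclamped) trial; it assumes a weight matrix `J` and a (p, N) pattern array `xi` already exist, and it breaks sign ties toward +1:

```python
import numpy as np

N = xi.shape[1]
half = N // 2                              # the cue occupies the first floor(N/2) neurons
mu = 0                                     # index of the stored A-B pair to test

x = np.zeros(N)
x[:half] = xi[mu, :half]                   # cue half set to A^mu; response half starts at 0

for _ in range(100):                       # evolve until the state stops changing
    x_new = np.where(J @ x >= 0, 1, -1)
    # (to clamp the cue instead, re-impose x_new[:half] = xi[mu, :half] here)
    if np.array_equal(x_new, x):
        break
    x = x_new

match = np.mean(x[half:] == xi[mu, half:]) # fraction of response bits recovered
success = match >= 0.99                    # counts as a successful recall
```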

#### Analysis

- Repeat the simulation for multiple stored $A$–$B$ pairs
- For each network size \($N$\), compute the **expected number of correctly recalled responses**
- Plot this value as a function of \($N$\)

#### Optional Extensions

- Compare performance with and without clamping the cue neurons
- Test whether cueing with partial or noisy \($A$\) patterns still leads to correct retrieval of \($B$\)

### 5. Simulate Contextual Drift

**Objective:** Explore how gradual changes in context influence which memories are retrieved. This models how temporal or environmental drift might bias recall toward memories with similar contexts.

#### Setup: Item–Context Memory Representation

- Use a Hopfield network with **100 neurons**.
- Each memory is a combination of:
  - **Item features**: 50 neurons (first half)
  - **Context features**: 50 neurons (second half)

- Create a **sequence of 10 memories** \($\{\xi^1, \xi^2, \dots, \xi^{10}\}$\), each composed of:
  $$
  \xi^t = \begin{bmatrix} \text{item}^t \\ \text{context}^t \end{bmatrix}
  $$

- Initialize the **context vector** for the first memory randomly (i.e., set it to a random vector of +1s and -1s).
- For each subsequent memory \($t + 1$\), create a new context vector by:
  - **Copying** the previous context
  - **Perturbing** a small number of bits (e.g., flipping 5% of the context features)

This creates a **drifting context** across the sequence; one way to generate such a sequence is sketched below.
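
For example, under the assumption that items are generated independently and that each context bit is flipped independently with probability `flip_prob` (all names here are illustrative placeholders):

```python
import numpy as np

n_memories, n_item, n_context = 10, 50, 50
flip_prob = 0.05                                              # fraction of context bits perturbed per step

items = 2 * (np.random.rand(n_memories, n_item) > 0.5) - 1    # independent random item patterns
contexts = np.zeros((n_memories, n_context), dtype=int)
contexts[0] = 2 * (np.random.rand(n_context) > 0.5) - 1       # random initial context

for t in range(1, n_memories):
    contexts[t] = contexts[t - 1]                             # copy the previous context...
    flip = np.random.rand(n_context) < flip_prob
    contexts[t, flip] *= -1                                   # ...then flip a small fraction of its bits

xi = np.hstack([items, contexts])                             # row t is [item^t, context^t]
```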

#### Simulation Procedure

1. **Train the network** on all 10 memory patterns using Hebbian learning (i.e., encode them into a weight matrix, $J$).

2. For each memory index \($i = 1, \dots, 10$\) (a sketch of steps a–d appears after this list):

   a. **Cue the network** with the **context vector** from \($\xi^i$\):
      - Set the **context neurons** (second half) to match \($\text{context}^i$\)
      - Set the **item neurons** (first half) to zero (i.e., no initial item input)

   b. **Run the network dynamics** until convergence

   c. **Compare the final state** to all 10 stored patterns:
      - For each stored pattern \($\xi^j$\), extract the item portion and compare it to the final item state
      - If the item portion of \($\xi^j$\) matches the recovered state with ≥99% accuracy, consider memory \( j \) to have been retrieved

   d. Record the **retrieved index** (if any) and compute the **relative position**:
      $$
      \Delta = j - i
      $$
      (e.g., \($\Delta = 0$\) means the correct memory was retrieved; \($\Delta = 1$\) means the next one in the sequence was recalled, etc.)
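
A minimal sketch of steps (a)–(d) for a single cue index `i` (assuming `J`, `items`, and `contexts` from the earlier sketches):

```python
import numpy as np

i = 4                                     # cue with the context from memory i
x = np.zeros(100)
x[50:] = contexts[i]                      # context neurons set to context^i; item neurons start at 0

for _ in range(100):                      # run the dynamics to convergence
    x_new = np.where(J @ x >= 0, 1, -1)
    if np.array_equal(x_new, x):
        break
    x = x_new

delta = None                              # stays None if no stored item was recovered
for j in range(10):
    if np.mean(x[:50] == items[j]) >= 0.99:
        delta = j - i                     # relative position of the retrieved memory
        break
```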

#### Analysis

- Repeat the simulation multiple times (e.g., 100 runs) to account for randomness
- For each relative position \( \Delta \in [-9, +9] \), compute:
  - The **probability** that a memory at offset \($\Delta$\) was retrieved when cueing from index \($i$\)
  - A **95% confidence interval** for each probability estimate (one simple option is sketched below)
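
One simple option for these intervals is a normal-approximation (Wald) interval on each retrieval proportion; the counts below are hypothetical placeholders:

```python
import numpy as np

n_cues = 1000        # hypothetical: total number of cue events pooled across runs
n_hits = 137         # hypothetical: how often the memory at a given offset was retrieved

p_hat = n_hits / n_cues
se = np.sqrt(p_hat * (1 - p_hat) / n_cues)               # standard error of a proportion
ci_low, ci_high = p_hat - 1.96 * se, p_hat + 1.96 * se   # ~95% confidence interval
```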

#### Visualization

Create a line plot where:

- $x$-axis: Relative position in the sequence, \($\Delta$\)
- $y$-axis: Probability of retrieval
- Error bars: 95% confidence intervals

Write up a brief description of what you think is happening (and why).

#### Optional extensions

- Context drift simulates how memory might change over time or under shifting external conditions
- You may adjust the context perturbation rate to see how sharply it affects retrieval
- This model can be adapted to explore recency effects, intrusion errors, or generalization
79 | 222 |
|
80 | 223 | ## Submission Instructions |
81 | | -- Submit a single Google Colaboratory notebook that includes: |
82 | | - - Your full Python implementation. |
83 | | - - Markdown cells explaining your approach, methodology, and any design decisions. |
| 224 | +- [Submit](https://canvas.dartmouth.edu/courses/71051/assignments/517353) a single stand-alone Google Colaboratory notebook (or similar) that includes: |
| 225 | + - Your full model implementation. |
| 226 | + - Markdown (text) cells explaining your approach, methodology, any design decisions you want to draw attention to, and discussion points. |
84 | 227 | - Plots and results for each simulation task. |
85 | | -- Ensure that your notebook runs without errors in Google Colaboratory. |
86 | | - |
87 | | -Good luck, and be sure to thoroughly test your implementation! |
| 228 | +- Ensure that your notebook runs without errors in Google Colaboratory. |