Binary file added docs/images/museum/rat_accuracy.jpg
Binary file added docs/images/museum/rat_rewards.jpg
Binary file added docs/images/museum/ratmaze.png
Binary file added docs/images/museum/real_ratmaze.jpg
Binary file added docs/images/tutorials/neurocog/alphasyn.jpg
Binary file added docs/images/tutorials/neurocog/exp2syn.jpg
Binary file added docs/images/tutorials/neurocog/expsyn.jpg
28 changes: 28 additions & 0 deletions docs/modeling/synapses.md
@@ -60,6 +60,34 @@ This synapse performs a deconvolutional transform of its input signals. Note tha

## Dynamic Synapse Types

### Exponential Synapse

This (chemical) synapse performs a linear transform of its input signals. Note that this synapse is "dynamic" in the sense that its efficacies are a function of their pre-synaptic inputs; there is no inherent form of long-term plasticity in this base implementation. Synaptic strength values can be viewed as being filtered/smoothed through an exponential kernel.
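
To make the kernel concrete, below is a minimal Euler-integrated sketch of exponential synaptic dynamics in plain NumPy. The constants and array shapes are illustrative assumptions, not the internals of `ExponentialSynapse` (see the class reference below for those).

```python
# Exponential synapse kernel, Euler-integrated (illustrative constants).
import numpy as np

dt, tau_syn = 0.1, 5.0                   # integration step and decay constant (ms)
W = np.abs(np.random.randn(3, 2))        # efficacies (3 pre-synaptic x 2 post-synaptic)
g = np.zeros(2)                          # filtered synaptic conductance traces

for t in range(1000):
    s_pre = (np.random.rand(3) < 0.05).astype(float)  # pre-synaptic spike vector
    g += dt * (-g / tau_syn)             # exponential decay between spikes
    g += s_pre @ W                       # instantaneous jump on each incoming spike
    i_syn = g                            # linearly transformed, filtered input signal
```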

```{eval-rst}
.. autoclass:: ngclearn.components.ExponentialSynapse
:noindex:

.. automethod:: advance_state
:noindex:
.. automethod:: reset
:noindex:
```

### Alpha Synapse

This (chemical) synapse performs a linear transform of its input signals. Note that this synapse is "dynamic" in the sense that its efficacies are a function of their pre-synaptic inputs; there is no inherent form of long-term plasticity in this base implementation. Synaptic strength values can be viewed as being filtered/smoothed through a kernel that models more realistic rise and fall times of synaptic conductance.
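
For intuition, a common way to realize an alpha-shaped kernel is with two coupled first-order filters, as in the sketch below; the constants are assumed for illustration and this is not the `AlphaSynapse` implementation itself.

```python
# Alpha-kernel synapse via two coupled first-order filters (illustrative).
import numpy as np

dt, tau = 0.1, 5.0                       # integration step and kernel time constant (ms)
W = np.abs(np.random.randn(3, 2))        # efficacies (3 pre-synaptic x 2 post-synaptic)
h = np.zeros(2)                          # auxiliary variable driving the rise phase
g = np.zeros(2)                          # conductance with finite rise and fall times

for t in range(1000):
    s_pre = (np.random.rand(3) < 0.05).astype(float)
    h += dt * (-h / tau) + s_pre @ W     # first stage: spike-driven rise
    g += dt * ((h - g) / tau)            # second stage: smooth rise then decay
```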

```{eval-rst}
.. autoclass:: ngclearn.components.AlphaSynapse
:noindex:

.. automethod:: advance_state
:noindex:
.. automethod:: reset
:noindex:
```

### Short-Term Plasticity (Dense) Synapse

This synapse performs a linear transform of its input signals. Note that this synapse is "dynamic" in the sense that it engages in short-term plasticity (STP), meaning that its efficacy values change as a function of its inputs/time (and simulated consumed resources), but it does not provide any long-term form of plasticity/adjustment.
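
For intuition, below is a hedged sketch of one classical STP formulation, a Tsodyks-Markram-style utilization/resource model, in which each pulse's effective strength depends on a facilitating utilization variable and a depressing pool of resources; the constants are illustrative assumptions and this is not necessarily the exact dynamics of ngc-learn's `STPDenseSynapse`.

```python
# Tsodyks-Markram-style short-term plasticity (illustrative assumptions).
import numpy as np

dt = 1.0
U, tau_f, tau_d = 0.2, 750.0, 50.0   # baseline utilization, facilitation/depression taus
u, x = U, 1.0                         # utilization and available resources
w = 1.0                               # static base efficacy (no long-term plasticity)

for t in range(500):
    spike = np.random.rand() < 0.05   # pre-synaptic pulse
    u += dt * ((U - u) / tau_f)       # utilization relaxes back toward baseline
    x += dt * ((1.0 - x) / tau_d)     # resources recover toward full availability
    if spike:
        u += U * (1.0 - u)            # facilitation: utilization jumps on a spike
        eff = w * u * x               # effective strength of this transmitted pulse
        x -= u * x                    # depression: resources are consumed
```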
99 changes: 98 additions & 1 deletion docs/museum/rl_snn.md
@@ -11,18 +11,115 @@ exhibit can be found

## Modeling Operant Conditioning through Modulation

Operant conditioning refers to the idea that environmental stimuli can either increase or decrease the occurrence of (voluntary) behaviors; in other words, positive stimuli can lead to future repeats of a certain behavior whereas negative stimuli can punish, i.e., decrease, its future occurrences. Ultimately, operant conditioning shapes voluntary behavior through consequences: actions followed by rewards are repeated while actions followed by punishing/negative outcomes diminish.

In this lesson, we will model a very simple case of operant conditioning for a neuronal motor circuit used to engage in the navigation of a simple maze.
The maze's design will be the rat T-maze, and the "rat" will be allowed to move, at a particular point in the maze, in one of four directions (up/North, down/South, left/West, and right/East). A positive reward will be supplied to our rat neuronal circuit if it makes progress towards the direction of the food (placed in the upper right corner of the T-maze) and a negative reward will be provided if it fails to make progress/gets stuck, i.e., a dense reward functional will be employed. For the exhibit code that goes with this lesson, an implementation of this T-maze environment is provided, modeled in the same style/with the same agent API as the OpenAI gymnasium.
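
To make the agent/environment interface concrete, below is a self-contained toy sketch of the gymnasium-style loop just described. The class name, grid coordinates, and reward magnitudes here are assumptions for illustration only -- the exhibit ships its own T-maze implementation exposing this same API shape.

```python
# Toy gymnasium-style T-maze loop (illustrative assumptions throughout;
# this is NOT the exhibit's actual environment implementation).
import numpy as np

class ToyTMaze:
    """Minimal 5x5 stand-in: climb the stem (North), then head East to the food."""
    def __init__(self):
        self.goal = np.array([4, 4])                 # food ("X"), upper-right corner
    def reset(self):
        self.pos = np.array([2, 0])                  # agent ("+") starts at the stem base
        return self.pos.copy()
    def step(self, action):                          # 0=North, 1=South, 2=West, 3=East
        moves = [(0, 1), (0, -1), (-1, 0), (1, 0)]
        new_pos = np.clip(self.pos + np.array(moves[action]), 0, 4)
        # dense reward: + if the step reduces distance to the food, - otherwise
        progressed = np.abs(self.goal - new_pos).sum() < np.abs(self.goal - self.pos).sum()
        self.pos = new_pos
        reward = 1.0 if progressed else -1.0
        done = bool((self.pos == self.goal).all())
        return self.pos.copy(), reward, done, {}

env = ToyTMaze()
obs, done = env.reset(), False
while not done:                                      # random policy for demonstration
    obs, reward, done, _ = env.step(np.random.randint(4))
```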

### Reward-Modulated Spike-Timing-Dependent Plasticity (R-STDP)

Although [spike-timing-dependent plasticity](../tutorials/neurocog/stdp.md) (STDP) and [reward-modulated STDP](../tutorials/neurocog/mod_stdp.md) (MSTDP) are covered and analyzed in detail in the ngc-learn set of tutorials, we will briefly review the evolution
of synaptic strengths as prescribed by modulated STDP with eligibility traces here. In effect, STDP prescribes changes
in synaptic strength according to the idea that <i>neurons that fire together, wire together, except that timing matters</i>
(a temporal interpretation of basic Hebbian learning). This means that, assuming we are able to record the times of
pre-synaptic and post-synaptic neurons (that a synaptic cable connects), we can, at any time-step $t$, produce an
adjustment $\Delta W_{ij}(t)$ to a synapse via the following pair of correlational rules:

$$
\Delta W_{ij}(t) = A^+ \big(x_i s_j \big) - A^- \big(s_i x_j \big)
$$

where $s_j$ is the spike recorded at time $t$ of the post-synaptic neuron $j$ (and $x_j$ is an exponentially-decaying trace that tracks its spiking history) and $s_i$ is the spike recorded at time $t$ of the pre-synaptic neuron $i$ (and $x_i$ is an exponentially-decaying trace that tracks its pulse history). STDP, as shown in a very simple format above, effectively can be described as balancing two types of alterations to a synaptic efficacy -- long-term potentiation (the first term, which increases synaptic strength) and long-term depression (the second term, which decreases synaptic strength).
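
For concreteness, the following sketch integrates the pair of exponentially-decaying traces and applies the two-term update above at every step; the trace time constant and the amplitudes $A^+$/$A^-$ are illustrative assumptions, and this is a bare-bones rendition rather than ngc-learn's trace-STDP component.

```python
# Minimal Euler-integrated trace-based STDP (illustrative constants).
import numpy as np

dt, tau_tr = 1.0, 20.0                      # step size and trace decay constant (ms)
A_plus, A_minus = 0.01, 0.012               # LTP/LTD amplitudes (assumed values)
n_pre, n_post = 4, 3
W = np.random.rand(n_pre, n_post)           # synaptic efficacies W_ij (i pre, j post)
x_pre, x_post = np.zeros(n_pre), np.zeros(n_post)

for t in range(200):
    s_pre = (np.random.rand(n_pre) < 0.05).astype(float)    # pre-synaptic spikes
    s_post = (np.random.rand(n_post) < 0.05).astype(float)  # post-synaptic spikes
    x_pre += dt * (-x_pre / tau_tr) + s_pre   # decaying trace of pre spiking history
    x_post += dt * (-x_post / tau_tr) + s_post
    # LTP: pre-trace paired with post spike; LTD: pre spike paired with post-trace
    W += A_plus * np.outer(x_pre, s_post) - A_minus * np.outer(s_pre, x_post)
```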

Modulated STDP is a three-factor variant of STDP that multiplies the final synaptic update by a third signal, e.g., the modulatory signal is often a reward (dopamine) intensity value (resulting in reward-modulated STDP). However, given that reward signals might be delayed or not arrive/be available at every single time-step, it is common practice to extend a synapse to maintain a second value called an "eligibility trace", which is effectively another exponentially-decaying trace/filter (instantiated as an ODE that can be integrated via the Euler method or related tools) that is constructed to track a sequence of STDP updates applied across a window of time. Once a reward/modulator signal becomes available, the current trace is multiplied by the modulator to produce a change in synaptic efficacy.
In essence, this update becomes:

$$
\Delta W_{ij} = \nu E_{ij}(t) r(t), \; \text{where } \; \tau_e \frac{\partial E_{ij}(t)}{\partial t} = -E_{ij}(t) + \Delta W_{ij}(t)
$$

where $r(t)$ is the dopamine supplied at some time $t$ and $\nu$ is some non-negative global learning rate. Note that MSTDP with eligibility traces (MSTDP-ET) is agnostic to the choice of local STDP/Hebbian update used to produce $\Delta W_{ij}(t)$ (for example, one could replace the trace-based STDP rule we presented above with BCM or a variant of weight-dependent STDP).
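
Putting the pieces together, the hedged sketch below folds each raw STDP update into an eligibility trace and converts that trace into an actual weight change only when a (toy, periodically delivered) dopamine pulse arrives; all constants, including the reward schedule, are assumptions for illustration.

```python
# Minimal MSTDP-ET loop: tau_e dE/dt = -E + dW_stdp(t); dW = nu * E(t) * r(t).
# Constants and the toy reward schedule are illustrative assumptions.
import numpy as np

dt, tau_tr, tau_e, nu = 1.0, 20.0, 50.0, 0.1
A_plus, A_minus = 0.01, 0.012
n_pre, n_post = 4, 3
W = np.random.rand(n_pre, n_post)
E = np.zeros_like(W)                        # per-synapse eligibility trace
x_pre, x_post = np.zeros(n_pre), np.zeros(n_post)

for t in range(200):
    s_pre = (np.random.rand(n_pre) < 0.05).astype(float)
    s_post = (np.random.rand(n_post) < 0.05).astype(float)
    x_pre += dt * (-x_pre / tau_tr) + s_pre
    x_post += dt * (-x_post / tau_tr) + s_post
    dW_stdp = A_plus * np.outer(x_pre, s_post) - A_minus * np.outer(s_pre, x_post)
    E += dt * ((-E + dW_stdp) / tau_e)      # eligibility trace integrates STDP updates
    r = 1.0 if (t + 1) % 50 == 0 else 0.0   # toy delayed dopamine/reward pulse
    W += nu * E * r                          # update fires only when reward arrives
```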

## The Spiking Neural Circuit Model

In this exhibit, we build one of the simplest possible spiking neural networks (SNNs) one could design to tackle a simple maze navigation problem such as the rat T-maze: a three-layer SNN where the first layer is a Poisson encoder and the second and third layers contain sets of recurrent leaky integrate-and-fire (LIF) neurons. The recurrence in our model is non-plastic and constructed such that a form of lateral competition is induced among the LIF units, i.e., the LIF neurons will be driven by a scaled, hollow-matrix-initialized recurrent weight matrix (which will multiply spikes encountered at time $t - \Delta t$ by negative values), which will (quickly yet roughly) approximate the effect of inhibitory neurons. For the synapses that transmit pulses from the sensory/input layer to the second layer, we will opt for a non-plastic, sparse mixture of excitatory and inhibitory strength values (much as in the model of <b>[1]</b>) to produce a reasonable encoding of the input Poisson spike trains. For the synapses that transmit pulses from the second layer to the third (control/action) layer, we will employ MSTDP-ET (as presented in the previous section) to adjust the non-negative efficacies in order to learn a basic reactive policy. We will call this very simple neuronal model the "reinforcement learning SNN" (RL-SNN).
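
As a small illustration of the recurrent wiring described above, the sketch below builds a scaled hollow matrix (zero diagonal, negative off-diagonal entries) and applies it to the previous step's spike vector to produce a lateral-inhibition current; the inhibition scale is an assumed value.

```python
# Hollow-matrix lateral inhibition (assumed scale; illustrative only).
import numpy as np

n_units, inh_scale = 10, 0.5
W_rec = -inh_scale * (1.0 - np.eye(n_units))   # zero self-connections, negative elsewhere

s_prev = (np.random.rand(n_units) < 0.2).astype(float)  # spikes from time t - dt
i_inhib = s_prev @ W_rec                        # inhibitory drive to each LIF unit at t
```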

The SNN circuit will be provided the raw pixels of the T-maze environment (note that this is a global view of the
entire maze, as opposed to something more realistic such as an egocentric view of the sensory space), where a cross
"+" marks the agent's current location and an "X" marks the location of the food substance/goal state. Below, the
image on the left depicts a real-world rat T-maze, while the image on the right shows our implementation/simulation
of the T-maze problem (i.e., what our SNN circuit sees at the very start of an episode of the navigation problem).

```{eval-rst}
.. table::
:align: center

+-------------------------------------------------+------------------------------------------------+
| .. image:: ../images/museum/real_ratmaze.jpg | .. image:: ../images/museum/ratmaze.png |
| :width: 250px | :width: 200px |
| :align: center | :align: center |
+-------------------------------------------------+------------------------------------------------+
```

## Running the RL-SNN Model

To fit the RL-SNN model described above, go to the `exhibits/rl_snn`
sub-folder (this step assumes that you have git cloned the model museum repo
code), and execute the RL-SNN's simulation script from the command line as follows:

```console
$ ./sim.sh
```
which will execute a simulation of the MSTDP-adapted SNN on the T-maze problem, specifically running four uniquely-seeded trials (i.e., four different "rat agents") and producing two plots, one containing a smoothed curve of episodic rewards over time and another containing a smoothed task accuracy curve (as in, did the rat reach the goal state and obtain the food substance or not). You should obtain plots that look roughly like the two below.

```{eval-rst}
.. table::
:align: center

+-----------------------------------------------+-----------------------------------------------+
| .. image:: ../images/museum/rat_rewards.jpg | .. image:: ../images/museum/rat_accuracy.jpg |
| :width: 400px | :width: 400px |
| :align: center | :align: center |
+-----------------------------------------------+-----------------------------------------------+
```

Notice that we have provided a random agent baseline (i.e., uniform random selection of one of the four possible
directions to move at each step in an episode) against which to contrast the SNN rat motor circuit's performance. As you may
observe, the SNN circuit ultimately becomes conditioned to take actions akin to the optimal policy -- go North/up
if it perceives itself (marked as a cross "+") at the bottom of the T-maze, then go East/right once it has reached the top
of the T, continuing right until it perceives the food item (goal state marked as an "X").

The code has been configured to also produce a small video/GIF of the final episode `episode200.gif`, where the MSTDP
weight changes have been disabled and the agent must solely rely on its memory of the uncovered policy to get to the
goal state.

### Some Important Limitations

While the above MSTDP-ET-driven motor circuit model is useful and provides a simple model of operant conditioning in
the context of a very simple maze navigation task, it is important to identify the assumptions/limitations of the
above setup. Some important limitations or simplifications made to obtain a consistently working RL-SNN model
include:
1. As mentioned earlier, the sensory input contains a global view of the maze navigation problem, i.e., a 2D birds-eye
view of the agent, its goal (the food substance), and its environment. More realistic, but far more difficult
versions of this problem would need to consider an ego-centric view (making the problem a partially observable
Markov decision process), a more realistic 3D representation of the environment, as well as more complex maze
sizes and shapes for the agent/rat model to navigate.
2. The reward is only delayed with respect to the agent's stimulus processing window, meaning that the agent essentially
receives a dopamine signal after an action is taken. If we ignore the SNN's stimulus processing time between video
frames of the actual navigation problem, we can view our agent above as tackling what is known in reinforcement
learning as the dense reward problem. A far more complex, yet more cognitively realistic, version of the problem
is to administer a sparse reward, i.e., the rat motor circuit only receives a useful dopamine/reward stimulus at the
end of an episode as opposed to after each action. The above MSTDP-ET model would struggle to solve the sparse
reward problem and more sophisticated models would be required in order to achieve successful outcomes, i.e.,
appealing to models of memory/cognitive maps, more intelligent forms of exploration, etc.
3. The SNN circuit itself only permits plastic synapses in its control layer (i.e., the synaptic connections between
the second layer and third output/control layer). The bottom layer is non-plastic and fixed, meaning that the
agent model is dependent on the quality of the random initialization of the input-to-hidden encoding layer. The
input-to-hidden synapses could be adapted with STDP (or MSTDP); however, the agent will not always successfully
and stably converge to a consistent policy as the encoding layer's effectiveness is highly dependent on how much
of the environment the agent initially sees/explores (if the agent gets "stuck" at any point, STDP will tend to
fill up the bottom layer receptive fields with redundant information and make it more difficult for the control
layer to learn the consequences of taking different actions).

<!-- References/Citations -->
## References
8 changes: 0 additions & 8 deletions docs/source/ngclearn.components.neurons.graded.rst
@@ -36,14 +36,6 @@ ngclearn.components.neurons.graded.rateCell module
:undoc-members:
:show-inheritance:

ngclearn.components.neurons.graded.rateCellOld module
-----------------------------------------------------

.. automodule:: ngclearn.components.neurons.graded.rateCellOld
:members:
:undoc-members:
:show-inheritance:

ngclearn.components.neurons.graded.rewardErrorCell module
---------------------------------------------------------

8 changes: 0 additions & 8 deletions docs/source/ngclearn.components.synapses.hebbian.rst
@@ -36,14 +36,6 @@ ngclearn.components.synapses.hebbian.hebbianSynapse module
:undoc-members:
:show-inheritance:

ngclearn.components.synapses.hebbian.hebbianSynapseOld module
-------------------------------------------------------------

.. automodule:: ngclearn.components.synapses.hebbian.hebbianSynapseOld
:members:
:undoc-members:
:show-inheritance:

ngclearn.components.synapses.hebbian.traceSTDPSynapse module
------------------------------------------------------------

24 changes: 24 additions & 0 deletions docs/source/ngclearn.components.synapses.rst
@@ -23,6 +23,14 @@ ngclearn.components.synapses.STPDenseSynapse module
:undoc-members:
:show-inheritance:

ngclearn.components.synapses.alphaSynapse module
------------------------------------------------

.. automodule:: ngclearn.components.synapses.alphaSynapse
:members:
:undoc-members:
:show-inheritance:

ngclearn.components.synapses.denseSynapse module
------------------------------------------------

@@ -31,6 +39,22 @@ ngclearn.components.synapses.denseSynapse module
:undoc-members:
:show-inheritance:

ngclearn.components.synapses.doubleExpSynapse module
----------------------------------------------------

.. automodule:: ngclearn.components.synapses.doubleExpSynapse
:members:
:undoc-members:
:show-inheritance:

ngclearn.components.synapses.exponentialSynapse module
------------------------------------------------------

.. automodule:: ngclearn.components.synapses.exponentialSynapse
:members:
:undoc-members:
:show-inheritance:

ngclearn.components.synapses.staticSynapse module
-------------------------------------------------
