This repository contains the code to train restricted Boltzmann machines (RBMs) on data generated from Ising models. The Ising data is generated with Magneto (https://github.com/s9w/magneto). The RBM model implementation is based on that used by the authors of https://arxiv.org/abs/1810.11503. I implemented (persistent) contrastive divergence and parallel tempering in PyTorch, based on the PyDeep implementation of RBM samplers (https://pydeep.readthedocs.io/en/latest/_modules/pydeep/rbm/sampler.html). I generated 25k samples from an 8×8 Ising model at temperature T = 1.8 with:

```
./magneto/magneto.exe -L=8 -TMin=1.8 -TMax=1.8 -TSteps=1 -N1=10000 -N2=25000 -N3=100 -states=testStates -record=main
```
I then trained 10 RBMs on this data (each with 64 visible and 64 hidden nodes), using either standard contrastive divergence or parallel tempering, e.g.:

```
python rbm_train_PT.py --json inputs_example.json
```
The mean and standard deviation of the log-likelihood across these 10 machines, over the first 500 training epochs, are shown here (plot taken from the notebook in this repo):
Switching from contrastive divergence (CD) to persistent contrastive divergence (PCD, here with K=1) already improved convergence on average, but the gradients were noisy. Increasing
RBMs are traditionally trained with contrastive divergence, where the gradients are estimated from short Markov chains initialised at the training examples. Contrastive divergence suffers from the fact that these Markov chains converge to the true distribution more slowly as the network weights grow, which makes training unstable (Fischer, 2010). A related problem is that the gradients are only evaluated in the close vicinity of the training examples. To get around this, one could increase the number of Gibbs sampling steps per update, but the chains would still mix slowly. To stabilise training, I instead explored a technique called parallel tempering.
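For concreteness, the CD-K update described above can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the repo's PyTorch implementation; the function name and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_gradients(v0, W, b, c, k=1):
    """One CD-K gradient estimate for a binary RBM, from a data batch v0 of shape (B, nv)."""
    # Positive phase: hidden activation probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    # Negative phase: k steps of block Gibbs sampling, starting from the data.
    v = v0
    for _ in range(k):
        h = (rng.random(ph0.shape) < sigmoid(v @ W + c)).astype(float)
        v = (rng.random(v0.shape) < sigmoid(h @ W.T + b)).astype(float)
    phk = sigmoid(v @ W + c)
    # Gradients: data statistics minus (approximate) model statistics.
    dW = (v0.T @ ph0 - v.T @ phk) / v0.shape[0]
    db = (v0 - v).mean(axis=0)
    dc = (ph0 - phk).mean(axis=0)
    return dW, db, dc
```

Persistent CD differs only in that the negative chain is not reinitialised at the data each update, but carried over between updates.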
Parallel tempering is a technique designed to explore a larger region of the model space of the RBM. It does so by running multiple Markov chains at the same time, each at a different temperature. The distribution over an RBM's visible and hidden units,

p(v, h) = e^{-E(v, h)} / Z,

is a Boltzmann weight at unit temperature. Consider a family of tempered distributions

p_k(v, h) = e^{-β_k E(v, h)} / Z_k,

with inverse temperatures 1 = β_1 > β_2 > … > β_M ≥ 0. Note that β_1 = 1 recovers the model distribution, while β_M = 0 gives a uniform distribution over states, which mixes rapidly. Each chain runs Gibbs sampling at its own temperature, and neighbouring chains periodically propose to swap their states; a swap between chains i and j is accepted with probability

A = min(1, exp((β_i − β_j)(E(v_i, h_i) − E(v_j, h_j)))),

where (v_i, h_i) denotes the current state of chain i and Z_k is the partition function at inverse temperature β_k.
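The swap move can be sketched as follows: a minimal NumPy illustration, assuming `vs[i]` and `hs[i]` hold the state of the chain at inverse temperature `betas[i]` (function and variable names are illustrative, not the repo's API).

```python
import numpy as np

rng = np.random.default_rng(1)

def rbm_energy(v, h, W, b, c):
    # E(v, h) = -v·W·h - b·v - c·h, evaluated for each chain.
    return -(np.einsum('ki,ij,kj->k', v, W, h) + v @ b + h @ c)

def swap_states(vs, hs, betas, W, b, c):
    """Metropolis swap proposals between chains at neighbouring inverse temperatures."""
    E = rbm_energy(vs, hs, W, b, c)
    for i in range(len(betas) - 1):
        # Accept with probability min(1, exp((beta_i - beta_{i+1}) * (E_i - E_{i+1}))).
        log_a = (betas[i] - betas[i + 1]) * (E[i] - E[i + 1])
        if np.log(rng.random()) < min(0.0, log_a):
            vs[[i, i + 1]] = vs[[i + 1, i]]
            hs[[i, i + 1]] = hs[[i + 1, i]]
            E[[i, i + 1]] = E[[i + 1, i]]
    return vs, hs
```

Only the chain at β = 1 contributes samples to the gradient; the hotter chains exist solely to shuttle states across energy barriers toward it.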