A Python implementation of the Bayesian Flow Networks paper.
The motivation for the paper is that autoregressive models are inefficient: generating a sequence of n elements requires n sequential passes through the network, one per element.
Unfortunately, diffusion models, which avoid this sequential bottleneck, do not work well for discrete data. So, instead of denoising a data point, one models each element of the output sequence as being generated by a probability distribution and iteratively refines these distributions. The parameters of these distributions are continuous even when the data themselves are discrete.
Initially, each element's distribution is the maximum entropy distribution: the uniform distribution for discrete data, the standard normal distribution for continuous data. At each iteration, the network updates the parameters of each element's distribution to bring it closer to the joint distribution of the training set.
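As a concrete starting point, here is a minimal sketch of these maximum entropy priors, assuming discrete data with `num_classes` classes and one-dimensional continuous data; the function names are illustrative, not from the paper.

```python
import numpy as np

def init_discrete_params(seq_len: int, num_classes: int) -> np.ndarray:
    """Maximum-entropy prior for discrete data: a uniform
    categorical distribution over the classes, one per element."""
    return np.full((seq_len, num_classes), 1.0 / num_classes)

def init_continuous_params(seq_len: int) -> tuple[np.ndarray, np.ndarray]:
    """Maximum-entropy prior for continuous data (given fixed
    variance): a standard normal, parameterised per element by
    its mean and variance."""
    means = np.zeros(seq_len)
    variances = np.ones(seq_len)
    return means, variances
```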
If each element were independent, training this model would be easy: sample a sequence from the training set and shift each element's distribution toward the corresponding observed value, with no model of the other elements required.
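To make the independent case concrete, here is a hypothetical per-element update that simply nudges each categorical distribution toward the one-hot encoding of the observed value; the blend weight `alpha` and the function name are inventions for this sketch.

```python
import numpy as np

def independent_update(probs: np.ndarray, observed: np.ndarray,
                       alpha: float = 0.1) -> np.ndarray:
    """Move each element's categorical distribution toward the
    one-hot encoding of its observed class, ignoring all other
    elements. Both arrays have shape (seq_len, num_classes)."""
    return (1.0 - alpha) * probs + alpha * observed
```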
Unfortunately, the elements are not independent, so the relationship between them must also be modelled. This is done by training a neural network which, at each iteration of the generative process, modifies the parameters of every element's distribution to account for all the other elements.
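Putting the pieces together, the generative process might look like the sketch below. The network's interface (a callable taking the full parameter array plus a step index) and the number of steps are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def generate(network, seq_len: int, num_classes: int,
             num_steps: int = 20) -> np.ndarray:
    """Iteratively refine per-element distributions with a jointly
    conditioned network, then read out the final sequence."""
    # Start from the maximum-entropy prior: uniform over classes.
    probs = np.full((seq_len, num_classes), 1.0 / num_classes)
    for step in range(num_steps):
        # The network sees every element's parameters at once, so it
        # can account for the dependencies between elements.
        probs = network(probs, step)
    # One argmax per element gives the generated sequence.
    return probs.argmax(axis=-1)
```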