7 changes: 4 additions & 3 deletions README.md
@@ -3,15 +3,15 @@
> We organize these papers into the following categories, but some of them may overlap.

## (1). Uncertainty in deep learning
> Model uncertainty in deep learning via Bayesian modelling by variatial inference etc.
> Model uncertainty in deep learning via Bayesian modelling by variational inference etc.

- [1705]. Concrete Dropout - [[arxiv](https://arxiv.org/abs/1705.07832)] [[Note](/notes/concrete-dropout.md)]
- [1703]. Dropout Inference in Bayesian Neural Networks with Alpha-divergences - [[arxiv](https://arxiv.org/abs/1703.02914)] [[Note](/notes/alpha-divergence.md)]
- [1703]. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? - [[arxiv](https://arxiv.org/abs/1703.04977)] [[Note](/notes/)]
- [2016]. Uncertainty in Deep Learning - [[PDF](https://pdfs.semanticscholar.org/a6af/62389c6655770c624e2fa3f3ad6dc26bf77e.pdf)] [[Blog](http://mlg.eng.cam.ac.uk/yarin/blog_2248.html)] [[Note](/notes/uncertainty-deep-learning.md)]
- [1505]. Weight Uncertainty in Neural Networks - [[arxiv](https://arxiv.org/abs/1505.05424)] [[Note](/notes/bbb.md)]
- [2015]. On Modern Deep Learning and Variational Inference - [[NIPS](http://www.approximateinference.org/accepted/GalGhahramani2015.pdf)] [[Note](/notes/modern-vi.md)]
- [1995]. Bayesian learning for neural networks
- [1995]. Bayesian learning for neural networks - [[PDF](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.446.9306&rep=rep1&type=pdf)]

## (2). Probabilistic deep models
> Use probabilistic models to imitate deep neural networks.
@@ -56,7 +56,7 @@
- [1611]. Categorical Reparameterization with Gumbel-Softmax - [[arxiv](https://arxiv.org/abs/1611.01144)]

## (6) Bayesian neural network pruning
> Sparse prior can be used to induce sparse weight or neuron in neural networks thus favor smaller network structure for mobile devices etc.
> A sparse prior can be used to induce sparse weights or neurons in neural networks, thus favoring smaller network structures for mobile devices etc.

- [1711]. Interpreting Convolutional Neural Networks Through Compression - [[arXiv](https://arxiv.org/abs/1711.02329)] [[Note](/notes/interpret-cnn-compress.md)]
- [1705]. Structural compression of convolutional neural networks based on greedy filter pruning - [[arXiv](https://arxiv.org/abs/1705.07356)] [[Note](/notes/interpret-cnn-compress.md)]
@@ -70,3 +70,4 @@ Any contribution is welcome. But notice that we need '*one phrase summary*' to g

## Contributors
- [Jun Lu](https://github.com/junlulocky)
- [Christine Chai](https://github.com/star1327p)
4 changes: 2 additions & 2 deletions notes/perturbative-vi.md
@@ -1,5 +1,5 @@
## [Perturbative Black Box Variational Inference](https://arxiv.org/abs/1709.07433)

The drawback of KL divergence is that: suppose the *q(w|.)* is the variational distribution and *p(w|.)* is the posterior distribution we want to use. The KL divergence will penalise *q(w)* for placing mass where *p(w|.)* has no or small mass and penalise less for not placing mass where *p(w|.)* has large mass[See another note](/notes/alpha-divergence.md). The authors constructed a new variational bound which is tighter than the KL bound and **more mass covering**. Compared to alpha-divergences, its reparameterization gradients have a lower variance. In short, the authors chose a lower bound lies in the general version of evidence lower bound (ELBO) - f-ELBO, that is a biased estimator with smaller variance which induces careful bias-variance trade-off.
The drawback of the KL divergence is the following: suppose *q(w|.)* is the variational distribution and *p(w|.)* is the posterior distribution we want to approximate. The KL divergence penalises *q(w)* for placing mass where *p(w|.)* has little or no mass, but penalises it much less for failing to place mass where *p(w|.)* has large mass [[see another note](/notes/alpha-divergence.md)]. The authors constructed a new variational bound which is tighter than the KL bound and **more mass-covering**. Compared to alpha-divergences, its reparameterization gradients have lower variance. In short, the authors chose a lower bound from a generalised family of evidence lower bounds (the f-ELBO), giving a biased estimator with smaller variance and hence a careful bias-variance trade-off.

Note: it also contains a good review how ELBO can be derived from the marginal distribution of data.
Note: the paper also contains a good review of how the ELBO can be derived from the marginal distribution of the data.
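
For reference, a minimal sketch of the standard derivation (generic notation, *x* for data and *w* for latent variables, not necessarily the paper's) showing how the ELBO arises from the log marginal likelihood and where the KL term enters:

```latex
\begin{align}
\log p(x) &= \log \int p(x, w)\, dw
           = \log \int q(w)\,\frac{p(x, w)}{q(w)}\, dw \\
          &\ge \int q(w) \log \frac{p(x, w)}{q(w)}\, dw
           \qquad \text{(Jensen's inequality)} \\
          &= \mathbb{E}_{q(w)}\!\big[\log p(x, w) - \log q(w)\big]
           \;=:\; \mathrm{ELBO}(q).
\end{align}
% Equivalently, \log p(x) = ELBO(q) + KL(q(w) \,\|\, p(w \mid x)),
% so maximising the ELBO is the same as minimising the KL divergence
% from q to the posterior.
```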
2 changes: 1 addition & 1 deletion notes/smooth-svi.md
@@ -1,3 +1,3 @@
## [Smoothed Gradients for Stochastic Variational Inference](http://papers.nips.cc/paper/5557-smoothed-gradients-for-stochastic-variational-inference.pdf)

stochastic variation inference uses a weighted sum to update the parameter which is unbiased. In smoothed gradients for stochastic variation inference, they uses a window averaged to update the parameter which is biased estimator but reduces the variance so as to fasten the convergence.
Stochastic variational inference updates the parameters with a weighted sum, which gives an unbiased estimator. In smoothed gradients for stochastic variational inference, the authors instead update with a window average, a biased estimator that reduces the variance and thus speeds up convergence.
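
A minimal illustrative sketch of the idea (hypothetical function and parameter names, and plain gradient ascent rather than the paper's natural-gradient updates): average the last few noisy gradient estimates and step along that average, trading a small bias for lower variance.

```python
from collections import deque

import numpy as np


def window_averaged_ascent(grad_estimator, lam0, window=10, lr=0.01, n_steps=1000):
    """Toy window-averaged stochastic updates (illustrative only).

    grad_estimator(lam) should return a noisy, unbiased gradient estimate of the
    objective (e.g. the ELBO) at lam. Averaging the last `window` estimates gives
    a biased but lower-variance update direction.
    """
    lam = np.asarray(lam0, dtype=float)
    recent = deque(maxlen=window)                    # keeps only the last `window` estimates
    for _ in range(n_steps):
        recent.append(grad_estimator(lam))           # new noisy gradient at the current iterate
        lam = lam + lr * np.mean(list(recent), axis=0)  # ascend along the window average
    return lam
```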
2 changes: 1 addition & 1 deletion notes/stein-var.md
@@ -1,3 +1,3 @@
## [Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm](https://arxiv.org/abs/1608.04471)

The problem of variational inference is that the variational distribution is usually over-simplified and it maybe very different to the posterior distribution of interest. Stein variational gradient descent favors the stein's identity and thus using a iterative methods to make the 'variational distribution' closer to the posterior distribution of interest.
The problem with variational inference is that the variational distribution is usually over-simplified and may be very different from the posterior distribution of interest. Stein variational gradient descent exploits Stein's identity and uses an iterative method to move the 'variational distribution' closer to the posterior distribution of interest.
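
A minimal sketch of a single SVGD particle update with an RBF kernel (hypothetical function names; the paper's bandwidth heuristic and step-size schedule are omitted): the kernel-weighted score term pulls particles toward high posterior density, while the kernel-gradient term pushes them apart so they do not collapse onto a single mode.

```python
import numpy as np


def svgd_step(particles, grad_log_p, stepsize=0.1, bandwidth=1.0):
    """One Stein variational gradient descent update (illustrative only).

    particles  : (n, d) array of particles approximating the target distribution.
    grad_log_p : function mapping an (n, d) array to the (n, d) array of gradients
                 of the log target density evaluated at each particle.
    """
    X = np.asarray(particles, dtype=float)
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]          # diffs[i, j] = x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)         # squared pairwise distances
    K = np.exp(-sq_dists / bandwidth)              # RBF kernel k(x_j, x_i)
    scores = grad_log_p(X)                         # gradient of log target at each particle
    drive = K @ scores                             # pulls particles toward high density
    repulse = (2.0 / bandwidth) * np.sum(K[:, :, None] * diffs, axis=1)  # keeps them apart
    return X + stepsize * (drive + repulse) / n
```

Iterating `svgd_step`, with `grad_log_p` set to the score of the (unnormalised) posterior, gradually transports the particle cloud toward the posterior while the repulsive term preserves its diversity.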