diff --git a/README.md b/README.md
index 83b6b73..7510eb0 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 > We here organize these papers in the following categories. But some of them might have overlap.
 
 ## (1). Uncertainty in deep learning
-> Model uncertainty in deep learning via Bayesian modelling by variatial inference etc.
+> Model uncertainty in deep learning via Bayesian modelling, e.g. with variational inference.
 
 - [1705]. Concrete Dropout - [[arxiv](https://arxiv.org/abs/1705.07832)] [[Note](/notes/concrete-dropout.md)]
 - [1703]. Dropout Inference in Bayesian Neural Networks with Alpha-divergences - [[arxiv](https://arxiv.org/abs/1703.02914)] [[Note](/notes/alpha-divergence.md)]
@@ -11,7 +11,7 @@
 - [2016]. Uncertainty in Deep Learning - [[PDF](https://pdfs.semanticscholar.org/a6af/62389c6655770c624e2fa3f3ad6dc26bf77e.pdf)] [[Blog](http://mlg.eng.cam.ac.uk/yarin/blog_2248.html)] [[Note](/notes/uncertainty-deep-learning.md)]
 - [1505]. Weight Uncertainty in Neural Networks - [[arxiv](https://arxiv.org/abs/1505.05424)] [[Note](/notes/bbb.md)]
 - [2015]. On Modern Deep Learning and Variational Inference - [[NIPS](http://www.approximateinference.org/accepted/GalGhahramani2015.pdf)] [[Note](/notes/modern-vi.md)]
-- [1995]. Bayesian learning for neural networks
+- [1995]. Bayesian learning for neural networks - [[PDF](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.446.9306&rep=rep1&type=pdf)]
 
 ## (2). Probabilistic deep models
 > Use probabilistic model to imitate deep neural networks.
@@ -56,7 +56,7 @@
 - [1611]. Categorical Reparameterization with Gumbel-Softmax - [[arxiv](https://arxiv.org/abs/1611.01144)]
 
 ## (6) Bayesian neural network pruning
-> Sparse prior can be used to induce sparse weight or neuron in neural networks thus favor smaller network structure for mobile devices etc.
+> A sparse prior can be used to induce sparse weights or neurons in neural networks, thus favoring smaller network structures for mobile devices, etc.
 
 - [1711]. Interpreting Convolutional Neural Networks Through Compression - [[arXiv](https://arxiv.org/abs/1711.02329)] [[Note](/notes/interpret-cnn-compress.md)]
 - [1705]. Structural compression of convolutional neural networks based on greedy filter pruning - [[arXiv](https://arxiv.org/abs/1705.07356)] [[Note](/notes/interpret-cnn-compress.md)]
@@ -70,3 +70,4 @@ Any contribution is welcome. But notice that we need '*one phrase summary*' to g
 
 ## Contributors
 - [Jun Lu](https://github.com/junlulocky)
+- [Christine Chai](https://github.com/star1327p)
diff --git a/notes/perturbative-vi.md b/notes/perturbative-vi.md
index 06a4e48..e504a7f 100644
--- a/notes/perturbative-vi.md
+++ b/notes/perturbative-vi.md
@@ -1,5 +1,5 @@
 ## [Perturbative Black Box Variational Inference](https://arxiv.org/abs/1709.07433)]
 
-The drawback of KL divergence is that: suppose the *q(w|.)* is the variational distribution and *p(w|.)* is the posterior distribution we want to use. The KL divergence will penalise *q(w)* for placing mass where *p(w|.)* has no or small mass and penalise less for not placing mass where *p(w|.)* has large mass[See another note](/notes/alpha-divergence.md). The authors constructed a new variational bound which is tighter than the KL bound and **more mass covering**. Compared to alpha-divergences, its reparameterization gradients have a lower variance. In short, the authors chose a lower bound lies in the general version of evidence lower bound (ELBO) - f-ELBO, that is a biased estimator with smaller variance which induces careful bias-variance trade-off. 
+The drawback of the KL divergence is the following: suppose *q(w|.)* is the variational distribution and *p(w|.)* is the posterior distribution we want to approximate. The KL divergence penalises *q(w)* for placing mass where *p(w|.)* has little or no mass, but penalises it far less for failing to place mass where *p(w|.)* has large mass [[see another note](/notes/alpha-divergence.md)]. The authors constructed a new variational bound that is tighter than the KL bound and **more mass-covering**. Compared to alpha-divergences, its reparameterization gradients have lower variance. In short, the authors chose a lower bound from a generalised family of evidence lower bounds (ELBOs), the f-ELBO, which is a biased estimator with smaller variance and hence makes a careful bias-variance trade-off.
 
-Note: it also contains a good review how ELBO can be derived from the marginal distribution of data.
\ No newline at end of file
+Note: the paper also contains a good review of how the ELBO can be derived from the marginal distribution of the data.
diff --git a/notes/smooth-svi.md b/notes/smooth-svi.md
index fe67c4f..f3af797 100644
--- a/notes/smooth-svi.md
+++ b/notes/smooth-svi.md
@@ -1,3 +1,3 @@
 ## [Smoothed Gradients for Stochastic Variational Inference](http://papers.nips.cc/paper/5557-smoothed-gradients-for-stochastic-variational-inference.pdf)
 
-stochastic variation inference uses a weighted sum to update the parameter which is unbiased. In smoothed gradients for stochastic variation inference, they uses a window averaged to update the parameter which is biased estimator but reduces the variance so as to fasten the convergence.
\ No newline at end of file
+Stochastic variational inference updates the parameter with a weighted sum that uses a single noisy but unbiased estimate. In smoothed gradients for stochastic variational inference, the authors instead use an average over a fixed window of recent estimates, which is a biased estimator but reduces the variance and thus speeds up convergence.
diff --git a/notes/stein-var.md b/notes/stein-var.md
index eb4f14e..da52cec 100644
--- a/notes/stein-var.md
+++ b/notes/stein-var.md
@@ -1,3 +1,3 @@
 ## [Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm](https://arxiv.org/abs/1608.04471)
 
-The problem of variational inference is that the variational distribution is usually over-simplified and it maybe very different to the posterior distribution of interest. Stein variational gradient descent favors the stein's identity and thus using a iterative methods to make the 'variational distribution' closer to the posterior distribution of interest.
\ No newline at end of file
+The problem with variational inference is that the variational distribution is usually over-simplified and may be very different from the posterior distribution of interest. Stein variational gradient descent exploits Stein's identity and uses an iterative method to move the 'variational distribution' closer to the posterior distribution of interest.
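
For the ELBO review that notes/perturbative-vi.md points to, the standard derivation starts from the log marginal likelihood of the data and applies Jensen's inequality under the variational distribution *q(w)*; the sketch below is only this textbook identity, not the paper's perturbative f-ELBO.

```latex
% ELBO from the log marginal likelihood, via Jensen's inequality (needs amsmath).
\begin{align*}
\log p(x)
  &= \log \int p(x, w)\,\mathrm{d}w
   = \log \mathbb{E}_{q(w)}\!\left[\frac{p(x, w)}{q(w)}\right] \\
  &\geq \mathbb{E}_{q(w)}\!\left[\log p(x, w) - \log q(w)\right]
   = \mathrm{ELBO}(q).
\end{align*}
% The gap is exactly the KL term that standard variational inference minimises,
% which is why maximising the ELBO over q tightens the bound:
\begin{equation*}
\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}\!\left(q(w) \,\|\, p(w \mid x)\right).
\end{equation*}
```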
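
For notes/smooth-svi.md, a minimal sketch of the window-averaging idea in plain stochastic-gradient-ascent form; the names (`smoothed_svi`, `noisy_grad`, `window_size`) are illustrative, and the plain gradient update is an assumption made for brevity rather than the paper's natural-gradient SVI parameterisation.

```python
import numpy as np
from collections import deque

def smoothed_svi(init_lam, noisy_grad, n_steps=1000, window_size=10, step_size=0.01):
    """Window-averaged stochastic updates (illustrative sketch).

    A single noisy gradient estimate is unbiased but high-variance; averaging
    the last `window_size` estimates is biased (older estimates were computed
    at older iterates) but has much lower variance.
    """
    lam = np.asarray(init_lam, dtype=float)
    window = deque(maxlen=window_size)         # most recent gradient estimates
    for _ in range(n_steps):
        window.append(noisy_grad(lam))         # one minibatch / Monte Carlo estimate
        smoothed = np.mean(window, axis=0)     # fixed-window average
        lam = lam + step_size * smoothed       # ascent step along the smoothed direction
    return lam

# Toy usage: noisy gradient of -0.5 * ||lam - target||^2; the iterate approaches `target`.
target = np.array([2.0, -1.0])
print(smoothed_svi(np.zeros(2), lambda lam: (target - lam) + 0.5 * np.random.randn(2)))
```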
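
For notes/stein-var.md, a sketch of the iterative particle update, written here with an RBF kernel of fixed bandwidth `h` (the paper suggests a median heuristic for the bandwidth and an adaptive step size); `svgd_step` and `grad_log_p` are illustrative names, not an official implementation.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    """RBF kernel matrix K[a, b] = k(x_a, x_b) and its gradients
    grad_K[a, b] = d k(x_a, x_b) / d x_a, for particles X of shape (n, d)."""
    diffs = X[:, None, :] - X[None, :, :]                # diffs[a, b] = x_a - x_b
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * h ** 2))
    grad_K = -diffs / h ** 2 * K[:, :, None]
    return K, grad_K

def svgd_step(X, grad_log_p, step_size=0.2, h=1.0):
    """One Stein variational gradient descent update of the particle set X.

    The kernel-weighted score term pulls particles toward high posterior
    density; the kernel-gradient term acts as a repulsive force that keeps
    the particles spread out instead of collapsing onto a mode.
    """
    n = X.shape[0]
    K, grad_K = rbf_kernel(X, h)
    phi = (K @ grad_log_p(X) + grad_K.sum(axis=0)) / n   # Stein variational direction
    return X + step_size * phi

# Toy usage: approximate a 2-D standard normal, for which grad log p(x) = -x.
X = np.random.randn(50, 2) + 3.0                         # particles start around (3, 3)
for _ in range(500):
    X = svgd_step(X, grad_log_p=lambda x: -x)
print(X.mean(axis=0), X.std(axis=0))                     # roughly zero mean, unit spread
```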