7 changes: 4 additions & 3 deletions README.md
@@ -3,15 +3,15 @@
> We organize these papers into the following categories, but some of them may overlap.

## (1). Uncertainty in deep learning
> Model uncertainty in deep learning via Bayesian modelling by variatial inference etc.
> Model uncertainty in deep learning via Bayesian modelling by variational inference etc.

- [1705]. Concrete Dropout - [[arxiv](https://arxiv.org/abs/1705.07832)] [[Note](/notes/concrete-dropout.md)]
- [1703]. Dropout Inference in Bayesian Neural Networks with Alpha-divergences - [[arxiv](https://arxiv.org/abs/1703.02914)] [[Note](/notes/alpha-divergence.md)]
- [1703]. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? - [[arxiv](https://arxiv.org/abs/1703.04977)] [[Note](/notes/)]
- [2016]. Uncertainty in Deep Learning - [[PDF](https://pdfs.semanticscholar.org/a6af/62389c6655770c624e2fa3f3ad6dc26bf77e.pdf)] [[Blog](http://mlg.eng.cam.ac.uk/yarin/blog_2248.html)] [[Note](/notes/uncertainty-deep-learning.md)]
- [1505]. Weight Uncertainty in Neural Networks - [[arxiv](https://arxiv.org/abs/1505.05424)] [[Note](/notes/bbb.md)]
- [2015]. On Modern Deep Learning and Variational Inference - [[NIPS](http://www.approximateinference.org/accepted/GalGhahramani2015.pdf)] [[Note](/notes/modern-vi.md)]
- [1995]. Bayesian learning for neural networks
- [1995]. Bayesian learning for neural networks - [[PDF](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.446.9306&rep=rep1&type=pdf)]

## (2). Probabilistic deep models
> Use probabilistic models to imitate deep neural networks.
@@ -56,7 +56,7 @@
- [1611]. Categorical Reparameterization with Gumbel-Softmax - [[arxiv](https://arxiv.org/abs/1611.01144)]

## (6) Bayesian neural network pruning
> Sparse prior can be used to induce sparse weight or neuron in neural networks thus favor smaller network structure for mobile devices etc.
> A sparse prior can be used to induce sparse weights or neurons in neural networks, thus favoring smaller network structures for mobile devices etc.

- [1711]. Interpreting Convolutional Neural Networks Through Compression - [[arXiv](https://arxiv.org/abs/1711.02329)] [[Note](/notes/interpret-cnn-compress.md)]
- [1705]. Structural compression of convolutional neural networks based on greedy filter pruning - [[arXiv](https://arxiv.org/abs/1705.07356)] [[Note](/notes/interpret-cnn-compress.md)]
@@ -70,3 +70,4 @@ Any contribution is welcome. But notice that we need '*one phrase summary*' to g

## Contributors
- [Jun Lu](https://github.com/junlulocky)
- [Christine Chai](https://github.com/star1327p)
4 changes: 2 additions & 2 deletions notes/perturbative-vi.md
@@ -1,5 +1,5 @@
## [Perturbative Black Box Variational Inference](https://arxiv.org/abs/1709.07433)

The drawback of KL divergence is that: suppose the *q(w|.)* is the variational distribution and *p(w|.)* is the posterior distribution we want to use. The KL divergence will penalise *q(w)* for placing mass where *p(w|.)* has no or small mass and penalise less for not placing mass where *p(w|.)* has large mass[See another note](/notes/alpha-divergence.md). The authors constructed a new variational bound which is tighter than the KL bound and **more mass covering**. Compared to alpha-divergences, its reparameterization gradients have a lower variance. In short, the authors chose a lower bound lies in the general version of evidence lower bound (ELBO) - f-ELBO, that is a biased estimator with smaller variance which induces careful bias-variance trade-off.
The drawback of the KL divergence is the following: suppose *q(w|.)* is the variational distribution and *p(w|.)* is the posterior distribution we want to approximate. The KL divergence penalises *q(w)* for placing mass where *p(w|.)* has little or no mass, but penalises it much less for failing to place mass where *p(w|.)* has large mass [[see another note](/notes/alpha-divergence.md)]. The authors constructed a new variational bound which is tighter than the KL bound and **more mass-covering**. Compared to alpha-divergences, its reparameterization gradients have lower variance. In short, the authors chose a lower bound from a generalised family of evidence lower bounds (the f-ELBO), giving a biased estimator with smaller variance and hence a careful bias-variance trade-off.

Note: it also contains a good review how ELBO can be derived from the marginal distribution of data.
Note: the paper also contains a good review of how the ELBO can be derived from the marginal distribution of the data.
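
For reference, a minimal sketch of the standard derivation (generic notation, *x* for data and *w* for latent variables, not necessarily the paper's) showing how the ELBO arises from the log marginal likelihood and where the KL term enters:

```latex
\begin{align}
\log p(x) &= \log \int p(x, w)\, dw
           = \log \int q(w)\,\frac{p(x, w)}{q(w)}\, dw \\
          &\ge \int q(w) \log \frac{p(x, w)}{q(w)}\, dw
           \qquad \text{(Jensen's inequality)} \\
          &= \mathbb{E}_{q(w)}\!\big[\log p(x, w) - \log q(w)\big]
           \;=:\; \mathrm{ELBO}(q).
\end{align}
% Equivalently, \log p(x) = ELBO(q) + KL(q(w) \,\|\, p(w \mid x)),
% so maximising the ELBO is the same as minimising the KL divergence
% from q to the posterior.
```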
2 changes: 1 addition & 1 deletion notes/smooth-svi.md
@@ -1,3 +1,3 @@
## [Smoothed Gradients for Stochastic Variational Inference](http://papers.nips.cc/paper/5557-smoothed-gradients-for-stochastic-variational-inference.pdf)

stochastic variation inference uses a weighted sum to update the parameter which is unbiased. In smoothed gradients for stochastic variation inference, they uses a window averaged to update the parameter which is biased estimator but reduces the variance so as to fasten the convergence.
Stochastic variational inference updates the parameters with a weighted sum, which gives an unbiased estimator. In smoothed gradients for stochastic variational inference, the authors instead update with a window average, a biased estimator that reduces the variance and thus speeds up convergence.
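
A minimal illustrative sketch of the idea (hypothetical function and parameter names, and plain gradient ascent rather than the paper's natural-gradient updates): average the last few noisy gradient estimates and step along that average, trading a small bias for lower variance.

```python
from collections import deque

import numpy as np


def window_averaged_ascent(grad_estimator, lam0, window=10, lr=0.01, n_steps=1000):
    """Toy window-averaged stochastic updates (illustrative only).

    grad_estimator(lam) should return a noisy, unbiased gradient estimate of the
    objective (e.g. the ELBO) at lam. Averaging the last `window` estimates gives
    a biased but lower-variance update direction.
    """
    lam = np.asarray(lam0, dtype=float)
    recent = deque(maxlen=window)                    # keeps only the last `window` estimates
    for _ in range(n_steps):
        recent.append(grad_estimator(lam))           # new noisy gradient at the current iterate
        lam = lam + lr * np.mean(list(recent), axis=0)  # ascend along the window average
    return lam
```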
2 changes: 1 addition & 1 deletion notes/stein-var.md
@@ -1,3 +1,3 @@
## [Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm](https://arxiv.org/abs/1608.04471)

The problem of variational inference is that the variational distribution is usually over-simplified and it maybe very different to the posterior distribution of interest. Stein variational gradient descent favors the stein's identity and thus using a iterative methods to make the 'variational distribution' closer to the posterior distribution of interest.
The problem with variational inference is that the variational distribution is usually over-simplified and may be very different from the posterior distribution of interest. Stein variational gradient descent exploits Stein's identity and uses an iterative method to move the 'variational distribution' closer to the posterior distribution of interest.
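
A minimal sketch of a single SVGD particle update with an RBF kernel (hypothetical function names; the paper's bandwidth heuristic and step-size schedule are omitted): the kernel-weighted score term pulls particles toward high posterior density, while the kernel-gradient term pushes them apart so they do not collapse onto a single mode.

```python
import numpy as np


def svgd_step(particles, grad_log_p, stepsize=0.1, bandwidth=1.0):
    """One Stein variational gradient descent update (illustrative only).

    particles  : (n, d) array of particles approximating the target distribution.
    grad_log_p : function mapping an (n, d) array to the (n, d) array of gradients
                 of the log target density evaluated at each particle.
    """
    X = np.asarray(particles, dtype=float)
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]          # diffs[i, j] = x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)         # squared pairwise distances
    K = np.exp(-sq_dists / bandwidth)              # RBF kernel k(x_j, x_i)
    scores = grad_log_p(X)                         # gradient of log target at each particle
    drive = K @ scores                             # pulls particles toward high density
    repulse = (2.0 / bandwidth) * np.sum(K[:, :, None] * diffs, axis=1)  # keeps them apart
    return X + stepsize * (drive + repulse) / n
```

Iterating `svgd_step`, with `grad_log_p` set to the score of the (unnormalised) posterior, gradually transports the particle cloud toward the posterior while the repulsive term preserves its diversity.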