ELLA #53

@murphyk

Z. Deng, F. Zhou, and J. Zhu, “Accelerated Linearized Laplace Approximation for Bayesian Deep Learning,” in Advances in Neural Information Processing Systems, Oct. 2022 [Online]. Available: https://openreview.net/forum?id=jftNpltMgz. [Accessed: Apr. 29, 2023]
https://github.com/thudzj/ELLA

That is, they do the same thing for FCEKF. Additional elements of the Lofi version (sketched in the snippet below):

1. W acts as pseudo-observations, compressing the observations in J.
2. Upsilon rescales the kernel, modifying the base NTK.
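Here is a toy numpy sketch of these two elements (shapes and names are illustrative only, not the actual Lofi code): the precision takes the diagonal-plus-low-rank form diag(Upsilon) + W W', and in the GP view the Upsilon term turns the base NTK J J' into J Upsilon^{-1} J'.

```python
import numpy as np

rng = np.random.default_rng(0)
NC, P, L = 50, 200, 10                    # flattened observations, params, low rank

J = rng.normal(size=(NC, P))              # stacked Jacobians (R / Lambda absorbed)
Upsilon = rng.uniform(1.0, 2.0, size=P)   # diagonal precision term
W = rng.normal(size=(P, L))               # L pseudo-observations in parameter space

# (1) W compresses the NC observations in J into L pseudo-observations:
#     J' J (P x P, rank <= NC) is replaced by W W' (P x P, rank <= L).
precision = np.diag(Upsilon) + W @ W.T

# (2) In the GP view, a Gaussian prior N(0, Upsilon^{-1}) rescales the
#     base NTK J J' into J Upsilon^{-1} J'.
base_ntk = J @ J.T
rescaled_ntk = J @ (J / Upsilon).T        # same as J @ diag(1/Upsilon) @ J'

print(precision.shape, base_ntk.shape, rescaled_ntk.shape)
```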

They use the GP formulation to motivate a low-rank approximation that is complementary to ours, compressing along the model parameters (P) rather than along the training set (NC). In other words, the true posterior precision equals J' J where J is the NCxP matrix that concatenates Jacobians from all training examples (for simplicity I'm absorbing outcome covariance, our R or their Lambda, into J as I've done elsewhere). Lofi approximates J' J by W W' where W is PxL with small L. Their ELLA starts with the GP formulation, which uses J J', and approximates it as Phi' Phi where Phi is KxNC with small K.
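To make the complementarity concrete, here is a toy numpy sketch (random J standing in for the stacked Jacobians; W and Phi below are just the optimal SVD truncations, not the factors either method actually computes): the nonzero eigenvalues of J' J and J J' coincide, a Lofi-style compression along the NC axis gives W (PxL), and an ELLA-style compression along the P axis gives Phi (KxNC).

```python
import numpy as np

rng = np.random.default_rng(1)
NC, P, L, K = 40, 300, 8, 8

J = rng.normal(size=(NC, P))                       # NC x P stacked Jacobians

prec = J.T @ J                                     # parameter-space view (Lofi side)
kern = J @ J.T                                     # function-space view (ELLA side)

# The two Gram matrices share the same nonzero eigenvalues, so truncating
# either one retains the same information about J.
ev_prec = np.sort(np.linalg.eigvalsh(prec))[-NC:]
ev_kern = np.sort(np.linalg.eigvalsh(kern))
assert np.allclose(ev_prec, ev_kern, atol=1e-6)

U, s, Vt = np.linalg.svd(J, full_matrices=False)   # J = U diag(s) Vt

W = Vt[:L].T * s[:L]                               # P x L:  J' J  ~=  W W'
Phi = (U[:, :K] * s[:K]).T                         # K x NC: J J'  ~=  Phi' Phi

print(np.linalg.norm(prec - W @ W.T) / np.linalg.norm(prec))
print(np.linalg.norm(kern - Phi.T @ Phi) / np.linalg.norm(kern))
```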

Their rank reduction is also based on SVD, applied to the kernel matrix of a random sample of the training data (of size M, with K ≤ M < NC). Because the SVDs of J J' and J' J are equivalent (i.e., they retain the same information), their method should be close (possibly identical, aside from our SSM assumptions) to a hypothetical version of Lofi based on M random past observations. That is, imagine we could compute the exact posterior precision based on those M observations and then compute its rank-K SVD. This analysis suggests we should outperform them, since, instead of a random subset of the data, we use all the data, updated online and filtered through our SSM.
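A quick numerical check of this equivalence claim (again with a random J; this follows the Nyström-style construction as described above, not their code): the rank-K object recovered from the M x M kernel of the subsample matches the rank-K SVD of the exact precision J_M' J_M built from the same M observations.

```python
import numpy as np

rng = np.random.default_rng(2)
NC, P, M, K = 100, 300, 20, 5

J = rng.normal(size=(NC, P))                 # Jacobians of the full training set
idx = rng.choice(NC, size=M, replace=False)
J_M = J[idx]                                 # Jacobians of the M sampled points

# ELLA side: eigendecompose the M x M kernel of the subsample, keep the top K.
K_MM = J_M @ J_M.T
lam, Q = np.linalg.eigh(K_MM)
lam, Q = lam[::-1][:K], Q[:, ::-1][:, :K]

# Hypothetical Lofi restricted to the same M observations: rank-K SVD of the
# exact precision J_M' J_M built from those observations.
prec_M = J_M.T @ J_M
U, s, _ = np.linalg.svd(prec_M, hermitian=True)
prec_rankK_param = (U[:, :K] * s[:K]) @ U[:, :K].T

# The same rank-K object recovered from the kernel side: eigenvectors of
# J_M' J_M are J_M' q_i / sqrt(lam_i), with the same eigenvalues lam_i.
V = J_M.T @ Q / np.sqrt(lam)                 # P x K
prec_rankK_kernel = (V * lam) @ V.T

err = (np.linalg.norm(prec_rankK_param - prec_rankK_kernel)
       / np.linalg.norm(prec_rankK_param))
print(err)                                   # ~0: the two rank-K truncations coincide
```

The gap between this hypothetical and actual Lofi is exactly the point in the last sentence: Lofi's factors summarize all NC observations processed online through the SSM, not a one-shot random subsample of size M.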
