Conversation

@syncrostone
Contributor

This PR adds GPFA kernels (tested) and an associated example notebook. I made my best guesses to keep things stylistically in line with the rest of the codebase -- please let me know if there are things I missed / that need to be changed.

After doing most of the work on this, I realized that the already-implemented LCM kernel is mathematically identical. Because the two are used and parametrized differently, I think it could still make sense to add GPFA, but I understand if you disagree.

Explanation of GPFA (also in the example notebook)

Gaussian Process Factor Analysis (GPFA), introduced in this paper, learns underlying latent trajectories for the outputs while smoothing the data. A more recent paper suggests scalable implementations for GPFA (in the supplement) as well as an extension of GPFA with kernels for dynamical systems. The implementation in this notebook is developed with a future GPFA for Dynamical Systems (GPFADS) extension in mind.

GPFA(DS) is useful when you want to simultaneously smooth and reduce the dimensionality of neural data.

Given input points $t$, $M$ latent variables $x$ with underlying kernels $k_1, \ldots, k_M$, and observations (neural data) $y$ from $N$ neurons, the observations in GPFA are assumed to arise as follows:

  • The latent variables are independent of each other:
    $$ k(x_i(t), x_j(t')) = \delta_{ij} k_i(t,t')
    $$

  • We write the above for the full vector of latents as:
    $$k[\mathbf{x}(t), \mathbf{x}(t')] = \sum_{i=1}^{M}k_i(t,t')\otimes (\mathbf{e}_i \mathbf{e}_i^T)
    $$

  • We combine the latents into observations as follows:
    $$k[\mathbf{y}(t), \mathbf{y}(t')] = C \, k(\mathbf{x}(t), \mathbf{x}(t')) \, C^T
    $$

where $k_i$ is a standard kernel (e.g. RBF) that operates on the inputs.
$\delta_{ij}$ is the Kronecker delta, $\mathbf{e}_i$ is the unit vector of length $M$ with its nonzero element at index $i$, $\otimes$ is the Kronecker product, and $C \in \mathbb{R}^{N \times M}$ is the mixing matrix, indicating how latents are combined into observations.
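For concreteness, here is a minimal plain-PyTorch sketch of how this covariance could be assembled. The sizes, lengthscales, and the small `rbf` helper below are illustrative placeholders, not the kernel code from this PR:

```python
import torch

def rbf(t1, t2, lengthscale):
    # Illustrative RBF kernel k_i(t, t'), one lengthscale per latent.
    d = (t1.unsqueeze(-1) - t2.unsqueeze(-2)) / lengthscale
    return torch.exp(-0.5 * d ** 2)

T, M, N = 50, 3, 10                         # time points, latents, neurons
t = torch.linspace(0, 1, T)
lengthscales = [0.1, 0.2, 0.4]              # one per latent kernel k_1..k_M
C = torch.randn(N, M)                       # mixing matrix, N x M

# k[x(t), x(t')] = sum_i k_i(t, t') kron (e_i e_i^T)
E = torch.eye(M)
K_x = sum(
    torch.kron(rbf(t, t, ls), torch.outer(E[i], E[i]))
    for i, ls in enumerate(lengthscales)
)                                           # shape (T*M, T*M), time-major blocks

# k[y(t), y(t')] = C k[x(t), x(t')] C^T, applied per time point -> I_T kron C
I_C = torch.kron(torch.eye(T), C)           # shape (T*N, T*M)
K_y = I_C @ K_x @ I_C.T                     # shape (T*N, T*N)
```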

GPFA is mathematically identical to LMC/LCM, which arose from the geostatistics literature. Both LCM/LMC and GPFA construct a multi-output kernel as the covariance of a linear combination of multiple latent Gaussian processes. However, in GPFA we are interested in recovering the posterior over these latent processes (not possible with GPyTorch's LMC model), hence the extension of LMC proposed here. Due to the gridded structure of neural data, and the relatively sparse sampling of data per lengthscale, GPFA does not need to rely on inducing points and can instead be modeled using exact / iterative GPs.
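For comparison, a rough sketch of how this covariance structure is typically set up with GPyTorch's existing multitask machinery. The class name, sizes, and hyperparameters are placeholders, and this uses the existing LCMKernel rather than the GPFA kernel from this PR:

```python
import torch
import gpytorch

class MultitaskGPSketch(gpytorch.models.ExactGP):
    """Exact multitask GP whose covariance is an LCM kernel with rank-1 task covariances."""

    def __init__(self, train_t, train_y, likelihood, num_latents=3, num_tasks=10):
        super().__init__(train_t, train_y, likelihood)
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=num_tasks
        )
        self.covar_module = gpytorch.kernels.LCMKernel(
            base_kernels=[gpytorch.kernels.RBFKernel() for _ in range(num_latents)],
            num_tasks=num_tasks,
            rank=1,
        )

    def forward(self, t):
        mean = self.mean_module(t)
        covar = self.covar_module(t)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean, covar)

# usage: train_t has shape (T,), train_y has shape (T, num_tasks)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=10)
# model = MultitaskGPSketch(train_t, train_y, likelihood)
```

As discussed above, the difference is less the covariance itself than being able to recover the posterior over the latent processes.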

Things I would like to add for a future PR

  • Make it work on batches
  • Add a KISS-GP GPFA demo demonstrating scaling on GPU / on gridded input, since this was the motivation for using GPyTorch for this. This would also use fast solves, etc. (A rough sketch of the kernel wrapping is included after this list.)
  • Add a better latent posterior test, likely using the leave-neuron-out prediction error method for GPFA described in this paper
  • Add a prior / constraints on C if this will make the regression Bayesian over the C parameters as well.
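As a rough illustration of the KISS-GP point above, each latent kernel could be wrapped in a grid-interpolation kernel along these lines (the grid_size value is an arbitrary placeholder, assuming regularly sampled 1D time input):

```python
import gpytorch

# Wrap a latent RBF kernel in a KISS-GP (grid interpolation) kernel for
# regularly sampled 1D time input; grid_size is an arbitrary placeholder.
base_kernel = gpytorch.kernels.RBFKernel()
kiss_kernel = gpytorch.kernels.GridInterpolationKernel(
    base_kernel, grid_size=400, num_dims=1
)
```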

Questions for the GPyTorch Team about Design

  • Estimating the latent posterior is important for the community that uses GPFA. Right now this function is defined as part of the model, but models are typically built by individual users. I would like this latent_posterior function to be available without users having to copy-paste it from the example. Is there a good place it could live in the codebase (e.g. a GPFAModel class or utils)? Would it make sense to add a GPFAModel to the codebase that includes it? Ideally this function should be usable across both exact and approximate GPFA, so I am not sure what the best solution is. (A sketch of the computation this function performs is included below.)
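To make this concrete, latent_posterior essentially performs standard Gaussian conditioning of the latents on the observations. A plain-PyTorch sketch, reusing the K_x, I_C, K_y names from the earlier sketch; this is not the exact code from the notebook:

```python
import torch

def latent_posterior_sketch(K_x, I_C, K_y, y, noise_var):
    """Posterior over stacked latents x given observations y = (I_T kron C) x + noise.

    K_x:  (T*M, T*M) prior covariance of the latents
    I_C:  (T*N, T*M) mixing operator I_T kron C
    K_y:  (T*N, T*N) prior covariance of the observations, I_C @ K_x @ I_C.T
    y:    (T*N,)     vectorized observations
    """
    K_yy = K_y + noise_var * torch.eye(K_y.shape[0])
    cross = K_x @ I_C.T                                     # Cov(x, y), shape (T*M, T*N)
    mean = cross @ torch.linalg.solve(K_yy, y)              # E[x | y]
    cov = K_x - cross @ torch.linalg.solve(K_yy, cross.T)   # Cov[x | y]
    return mean, cov
```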

Questions for the GPyTorch Team that came up while trying to scale this using KISS-GP on gridded input

  • I don't quite understand how batches work in GPyTorch. So far I am following the batch multitask test as an example. I haven't quite gotten it working, but I also haven't thoroughly debugged it yet. Do you have any pointers for getting this working on batches?
  • I started this before some of the new Kronecker updates were added. I've tried to incorporate the new Kronecker machinery, but is there anything I can do to make things more efficient? Is there anywhere I should be using BlockDiagonal instead of $I \otimes C$?
  • When scaling up to large input lengths, the preconditioner takes too much memory, because while the kernel / KISS-GP are linear in memory, the preconditioner is still cubic. Right now I use preconditioner(0) to get around this. Are there other lower memory preconditioners available? I have some ideas of potential low-memory preconditioners I could use -- is there a good way to add a custom preconditioner?
  • Right now, when I try to calculate the confidence_interval at scale, it overflows memory. Does GPyTorch build in any way of estimating the diagonal of a matrix stochastically? If there isn't a way to do this in settings, is this something that would be useful to add beyond the scalable GPFA case? (A sketch of the kind of estimator I have in mind is after this list.)
  • Is the evaluate call at the end of my latent_posterior function in the GPFAModel (in the notebook / test) going to slow things down at scale? If so, is there a way to avoid evaluating there?
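Regarding the stochastic-diagonal question above, this is the kind of Hutchinson-style estimator I have in mind, written against a generic matrix-vector product rather than any existing GPyTorch setting (the function name and interface are hypothetical):

```python
import torch

def hutchinson_diag(matvec, dim, num_probes=64):
    """Estimate diag(A) as the average of z * (A z) over Rademacher probe vectors z.

    matvec computes A @ z without ever forming A densely; num_probes trades
    accuracy for compute.
    """
    diag_est = torch.zeros(dim)
    for _ in range(num_probes):
        z = torch.randint(0, 2, (dim,)).float() * 2 - 1   # entries are +/- 1
        diag_est += z * matvec(z)
    return diag_est / num_probes

# usage sketch: estimate the diagonal of K_y from the earlier sketch
# diag_K_y = hutchinson_diag(lambda z: I_C @ (K_x @ (I_C.T @ z)), I_C.shape[0])
```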

Member

@gpleiss gpleiss left a comment


@syncrostone - this is awesome, thanks for the PR! Two questions:

  1. Can you explain how this differs from the LCMKernel (already implemented)?

  2. Do you think the primary use case will be having the same kernel type for each of the latent factors (e.g. all factors use RBF kernels, but with different lengthscales)? If so, I think we can make a more efficient version using batched kernels.

@syncrostone
Contributor Author

syncrostone commented May 5, 2021

Thanks for the quick reply @gpleiss!

@syncrostone - this is awesome, thanks for the PR! Two questions:

1. Can you explain how this differs from the LCMKernel (already implemented)?

Mathematically, the kernel is identical to LCM with rank=1 if v = 0 in the index kernel. I have not tried fixing v = 0, but I imagine this would be possible by setting var_constraint equal to an IntervalConstraint(0,0)?

Perhaps the more important contribution is actually the latent_posterior function that is part of the model class in the notebook. Maybe this would be better contributed as a new model class that includes this function?

Everything in that function should be possible to do with LCM, but it would be less efficient and would require building the GPFA parameters (both the C matrix and the latent covariances) back out of the LCM.

2. Do you think the primary use case will be having the same kernel type for each of the latent factors (e.g. all factors use RBF kernels, but with different lengthscales)? If so, I think we can make a more efficient version using batched kernels.

I would love to hear how to make it more efficient using batched kernels -- how much of a speedup would this give? The main use case is having the same kernel type for each of the latent factors, but I could see potential future use cases for mixing kernel types.
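For reference, my guess at what the batched-kernel version would look like: a single RBFKernel with a batch dimension over the latent factors, so each batch element gets its own lengthscale (this is an assumption about the suggestion, not a confirmed design):

```python
import torch
import gpytorch

M = 3  # number of latent factors

# One RBF kernel evaluated as a batch of M kernels, each with its own lengthscale,
# instead of a Python list of M separate RBFKernel modules.
latent_kernels = gpytorch.kernels.RBFKernel(batch_shape=torch.Size([M]))

t = torch.linspace(0, 1, 50)
K = latent_kernels(t)      # lazily evaluated, batch shape (M, 50, 50)
print(K.shape)             # torch.Size([3, 50, 50])
```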

@gpleiss
Member

gpleiss commented Jun 10, 2021

Sorry for the slow reply @syncrostone! I got bogged down with NeurIPS - I'll take a look at this on Monday.

@gpleiss
Member

gpleiss commented Jun 23, 2021

@syncrostone - sorry again for the delay! Looking at this now!

@@ -0,0 +1,75 @@
#!/usr/bin/env python3
Member


To me, this kernel seems really similar to the LCM Kernel that we already have in GPyTorch. Is there any way we can integrate the two?

I know that you want to be able to do posterior inference on the latent function (currently not totally possible with GPyTorch), and I'm not sure if this is possible with the current LCMKernel implementation. However, I'm afraid that having two similar kernels in the library will be difficult and confusing to maintain.

@gpleiss
Member

gpleiss commented Jun 23, 2021

@syncrostone - I agree that the latent_posterior function in the example would be nice to integrate somewhere deeper in the library. At the moment, I'm not sure exactly where that would be.

In general, I think we need to do some refactoring of our multitask code. It might require a bigger change to the library to get this sort of inference working.
