tl;dr: We should implement a `HeteroskedasticLikelihood` instead of using a `WhiteNoiseKernel`.
From offline discussion (cc @bkarrer, @gpleiss):
Assume for the moment that a heteroskedastic noise function var(x) is known for all x. The statistical model we're interested in is y(x) = f(x) + eps(x) where f~GP(mu(x), k(x, x')) and eps ~ N(0, var(x)1(x=x')). The noise eps(x) is redrawn for every independent observation taken at x. The training data are (x, y) pairs, with possibly repeat observations at x, and I would like to get a posterior over both f and y from gpytorch, where another observation at x will return a new value y.
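For concreteness, data from this model could be simulated along these lines (the particular `var(x)` below is just an arbitrary stand-in for the known noise function):

```python
import math
import torch

def var(x):
    # stand-in for the known heteroskedastic noise function var(x)
    return 0.01 + 0.25 * x.abs()

# repeat observations at the same x each get a fresh noise draw eps(x)
train_x = torch.tensor([0.0, 0.0, 0.5, 1.0, 1.0, 1.0])
f_true = torch.sin(2 * math.pi * train_x)   # stand-in for a draw f ~ GP(mu, k)
train_y = f_true + var(train_x).sqrt() * torch.randn_like(train_x)
```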
It sounds like you are saying that to implement this we should not use the `WhiteNoiseKernel`, but instead define a heteroskedastic likelihood that uses this `var(x)` function?
A `HeteroskedasticLikelihood` might take a little thought.

The easy part: `HeteroskedasticLikelihood` should be easy to write. The constructor could take in a handle to a function `lambda(x)` that returns the variance as a function of `x`. It could be a simple closed-form function, or it could be something more complicated/parametric like a neural network. `HeteroskedasticLikelihood` would take in `f` (an MVN distribution, with mean `mu` and covar `Sigma`) and `x`, and return an MVN with mean `mu` and covar `Sigma + \lambda(x) I`. This will be super easy to implement in GPyTorch.
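As a rough sketch of that easy part (the class name and `noise_fn` argument are hypothetical, it assumes the current `gpytorch.distributions.MultivariateNormal` API, and it is written as a plain `torch.nn.Module` rather than against the existing `Likelihood` base class, since the interface is exactly the hard part below):

```python
import torch
from gpytorch.distributions import MultivariateNormal

class HeteroskedasticLikelihood(torch.nn.Module):
    """Sketch: p(y | f, x) = N(mu, Sigma + diag(var(x))) for a known var(x)."""

    def __init__(self, noise_fn):
        super().__init__()
        self.noise_fn = noise_fn  # callable mapping x -> per-point noise variances

    def forward(self, f, x):
        # f is an MVN over latent function values, with mean `mu` and covariance `Sigma`
        noise = self.noise_fn(x)
        return MultivariateNormal(f.mean, f.covariance_matrix + torch.diag_embed(noise))
```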
The hard part: right now, all of our likelihoods only expect to take `f` as input, not `x`. This wouldn't be a huge issue for variational inference, since the likelihoods can take whatever form they need. But exact inference expects the likelihood to be a `GaussianLikelihood`, which takes in only `f` as input.
A solution: make `HeteroskedasticGaussianLikelihood` the super class of `GaussianLikelihood`. `GaussianLikelihood` would still only require one argument (`f`), but it probably should accept two arguments (`f` and `x`) and basically do nothing with `x`. This way exact GP models wouldn't have to differentiate between the two likelihoods.
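Building on the `HeteroskedasticLikelihood` sketch above (again hypothetical names, not existing gpytorch code), the homoskedastic case could then just be a subclass that accepts `x` and ignores it:

```python
class GaussianLikelihood(HeteroskedasticLikelihood):
    """Sketch stand-in for the real GaussianLikelihood: constant noise,
    accepts `x` for interface compatibility but ignores it."""

    def __init__(self, noise=1e-2):
        super().__init__(noise_fn=None)  # no input-dependent noise needed
        self.noise = noise

    def forward(self, f, x=None):
        n = f.mean.shape[-1]
        return MultivariateNormal(f.mean, f.covariance_matrix + self.noise * torch.eye(n))
```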
Yeah, this should be implemented as a likelihood, I think. The `Model` class in GPyTorch is a thing that returns either `p(f)` or `p(f|D)`, depending on whether we are in train mode or eval mode, with or without training data supplied. The `Likelihood` takes `p(f|D)` or `p(f)` and turns it into `p(y|D)` or `p(y)`. Since you want the noise to be implemented not at the level of the kernel (which is a property of the GP prior and therefore a property over `f`), I think `HeteroskedasticLikelihood` makes sense.
An alternative would be to allow Likelihoods to take `**kwargs`, which I think they already do?

I see -- we actually call the likelihood in `ExactGP.__call__`, don't we?
The predictive functions call the likelihood now. As of the `MultitaskLikelihood` implementation, `exact_predictive*` takes in the likelihood and not just the noise.
Right, they could alternatively take the noise, but at some point we'll be calling `likelihood()` during the `ExactGP.__call__` method. Probably, we should make it the case that all likelihoods are a function of `f` and `x`. And, if a likelihood does not depend on `x`, then it wouldn't require an `x` argument (but would accept it).
I think that there would maybe be a way to implement the channel likelihood as a function of `f` and `x`? Basically, `x` would be a tuple, containing `x` and the grouping information.
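For example (purely illustrative), that grouping idea could be expressed as a noise function that unpacks the tuple and looks up a per-group variance, which could then be handed to the `HeteroskedasticLikelihood` sketch above:

```python
import torch

group_noise = torch.tensor([0.01, 0.05, 0.25])  # hypothetical per-group/channel variances

def grouped_noise_fn(x):
    inputs, groups = x          # x is a tuple: (inputs, integer group labels)
    return group_noise[groups]  # one noise variance per observation, chosen by group
```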
So what do you think of making `GaussianLikelihood.forward` be `def forward(input, *data)`, and then we pass in the starred tuple of inputs you normally pass to `ExactGP.forward`?

Yeah, I think that's the correct thing to do. And in the case of the cluster MTGP model (which prompted the kwargs for variational likelihoods), I think we could do something similar.

Yeah, that's a good idea. Basically, I think there might be a way to make it so that, even in the variational case, we could get away with a similar `def forward(self, input, *data)` pattern.
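In other words (hypothetical usage, reusing the sketches above, with `model`, `test_x`, and the likelihood instances assumed to exist), both likelihoods would then be called the same way, receiving whatever inputs were passed to `ExactGP.forward`:

```python
model.eval()
latent_f = model(test_x)                              # p(f | D) as an MVN
pred_hetero = hetero_likelihood(latent_f, test_x)     # uses test_x to compute var(x)
pred_homo = gaussian_likelihood(latent_f, test_x)     # accepts test_x but ignores it
```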