An abstraction for the realization of a GP #364

@FelixBenning

Description

So in the documentation there is the sentence:

An AbstractGP, f, should be thought of as a distribution over functions. This means that the output of rand(f) would be a real-valued function. It's not usually possible to implement this though, so we don't.

When I first encountered AbstractGPs.jl I thought that the abstractions you define were simply born out of a different opinion. But reading this sentence, I think this might actually be a fruitful discussion, because I disagree: it is absolutely possible to implement rf = rand(f).

The way you do it is lazily. You could also not implement f(x) = 2x if that meant you had to calculate f for every possible x right from the start. The way you are in fact able to implement f anyway is by implementing the process that gets from x to f(x), and only applying this process once you actually want to know f(x) for a particular x.
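
A toy illustration of the same lazy principle (nothing GP-specific, and not part of any package): a "random function" on the integers with i.i.d. standard normal values, realized only at the points where it is actually evaluated, by memoizing the draws.

memo = Dict{Int, Float64}()

# each integer gets exactly one draw, generated the first time it is requested
rf_toy(i::Int) = get!(randn, memo, i)

rf_toy(3)  # draws and stores a value
rf_toy(3)  # returns the same value again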

Now in my particular case (gradient descent on a random function), I needed to incrementally evaluate this abstract random function rf = rand(f). If I wanted to do this with AbstractGPs.jl it would look something like this (I am going to ignore the fact that AbstractGPs is also not intended for the simulation of gradients and pretend that rf represents the gradient vector field, for simplicity):

using AbstractGPs, LinearAlgebra

# Gradient descent with the current AbstractGPs interface: condition on all
# previous evaluations at every step (pretending the scalar sample acts as
# the gradient, as explained above)
x0 = zeros(n)
xs = [x0]
f = GP(kernel)
grads = [only(rand(f(xs)))]
while true
    push!(xs, xs[end] .- step_size .* grads[end])
    f_post = posterior(f(xs[1:end-1]), grads)
    push!(grads, only(rand(f_post(xs[end:end]))))
    if norm(grads[end]) < tol
        break
    end
end

This is anything but natural. Now in comparison consider this:

x = zeros(n)
rf = rand(GP(kernel))
while true
    grad = rf(x)
    x -= step_size * grad
    if norm(grad) < tol
        break
    end
end

The above is essentially what I implemented in https://github.com/FelixBenning/IncrementalGRF.jl, except that I skip the rand(GP(kernel)) and treat GP(kernel) as the realization of a random function immediately. (In my case it is called GaussianRandomFunction(kernel), but that is beside the point.) It essentially works like this:

Whenever rf(x) is called, the value rf(x) is simulated based on a stored vector of previous evaluations, to which x is appended at the end. To keep this computationally efficient, I also store the running Cholesky decomposition (which can easily be extended by another row). Since the Cholesky decomposition does the same incremental extension anyway, calculating it incrementally turns out to be no more expensive complexity-wise. The only implementation issue is allowing for an extensible matrix. Since we are only interested in a triangular matrix, we can do that with an extensible vector by simply appending rows, resulting in a packed storage format. Unfortunately this format is only supported by BLAS 2 routines and not BLAS 3, so at the moment I am losing a little bit of performance (but halving the memory requirements). I want to fix this issue with packed block matrices in the future.
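
For concreteness, here is a minimal, hypothetical sketch of that evaluation step (not the actual IncrementalGRF.jl code: it assumes a zero prior mean and stores the growing Cholesky factor as a plain dense matrix instead of the packed row format described above):

using LinearAlgebra

# Hypothetical sketch: store past points, sampled values and the growing
# lower-triangular Cholesky factor; extend all three by one row per call.
mutable struct LazyGRF{K}
    kernel::K                        # covariance function k(x, y), zero prior mean assumed
    points::Vector{Vector{Float64}}  # previous evaluation points
    values::Vector{Float64}          # sampled rf(x) at those points
    L::Matrix{Float64}               # Cholesky factor of the kernel matrix so far
end

LazyGRF(kernel) = LazyGRF(kernel, Vector{Vector{Float64}}(), Float64[], zeros(0, 0))

function (rf::LazyGRF)(x::Vector{Float64})
    m = length(rf.values)
    kxx = rf.kernel(x, x)
    kx = [rf.kernel(p, x) for p in rf.points]
    ℓ = LowerTriangular(rf.L) \ kx               # new off-diagonal row of the factor
    d = sqrt(max(kxx - dot(ℓ, ℓ), 0.0))          # new diagonal entry
    z = LowerTriangular(rf.L) \ rf.values
    y = dot(ℓ, z) + d * randn()                  # draw rf(x) | previous evaluations
    rf.L = [rf.L zeros(m); ℓ' d]                 # extend the factor by the new row
    push!(rf.points, x)
    push!(rf.values, y)
    return y
end

Construction needs nothing but the kernel, e.g. rf = LazyGRF((x, y) -> exp(-sum(abs2, x - y) / 2)); every later call rf(x) then returns a value consistent with all earlier calls.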

But anyway: what do you think about this way of implementing random functions? If you wanted the conditional expectation instead of new evaluations, you could easily freeze such a random function, e.g.

ce = conditionalExpectation(rf)
ce(x) # = E[ rf(x) | rf.(previous_points)]
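
Building on the hypothetical LazyGRF sketch above, freezing could simply precompute the posterior-mean weights once and return a plain deterministic function:

# freeze: posterior mean given everything evaluated so far; calling the
# result adds no new points and no new randomness
function conditionalExpectation(rf::LazyGRF)
    L = LowerTriangular(rf.L)
    w = L' \ (L \ rf.values)                     # K⁻¹ * values
    return x -> dot([rf.kernel(p, x) for p in rf.points], w)
end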

Oh, and you also get the broadcast operator for free: rf.(evaluation_points) totally makes sense if you want to evaluate rf at multiple points. What do you think about this interface? Are there use cases where it is more clunky than the current one of AbstractGPs?
