So in the documentation there is the sentence:
> An `AbstractGP`, `f`, should be thought of as a distribution over functions. This means that the output of `rand(f)` would be a real-valued function. It's not usually possible to implement this though, so we don't.
When I first encountered AbstractGPs.jl, I thought that the abstractions you define were simply born out of a different opinion. But reading this sentence, I think this might actually be a fruitful discussion, because I disagree: it is absolutely possible to implement `rf = rand(f)`.
The way you do it is lazily. After all, you could not implement `f(x) = 2x` either if that meant you had to calculate `f` for every possible `x` right from the start. The way you are in fact able to implement `f` is by implementing the process that gets from `x` to `f(x)`, and only applying this process once you actually want to know `f(x)` for a particular `x`.
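To make the idea concrete with a toy example (the helper `lazy_random_function` below is purely illustrative and, unlike a GP, ignores correlations between points): represent the function by a sampling rule plus a cache, so that values only come into existence when they are first requested and repeated queries stay consistent.

```julia
# Toy illustration of lazy evaluation: a "random function" represented by
# a rule plus a cache of the values sampled so far.
function lazy_random_function(sample_at)
    cache = Dict{Float64,Float64}()
    return x -> get!(() -> sample_at(x), cache, x)
end

rf = lazy_random_function(x -> randn())  # nothing is sampled yet
rf(1.0)                                  # sampled on first request
rf(1.0)                                  # returns the cached value again
```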
Now, in my particular case (gradient descent on a random function), I needed to incrementally evaluate this abstract random function `rf = rand(f)`. If I wanted to do this with AbstractGPs.jl, it would look something like this (I am going to ignore the fact that AbstractGPs is also not intended for the simulation of gradients and pretend that `rf` represents the gradient vector field, for simplicity):
```julia
# Gradient descent on a single sample of the GP with the current
# AbstractGPs interface: every new evaluation has to be drawn from a
# posterior that is conditioned on all previous evaluations by hand.
x0 = zeros(n)
xs = [x0]
f = GP(kernel)
grads = [rand(f(xs))]                            # pretend these are gradient vectors
while true
    push!(xs, xs[end] - step_size * grads[end])
    f_post = posterior(f(xs[1:end-1]), grads)    # condition on everything so far
    push!(grads, rand(f_post(xs[end:end])))
    if norm(grads[end]) < eps
        break
    end
end
```

This is anything but natural. Now, in comparison, consider this:
```julia
# The same loop if rand(GP(kernel)) returned a lazily evaluated sample
# of the random function.
x = zeros(n)
rf = rand(GP(kernel))
while true
    grad = rf(x)
    x -= step_size * grad
    if norm(grad) < eps
        break
    end
end
```

The above is essentially what I implemented in https://github.com/FelixBenning/IncrementalGRF.jl, except that I skip the `rand(GP(kernel))` step and essentially treat `GP(kernel)` as the realization of a random function immediately. Well, in my case it is called `GaussianRandomFunction(kernel)`, but that is beside the point. It essentially works like this:
Whenever `rf(x)` is called, the value `rf(x)` is simulated conditional on a stored vector of previous evaluations, and `x` is then appended to that list of evaluated points. To keep this computationally efficient, I also store the running Cholesky decomposition, which can easily be extended by another row. Since the Cholesky decomposition is built up by exactly this kind of incremental extension anyway, computing it incrementally turns out to be no more expensive complexity-wise. The only implementation issue is allowing for an extensible matrix. Since we only need a triangular matrix, this can be done with an extensible vector by simply appending rows, resulting in a packed storage format. Unfortunately, this format is only supported by BLAS level-2 routines and not level-3, so at the moment I am losing a little bit of performance (but saving half the memory). I want to fix this with packed block matrices in the future.
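To make this concrete, here is a minimal sketch of the mechanism (my own paraphrase, not the actual IncrementalGRF.jl code; it uses a dense lower-triangular matrix instead of the packed format and assumes the kernel is callable as `kernel(x, y)`, as in KernelFunctions.jl):

```julia
using LinearAlgebra

# Sketch of a lazily evaluated GP sample: each call conditions on all
# previous evaluations via one new row of the Cholesky factor.
mutable struct LazySample{K}
    kernel::K
    points::Vector{Vector{Float64}}  # previously evaluated points
    L::Matrix{Float64}               # lower-triangular Cholesky factor of the kernel matrix
    z::Vector{Float64}               # standard-normal draws; the stored evaluations are L * z
end

LazySample(kernel) = LazySample(kernel, Vector{Vector{Float64}}(), zeros(0, 0), Float64[])

function (rf::LazySample)(x::Vector{Float64})
    k_xx = rf.kernel(x, x)
    if isempty(rf.points)
        a, c = Float64[], sqrt(k_xx)
    else
        k = [rf.kernel(p, x) for p in rf.points]
        a = LowerTriangular(rf.L) \ k          # new off-diagonal row by forward substitution
        c = sqrt(max(k_xx - dot(a, a), 0.0))   # conditional standard deviation
    end
    z_new = randn()
    value = dot(a, rf.z) + c * z_new           # draw from the conditional distribution
    # extend the stored Cholesky factor by one row and remember the new draw
    n = length(rf.points)
    L_new = zeros(n + 1, n + 1)
    L_new[1:n, 1:n] = rf.L
    L_new[n + 1, 1:n] = a
    L_new[n + 1, n + 1] = c
    rf.L = L_new
    push!(rf.z, z_new)
    push!(rf.points, x)
    return value
end
```

A nice side effect of conditioning on the stored evaluations is that calling `rf` again at an already-evaluated point has (numerically) zero conditional variance, so the sample stays consistent across repeated queries.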
But anyway: what do you think about this way of implementing random functions? If you wanted the conditional expectation instead of new evaluations, you could easily freeze such a random function, e.g.

```julia
ce = conditionalExpectation(rf)
ce(x)  # = E[ rf(x) | rf.(previous_points) ]
```

Oh, and you also get the broadcast operator for free: `rf.(evaluation_points)` totally makes sense if you want to evaluate `rf` at multiple points. What do you think about this interface? Are there use cases where it is more clunky than the current one of AbstractGPs?
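For what it's worth, in terms of the sketch above, such a `conditionalExpectation` (just a hypothetical name, mirroring the snippet) could reuse the stored state and simply drop the fresh noise term:

```julia
# Sketch only: freeze the LazySample from the sketch above into its
# conditional expectation given all evaluations made so far.
function conditionalExpectation(rf::LazySample)
    kern = rf.kernel
    points, L, z = copy(rf.points), copy(rf.L), copy(rf.z)  # snapshot the current state
    return function (x::Vector{Float64})
        isempty(points) && return 0.0        # prior mean, assumed to be zero
        k = [kern(p, x) for p in points]
        a = LowerTriangular(L) \ k
        return dot(a, z)                     # = E[ rf(x) | previous evaluations ]
    end
end
```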