
How to do hyperparameter tuning #76

@simsurace

I tried to differentiate aug_elbo from the NegBinomialLikelihood example with AD (unnecessary bits removed below), purposefully avoiding ParameterHandling.jl and starting with just ForwardDiff.gradient:

# # Negative Binomial

# We load all the necessary packages
using AbstractGPs
using ApproximateGPs
using AugmentedGPLikelihoods
using Distributions
using ForwardDiff # <-- try this first
using LinearAlgebra

# We create some random data (sorted for plotting reasons)
N = 100
x = range(-10, 10; length=N)
kernel = with_lengthscale(SqExponentialKernel(), 2.0)
gp = GP(kernel)
lik = NegBinomialLikelihood(15)
lf = LatentGP(gp, lik, 1e-6)
f, y = rand(lf(x));

# ## ELBO
# How can one compute the Augmented ELBO?
# Again AugmentedGPLikelihoods provides helper functions
# to not have to compute everything yourself
function aug_elbo(lik, u_post, x, y)
    qf = marginals(u_post(x))
    qΩ = aux_posterior(lik, y, qf)
    return expected_logtilt(lik, qΩ, y, qf) - aux_kldivergence(lik, qΩ, y) -
           kldivergence(u_post.approx.q, u_post.approx.fz)     # approx.fz is the prior and approx.q is the posterior 
end

function u_posterior(fz, m, S)
    return posterior(SparseVariationalApproximation(Centered(), fz, MvNormal(m, S)))
end
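
As a quick sanity check (values hypothetical), the helpers can be evaluated directly at a unit-Gaussian variational distribution:

fz = gp(x, 1e-8)
u_post = u_posterior(fz, zeros(N), Matrix{Float64}(I(N)))
aug_elbo(lik, u_post, x, y) # scalar value of the augmented ELBO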

# ## Try to differentiate loss function

function makeloss(x, y)
    N = length(x)
    function loss(θ)
        k = ScaledKernel(
            RBFKernel() ∘ ScaleTransform(inv(θ[1])), # θ[1] acts as a lengthscale
            θ[2]
        )
        gp = GP(k)
        lik = NegBinomialLikelihood(θ[3])
        fz = gp(x, 1e-8);
        u_post = u_posterior(fz, zeros(N), Matrix{Float64}(I(N)))
        return aug_elbo(lik, u_post, x, y)
    end
end

θ = [1., 1., 15.]

loss = makeloss(x, y)
loss(θ) # works!
ForwardDiff.gradient(loss, θ) # MethodError
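
My guess at the cause (an assumption, not traced through the package): aux_posterior fixes an element type to Float64 somewhere, and a Float64 container cannot hold the ForwardDiff.Dual numbers that gradient pushes through. A minimal sketch of that failure mode:

using ForwardDiff

# Hypothetical reproduction: forcing Float64 storage breaks Dual propagation.
f_bad(θ) = sum(Float64[θ[1]^2])    # Float64[...] converts each element
ForwardDiff.gradient(f_bad, [1.0]) # MethodError: no method matching Float64(::Dual)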

There is an easy fix (happy to open a PR): change the definition of aux_posterior so that the element type of c is inferred from f rather than being fixed to Float64, which lets the Dual numbers propagate:

function aux_posterior(lik::NegBinomialLikelihood, y, f)
    c = sqrt.(second_moment.(f))
    return For(TupleVector(; y=y, c=c)) do φ
        NTDist(PolyaGamma(φ.y + lik.r, φ.c)) # Distributions uses a different parametrization
    end
end
With this change, the gradient goes through:

julia> ForwardDiff.gradient(loss, θ)
3-element Vector{Float64}:
  5.790557942012172e7
 -1.9761748845444782e9
 16.184871970106013
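
With the gradient working, the tuning loop itself could be as simple as the following sketch (assuming Optim.jl; this glosses over keeping θ positive, e.g. via a log-parametrization, and over updating the variational parameters m and S between steps):

using Optim

# Maximize the augmented ELBO by minimizing its negative;
# autodiff=:forward reuses the ForwardDiff machinery from above.
res = optimize(θ -> -loss(θ), θ, LBFGS(); autodiff=:forward)
θ_opt = Optim.minimizer(res)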

BTW: is it expected that the values of the augmented ELBO are so much larger in magnitude than the normal ELBO?
