
How to do hyperparameter tuning #76

@simsurace

I tried to differentiate aug_elbo from the NegBinomialLikelihood example with AD (unnecessary bits removed below), purposefully avoiding ParameterHandling.jl and starting with just ForwardDiff.gradient:

# # Negative Binomial

# We load all the necessary packages
using AbstractGPs
using ApproximateGPs
using AugmentedGPLikelihoods
using Distributions
using ForwardDiff # <-- try this first
using LinearAlgebra

# We create some random data (sorted for plotting reasons)
N = 100
x = range(-10, 10; length=N)
kernel = with_lengthscale(SqExponentialKernel(), 2.0)
gp = GP(kernel)
lik = NegBinomialLikelihood(15)
lf = LatentGP(gp, lik, 1e-6)
f, y = rand(lf(x));

# ## ELBO
# How can one compute the Augmented ELBO?
# Again AugmentedGPLikelihoods provides helper functions
# to not have to compute everything yourself
function aug_elbo(lik, u_post, x, y)
    qf = marginals(u_post(x))
    qΩ = aux_posterior(lik, y, qf)
    return expected_logtilt(lik, qΩ, y, qf) - aux_kldivergence(lik, qΩ, y) -
           kldivergence(u_post.approx.q, u_post.approx.fz)     # approx.fz is the prior and approx.q is the posterior 
end

function u_posterior(fz, m, S)
    return posterior(SparseVariationalApproximation(Centered(), fz, MvNormal(m, S)))
end
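
As a quick sanity check (values hypothetical), the helpers can be evaluated directly at a unit-Gaussian variational distribution:

fz = gp(x, 1e-8)
u_post = u_posterior(fz, zeros(N), Matrix{Float64}(I(N)))
aug_elbo(lik, u_post, x, y) # scalar value of the augmented ELBO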

# ## Try to differentiate loss function

function makeloss(x, y)
    N = length(x)
    function loss(θ)
        k = ScaledKernel(
            RBFKernel() ∘ ScaleTransform(inv(θ[1])), # θ[1] acts as a lengthscale
            θ[2]
        )
        gp = GP(k)
        lik = NegBinomialLikelihood(θ[3])
        fz = gp(x, 1e-8);
        u_post = u_posterior(fz, zeros(N), Matrix{Float64}(I(N)))
        return aug_elbo(lik, u_post, x, y)
    end
end

θ = [1., 1., 15.]

loss = makeloss(x, y)
loss(θ) # works!
ForwardDiff.gradient(loss, θ) # MethodError
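
My guess at the cause (an assumption, not traced through the package): aux_posterior fixes an element type to Float64 somewhere, and a Float64 container cannot hold the ForwardDiff.Dual numbers that gradient pushes through. A minimal sketch of that failure mode:

using ForwardDiff

# Hypothetical reproduction: forcing Float64 storage breaks Dual propagation.
f_bad(θ) = sum(Float64[θ[1]^2])    # Float64[...] converts each element
ForwardDiff.gradient(f_bad, [1.0]) # MethodError: no method matching Float64(::Dual)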

There is an easy fix (happy to open a PR): change the definition of aux_posterior so that the element type of c is inferred from f rather than being fixed to Float64, which lets the Dual numbers propagate:

function aux_posterior(lik::NegBinomialLikelihood, y, f)
    c = sqrt.(second_moment.(f))
    return For(TupleVector(; y=y, c=c)) do φ
        NTDist(PolyaGamma(φ.y + lik.r, φ.c)) # Distributions uses a different parametrization
    end
end
With this change, the gradient goes through:

julia> ForwardDiff.gradient(loss, θ)
3-element Vector{Float64}:
  5.790557942012172e7
 -1.9761748845444782e9
 16.184871970106013
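
With the gradient working, the tuning loop itself could be as simple as the following sketch (assuming Optim.jl; this glosses over keeping θ positive, e.g. via a log-parametrization, and over updating the variational parameters m and S between steps):

using Optim

# Maximize the augmented ELBO by minimizing its negative;
# autodiff=:forward reuses the ForwardDiff machinery from above.
res = optimize(θ -> -loss(θ), θ, LBFGS(); autodiff=:forward)
θ_opt = Optim.minimizer(res)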

BTW: is it expected that the values of the augmented ELBO are so much larger in magnitude than the normal ELBO?
