How should the OILMM be parameterised when using AD? #50

@wil-j-wil

Hi,

How should I parameterise an OILMM if I want to optimise the hyperparameters whilst ensuring that the columns of U remain orthogonal?
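
To be explicit about the constraint I mean: the sketch below assumes the usual OILMM form of the mixing matrix, which I believe is what Orthogonal(U, S) in LinearMixingModels represents. Here p is the number of outputs and m the number of latent processes (11 and 3 in the setup below); the symbols p, m and s_i are my own notation for illustration.

```latex
H = U S^{1/2}, \qquad
U \in \mathbb{R}^{p \times m}, \quad U^\top U = I_m, \qquad
S = \operatorname{diag}(s_1, \dots, s_m), \quad s_i > 0 .
```

So the optimiser should be free to move U, but only within the set of matrices with orthonormal columns, while each s_i stays positive.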

I have the following setup which uses the orthogonal constraint from ParameterHandling:

using AbstractGPs
using KernelFunctions
using LinearAlgebra
using LinearMixingModels
using ParameterHandling

num_outputs = 11
num_latents = 3

x_train = KernelFunctions.MOInputIsotopicByOutputs(collect(1:100), num_outputs)
y_train = rand(100 * num_outputs)

H_init = rand(num_outputs, num_outputs)
U_, S_, V_ = svd(H_init)
U_init = U_[:, 1:num_latents]
S_init = S_[1:num_latents]

θ_oilmm = (;
    U = orthogonal(U_init),
    S = positive.(S_init),
)

function build_gp(θ)
    sogp = GP(Matern52Kernel())
    latent_gp = independent_mogp([sogp for _ in 1:num_latents])
    return ILMM(latent_gp, Orthogonal(θ.U, Diagonal(θ.S)))
end

function objective(θ)
    oilmm = build_gp(θ)
    return -logpdf(oilmm(x_train, 0.1), y_train)
end

but when I try to compute the gradient of the objective with this parameterisation,

using Zygote

flat_θ_oilmm, unflatten = flatten(θ_oilmm)
unpack = ParameterHandling.value ∘ unflatten

Zygote.gradient(objective ∘ unpack, flat_θ_oilmm)

the gradients of U are NaN (due to the orthogonal constraint).

What's the best way to set this up?
