InducingPointKernel prediction scales with N^2? #1708
-
It seems that posterior prediction with the inducing point kernel scales as O(N^2) rather than the O(N) you would expect from reading Titsias (2009). Is this intended behaviour?

Edit: I checked with GPy and it should scale linearly there; they had a PR in 2019 that fixed this. Maybe gpytorch is doing the same thing...

Edit 2: I wasn't on the latest version of gpytorch. I updated and it's a bit faster, but the scaling is still quadratic (the plot looks qualitatively the same).

Edit 3: I think the issue is here, in the overrides of the SGPR prediction strategy.

Code: I added the following cell to the end of the SGPR_Regression_CUDA notebook and got the following plot (run without a GPU on my machine):

```python
import datetime
import numpy as np
import torch
import matplotlib.pyplot as plt
import gpytorch

# model, likelihood, train_x, train_y, test_x come from the notebook cells above.
N_list = [500, 700, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 12000]
time_list = []
for N in N_list:
    model.train(); likelihood.train()
    # replace training data
    model.set_train_data(train_x[:N], train_y[:N], strict=False)
    # Get predictions
    model.eval()
    likelihood.eval()
    start_time = datetime.datetime.now()
    with gpytorch.settings.fast_computations(False, False, False), torch.no_grad():
        preds = model(test_x)
        mae = torch.mean(torch.abs(preds.mean - test_y)).item()
    end_time = datetime.datetime.now()
    time_list.append((end_time - start_time).total_seconds())
    print(f'N={N} time={time_list[-1]:.1f} MAE={mae:.3f}')

# Plotting
poly_coeff = np.polyfit(N_list, time_list, deg=2)
poly_pred = np.polyval(poly_coeff, N_list)
plt.plot(N_list, time_list, ".-", label="times")
plt.plot(N_list, poly_pred, "-", label="quadratic fit")
plt.legend()
plt.xlabel("N")
plt.ylabel("Time (s)")
plt.title("Prediction time vs N train")
plt.show()
```
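For reference on why linear scaling is the expectation: in Titsias (2009), the predictive mean only touches the N training points through K_ZX K_XZ and K_ZX y, which cost O(N m^2) and O(N m) for m inducing points; everything else is independent of N. A rough standalone sketch of that computation (the `rbf` and `sgpr_predictive_mean` helpers below are hypothetical, with noise and jitter handling simplified):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    # Squared-exponential kernel between point sets of shape (n, d) and (p, d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def sgpr_predictive_mean(train_x, train_y, test_x, Z, noise_var=0.1):
    # SGPR predictive mean in the style of Titsias (2009).
    # Only the two lines marked with O(N ...) depend on the training set size.
    Kzz = rbf(Z, Z)                       # (m, m)
    Kzx = rbf(Z, train_x)                 # (m, N)
    Ksz = rbf(test_x, Z)                  # (n_test, m)
    A = Kzz + Kzx @ Kzx.T / noise_var     # O(N m^2)
    b = Kzx @ train_y / noise_var         # O(N m)
    return Ksz @ np.linalg.solve(A, b)    # O(m^3 + n_test * m), independent of N
```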
-
My guess of what's going on in the predictive mean is that when you turn fast computations off, you end up evaluating and then computing something more like a dense Cholesky decomposition of the full training covariance matrix.
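Roughly, what that would mean (a standalone illustration under assumed shapes, not gpytorch's actual code path): once the low-rank-plus-diagonal structure is dropped, the solve has to go through a dense N x N factorisation.

```python
import torch

N, m = 2000, 100
root = torch.randn(N, m, dtype=torch.float64)    # stand-in for the low-rank root
diag = torch.rand(N, dtype=torch.float64) + 0.1  # stand-in for the noise diagonal
rhs = torch.randn(N, 1, dtype=torch.float64)

# Structure-agnostic path: evaluate the full covariance and Cholesky-factor it.
K_dense = root @ root.T + torch.diag(diag)       # O(N^2 m) just to materialise
L = torch.linalg.cholesky(K_dense)               # O(N^3) to factor
x = torch.cholesky_solve(rhs, L)
# A solve that keeps the D + U U^T structure only needs an m x m factorisation
# (Woodbury identity) and O(N m^2) work overall.
```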
-
Yeah, it certainly should do things smarter when it calls Cholesky. I suppose we could either override the mean cache on the SGPR predictive strategy or do something on the lazy tensor whenever `fast_computations` is off.
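For the lazy-tensor option, a minimal sketch of the idea, using a hypothetical `LowRankPlusDiag` class rather than gpytorch's actual `LowRankRootAddedDiagLazyTensor`: route solves through the Woodbury identity so the dense path is never taken, whatever `fast_computations` is set to.

```python
import torch

class LowRankPlusDiag:
    """Hypothetical stand-in for a low-rank-plus-diagonal tensor K = D + U U^T."""

    def __init__(self, root, diag):
        self.root = root              # U, shape (N, m)
        self.diag = diag              # diagonal of D, shape (N,)

    def inv_matmul(self, rhs):
        # K^{-1} rhs via the Woodbury identity:
        #   (D + U U^T)^{-1} r = D^{-1} r - D^{-1} U (I + U^T D^{-1} U)^{-1} U^T D^{-1} r
        # Only an m x m system is solved, so the cost is O(N m^2) instead of O(N^3).
        dinv_rhs = rhs / self.diag.unsqueeze(-1)
        dinv_root = self.root / self.diag.unsqueeze(-1)
        m = self.root.shape[-1]
        cap = torch.eye(m, dtype=rhs.dtype) + self.root.T @ dinv_root
        return dinv_rhs - dinv_root @ torch.linalg.solve(cap, self.root.T @ dinv_rhs)

# Quick check against a dense solve:
N, m = 500, 30
U = torch.randn(N, m, dtype=torch.float64)
d = torch.rand(N, dtype=torch.float64) + 0.1
r = torch.randn(N, 2, dtype=torch.float64)
lt = LowRankPlusDiag(U, d)
dense = torch.linalg.solve(U @ U.T + torch.diag(d), r)
print(torch.allclose(lt.inv_matmul(r), dense, atol=1e-6))
```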
-
@AustinT @wjmaddox #1709
It was better to fix this by modifying `LowRankRootAddedDiagLazyTensor` -- actually, I just had to override `inv_matmul`. The `fast_computations` setting should now properly do nothing when either training or testing with SGPR -- it already did nothing for training.