Replies: 1 comment 1 reply
-
Hi @tristansdj - you should try using the KeOps integration instead: https://docs.gpytorch.ai/en/latest/examples/02_Scalable_Exact_GPs/KeOps_GP_Regression.html. It is able to precompute caches and do training on a single GPU, often much more efficiently than our multi-GPU implementation.
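For reference, a minimal sketch of the KeOps setup from that tutorial (placeholder data and kernel choice; it assumes the `pykeops` package and a CUDA GPU are available):

```python
import torch
import gpytorch
from gpytorch.kernels.keops import RBFKernel  # KeOps-backed kernel (requires pykeops)

# Placeholder data; KeOps handles hundreds of thousands of points on one GPU.
train_x = torch.randn(100_000, 6).cuda()
train_y = torch.randn(100_000).cuda()

class KeOpsGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # The KeOps kernel never materializes the full covariance matrix,
        # which is what makes single-GPU exact GPs at this scale feasible.
        self.covar_module = gpytorch.kernels.ScaleKernel(RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood().cuda()
model = KeOpsGPModel(train_x, train_y, likelihood).cuda()
# Training and prediction then work exactly as with an ordinary ExactGP model.
```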
-
Hello GPyTorchers!
I have implemented a script that trains scalable GP models as shown on this webpage.
I am training my model on a single GPU with 200,000 data points. This takes about 20 minutes, which is fine since the model is then saved to a '.pth' file for later reuse.
However, after loading the trained parameters into a new instance of the model, it takes a long time for the cache to be computed on a single GPU before predictions become almost instantaneous. In the caption of Table 1 of this paper, the authors mention that they compute the one-time pre-computed cache using 8 GPUs, but I only have a single GPU available on my computer and my application of GPs is rather time-critical. To be more specific, I am building several GP surrogates that I want to load before running a script that evaluates those surrogates repeatedly, and I would like this loading process (which includes the cache pre-computation) to run as fast as possible.
For example, right now, for a model built with 200,000 data points, it takes 900 s to pre-compute the cache on my single GPU. If I have, say, 5 such models to load, that means more than an hour just to pre-compute the caches. Is there a way to accelerate this cache pre-computation on a single GPU?
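To make the workflow concrete, here is a stripped-down sketch of what I mean (the model class, kernel, data sizes, and file name below are placeholders rather than my actual surrogate); the first prediction in eval mode after loading is the step that triggers the expensive one-time cache computation:

```python
import torch
import gpytorch

# Placeholder data; my real surrogates use ~200,000 training points.
train_x = torch.randn(10_000, 6).cuda()
train_y = torch.randn(10_000).cuda()

# Placeholder model standing in for my actual surrogate.
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood().cuda()
model = ExactGPModel(train_x, train_y, likelihood).cuda()

# ... training loop omitted; afterwards only the hyperparameters are saved ...
torch.save(model.state_dict(), 'surrogate.pth')

# Later, in the time-critical script: rebuild the model and load the weights.
model.load_state_dict(torch.load('surrogate.pth'))
model.eval()
likelihood.eval()

# The FIRST prediction after loading triggers the one-time cache
# pre-computation (the ~900 s step for my 200,000-point models);
# subsequent predictions are then almost instantaneous.
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    _ = likelihood(model(train_x[:10]))
```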
Two related questions:
Thanks a lot for your insights!